How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Training Data

May 26


This podcast explores the development of Composer, a specialized AI model for software engineering, created by Cursor in collaboration with Fireworks. The discussion reveals how focusing a model's entire capacity on a single task can lead to superior performance and efficiency, challenging the notion that larger, general-purpose models are always better. The conversation details the unconventional, top-down approach taken to build this model, prioritizing rapid deployment and real-world user feedback.

Rather than pre-training from scratch, Cursor and Fireworks took a top-down approach: they started from a strong open-source base model and specialized it for coding through mid-training and reinforcement learning (RL), which got a useful model into users' hands quickly. RL training ran as an asynchronous pipeline across four global clusters to maximize GPU utilization. The teams used production inference GPUs during off-peak hours and developed a lossless compression algorithm to ship model updates between clusters efficiently. Training combined offline simulated rollouts with online, real-time RL on actual user data, using self-summarization for long-horizon tasks and verifiable rewards for evaluation. The key takeaway: the most powerful RL environment is a company's own production system, properly isolated.
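The summary does not say how the lossless compression of model updates works, so the following is only a minimal sketch of one generic way to ship a weight delta losslessly, not the method Cursor or Fireworks used. It assumes the old and new checkpoints are available as raw byte buffers of equal length, XORs them so that unchanged bytes become zeros, and compresses the result with zlib. The function names (`xor_bytes`, `encode_delta`, `apply_delta`, `pack_weights`) and the toy weights are hypothetical.

```python
import struct
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    n = len(a)
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(n, "little")


def encode_delta(old: bytes, new: bytes) -> bytes:
    """Compress the XOR of two checkpoints (assumed scheme, for illustration)."""
    return zlib.compress(xor_bytes(old, new), 6)


def apply_delta(old: bytes, payload: bytes) -> bytes:
    """Rebuild the new checkpoint bit-for-bit from the old one and the payload."""
    return xor_bytes(old, zlib.decompress(payload))


def pack_weights(values) -> bytes:
    """Serialize a list of floats as little-endian float32 (toy checkpoint)."""
    return struct.pack(f"<{len(values)}f", *values)


# Toy example: only 1% of the weights change between two training steps.
old_values = [0.001 * i for i in range(10_000)]
new_values = list(old_values)
for i in range(0, len(new_values), 100):
    new_values[i] += 0.5

old = pack_weights(old_values)
new = pack_weights(new_values)

payload = encode_delta(old, new)
restored = apply_delta(old, payload)

# Lossless: the receiving cluster ends up with exactly the same bytes.
assert restored == new
print(f"full checkpoint: {len(new)} bytes, delta payload: {len(payload)} bytes")
```

In this toy case the payload is small because most of the XOR is zeros. Real RL updates usually touch every weight, so a practical scheme would need to exploit different structure in the delta; the sketch only illustrates the lossless round trip.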

00:00

Specialized models allocate all weights to a single task.

05:30

The approach prioritizes rapid user value over full pre-training.

13:58

Optimized engines can achieve a 1:3 ratio.

19:20

Lossless compression of model deltas for fast cross-cluster shipping.

42:35

The most powerful RL environment is your own production system.