How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL
Training Data
19 HOURS AGO
This podcast explores the development of Composer, a specialized AI model for software engineering, created by Cursor in collaboration with Fireworks. The discussion reveals how focusing a model's entire capacity on a single task can lead to superior performance and efficiency, challenging the notion that larger, general-purpose models are always better. The conversation details the unconventional, top-down approach taken to build this model, prioritizing rapid deployment and real-world user feedback.
Rather than pre-training from scratch, Cursor and Fireworks took a top-down approach: they started from a strong open-source base model and applied mid-training and reinforcement learning (RL) to specialize it for coding, which got a useful model into users' hands quickly. The RL training ran as an asynchronous pipeline across four global clusters to maximize GPU utilization. The team used production inference GPUs during off-peak hours and developed a lossless compression algorithm to ship model updates between clusters efficiently. Training combined offline simulated rollouts with online, real-time RL on actual user data, using self-summarization for long-horizon tasks and verifiable rewards for evaluation. The key takeaway is that the most powerful RL environment is a company's own production system, properly isolated.
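The summary does not say how the lossless compression of model updates works. As a rough illustration of the general idea only, the sketch below XORs the raw bytes of two same-shaped checkpoints and compresses the result with zlib; when few weights change between updates, the XOR is mostly zeros and compresses well, and the receiver recovers the new checkpoint bit for bit. The function names (`pack_weights`, `encode_delta`, `apply_delta`) and the XOR-plus-zlib scheme are assumptions made for this example, not the algorithm discussed in the episode.

```python
import struct
import zlib


def pack_weights(weights):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(weights)}f", *weights)


def encode_delta(old_bytes, new_bytes):
    """XOR two equal-length checkpoints and compress the result.

    Hypothetical scheme for illustration: unchanged bytes XOR to zero,
    so a sparse update yields a highly compressible payload.
    """
    if len(old_bytes) != len(new_bytes):
        raise ValueError("checkpoints must have the same size")
    xored = bytes(a ^ b for a, b in zip(old_bytes, new_bytes))
    return zlib.compress(xored, 9)


def apply_delta(old_bytes, payload):
    """Rebuild the new checkpoint exactly from the old one and the payload."""
    xored = zlib.decompress(payload)
    if len(xored) != len(old_bytes):
        raise ValueError("payload does not match the base checkpoint size")
    return bytes(a ^ b for a, b in zip(old_bytes, xored))


if __name__ == "__main__":
    old = pack_weights([0.001 * i for i in range(10_000)])
    changed = bytearray(old)
    changed[40:44] = struct.pack("<f", 1.5)  # simulate one updated weight
    new = bytes(changed)

    payload = encode_delta(old, new)
    assert apply_delta(old, payload) == new  # round trip is bit-exact
    print(len(new), "bytes full checkpoint,", len(payload), "bytes delta")
```

A real system would operate on tensor shards rather than Python lists and would likely use a faster codec; the point here is only that shipping a compressed difference can be both exact and much smaller than shipping full weights.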
00:00
Specialized models allocate all weights to a single task.
05:30
The approach prioritizes rapid user value over full pre-training.
13:58
Optimized engines can achieve a 1:3 ratio.
19:20
Lossless compression of model deltas for fast cross-cluster shipping.
42:35
The most powerful RL environment is your own production system.