The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

Highlights

Transcript

Chapters

Pins

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

May 01

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

May 01

Overview Shownote

Highlights

Transcript

Chapters

Pins

Unprocessed episode, you can be the first!

Shownote

Kyle Corbitt, founder of OpenPipe, breaks down reinforcement learning and custom fine-tuning for modern AI models. He explains how RL differs from supervised fine-tuning, why GRPO and LLM-as-judge post-training matter, and how these techniques can improve ...