@ickma2311: CMU Advanced NLP: Reinforcement Learning I had been curious about how RL works on top of LLMs, and this CMU lecture mad…


Summary

A CMU Advanced NLP lecture clarifies how reinforcement learning optimizes a reward on the whole output (correctness, helpfulness, safety) rather than the next-token prediction objective used in pretraining and fine-tuning.

CMU Advanced NLP: Reinforcement Learning. I had been curious about how RL works on top of LLMs, and this CMU lecture made it much clearer for me: pretraining and fine-tuning focus on the next token; RL focuses on the reward of the whole output: correctness, helpfulness, safety,
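
To make the contrast concrete, here is a minimal sketch in Python. It assumes a REINFORCE-style surrogate loss for the RL side, which is one common formulation and not necessarily the one the lecture presents. The function names, the baseline parameter, and the probabilities are all hypothetical, chosen only for illustration.

```python
import math

def next_token_loss(token_logprobs):
    """Pretraining / fine-tuning style: average negative log-likelihood per token."""
    return -sum(token_logprobs) / len(token_logprobs)

def sequence_reward_loss(token_logprobs, reward, baseline=0.0):
    """REINFORCE-style surrogate (an assumption, see lead-in): a single scalar
    reward for the whole output scales the log-probability of the full sequence."""
    return -(reward - baseline) * sum(token_logprobs)

# Hypothetical probabilities the model assigned to each token of one sampled answer.
probs = [0.5, 0.25, 0.5]
logps = [math.log(p) for p in probs]

sft = next_token_loss(logps)                        # every token contributes its own term
rl_good = sequence_reward_loss(logps, reward=1.0)   # whole answer judged correct
rl_bad = sequence_reward_loss(logps, reward=0.0)    # whole answer judged wrong

print(f"next-token loss: {sft:.4f}")
print(f"RL surrogate (reward=1): {rl_good:.4f}")
print(f"RL surrogate (reward=0): {rl_bad:.4f}")
```

With a binary reward and no baseline, an output that earns zero reward contributes no gradient signal at all, while the next-token loss always produces a term for every token regardless of whether the overall answer was good.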

Cached at: 04/22/26, 08:22 AM


Similar Articles

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

arXiv cs.CL

This paper investigates whether reinforcement learning can improve the direct recall of parametric knowledge in LLMs beyond reasoning tasks. It demonstrates that RL with binary rewards yields significant gains in factual QA benchmarks by redistributing probability mass to unlock latent knowledge rather than acquiring new facts.

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv cs.LG

This paper frames LLM-generated reward shaping for sparse structured RL as a debugging problem, identifying failure modes like reward flooding and semantic misunderstanding. The authors propose diagnostic-driven iterative refinement, achieving dramatic success rate improvements (e.g., DoorKey-8×8 from 2.3% to 97.6%) compared to one-shot generation.