ppo-correction

#ppo-correction

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

Hugging Face Daily Papers ↗ · 2026-05-12 Cached

This paper addresses the missing old logits problem in asynchronous reinforcement learning for LLMs, proposing exact and approximate correction methods to improve training stability and performance.

0 favorites 0 likes

ppo-correction

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

Submit Feedback