llm-mid-training

ExpRL: Exploratory RL for LLM Mid-Training

Hugging Face Daily Papers · 6d ago

ExpRL is an RL-based mid-training method that uses human-written reference solutions as scaffolds for dense rewards; the reference solutions are never shown to the policy. It aims to improve LLM reasoning and reports significant gains on hard math benchmarks such as AIME-2026.
