exploration-exploitation

#exploration-exploitation

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

arXiv cs.LG · 2026-06-16

Introduces LatentGym, a controllable testbed for studying cross-task experiential learning in LLM agents. It enables measurement of exploration versus exploitation and reveals how frontier models fail to adapt across related tasks.

#exploration-exploitation

Revisiting Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning

arXiv cs.CL · 2026-04-20

This paper proposes Adaptive Entropy Regularization (AER), a framework that dynamically balances exploration and exploitation in LLM reinforcement learning by addressing policy entropy collapse through difficulty-aware coefficient allocation and initial-anchored target entropy. Experiments on mathematical reasoning benchmarks demonstrate consistent improvements in both accuracy and exploration capability.

#exploration-exploitation

DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Hugging Face Daily Papers · 2026-04-15

DiPO introduces a novel reinforcement learning approach for LLMs that uses perplexity-based sample partitioning to disentangle exploration and exploitation subspaces, combined with a bidirectional reward allocation mechanism for more stable policy optimization. The method demonstrates superior performance on mathematical reasoning and function calling tasks.

