trust-region

#trust-region

Trust Region On-Policy Distillation

Hugging Face Daily Papers · 3d ago

The paper proposes Trust Region On-Policy Distillation (TrOPD) to stabilize on-policy distillation of large language models by using trust regions, outlier estimation, and off-policy guidance, outperforming existing methods on reasoning and code generation benchmarks.

#trust-region

Trust-Region Behavior Blending for On-Policy Distillation

Hugging Face Daily Papers · 5d ago

Trust-Region Behavior Blending (TRB) improves on-policy distillation by replacing poor early student rollouts with teacher-like behavior within a KL trust region during warmup, achieving stronger results on math-reasoning tasks.

#trust-region

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

arXiv cs.LG · 2026-05-18

This paper identifies a structural failure mode in sequential fine-tuning of shared-context multi-agent LLM teams, formalized as compounding occupancy shift. It proposes TeamTR, a trust-region framework that resamples trajectories and enforces per-agent divergence control, achieving a 7.1% average improvement over baselines.

#trust-region

Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates

arXiv cs.LG · 2026-05-13

This paper introduces Trust Region Inverse Reinforcement Learning (TRIRL), a method that combines monotonic dual improvement with efficient local policy updates and outperforms state-of-the-art imitation learning methods. It uses trust-region constraints to address the trade-off between stability and computational cost in IRL.
