This paper introduces PaW, a co-training framework that adds auxiliary world-modeling supervision to policy learning during on-policy RL rollouts, improving language-agent training without additional computational overhead.
CoHyDE is an iterative co-training procedure for an LLM rewriter and a dense encoder that improves tool retrieval from large API catalogs. By training the two components together with InfoNCE and DPO, it outperforms single-component baselines, especially on vague queries.
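For readers unfamiliar with InfoNCE, the contrastive objective named above, here is a minimal sketch of the standard loss for a single query. It is not CoHyDE's implementation: the summary does not say how the loss is applied, so the dot-product similarity, the temperature value, and the function name `info_nce` are all assumptions made for illustration.

```python
import math


def info_nce(query, positive, negatives, temperature=0.05):
    """Standard InfoNCE loss for one query (illustrative, hypothetical helper).

    query, positive: embedding vectors as lists of floats.
    negatives: list of embedding vectors for non-matching items.
    temperature: assumed value, not taken from the paper.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Similarity logits, positive first, each scaled by the temperature.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))

    # Negative log-probability of the positive under a softmax.
    return log_z - logits[0]
```

The loss approaches zero when the query embedding is much closer to the matching item than to the negatives, and grows when a negative scores higher, which is the signal a dense encoder is trained on in this kind of setup.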
Researchers from Fordham University introduce Reciprocal Co-Training (RCT), a framework that couples LLMs and Random Forest classifiers via reinforcement learning, creating an iterative feedback loop where each model improves using signals from the other. Experiments on three medical datasets show consistent performance gains for both models, demonstrating a general mechanism for integrating incompatible model families.