ECHO, a new method, bridges reinforcement learning (RL) and pre-training by applying next-token prediction to tool-call outputs. This lets the agent learn from the environment beyond reward signals, combining world modeling with agentic action.
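The core idea of treating tool-call outputs as next-token prediction targets, rather than only as context, can be sketched as a token mask over a rollout. This is a minimal illustration under stated assumptions and not ECHO's implementation. The `Segment` type, the `"agent"`/`"tool"` role labels, and the `build_masks` and `observation_nll` helpers are all hypothetical, and per-token log-probabilities are assumed to be precomputed by the model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical labels: "agent" = policy-generated tokens, "tool" = environment output.
    role: str
    tokens: list


def build_masks(segments):
    """Flatten a rollout and mark which tokens the policy produced vs. the tool returned."""
    tokens, policy_mask, obs_mask = [], [], []
    for seg in segments:
        is_agent = seg.role == "agent"
        tokens.extend(seg.tokens)
        policy_mask.extend([is_agent] * len(seg.tokens))
        obs_mask.extend([not is_agent] * len(seg.tokens))
    return tokens, policy_mask, obs_mask


def observation_nll(token_logprobs, obs_mask):
    """Mean negative log-likelihood over tool-output tokens only.

    token_logprobs[i] is assumed to be log p(token_i | tokens_<i) under the model,
    so this is a next-token prediction loss restricted to environment observations.
    """
    selected = [-lp for lp, m in zip(token_logprobs, obs_mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


# Usage: two agent turns around one tool response.
rollout = [Segment("agent", [11, 12]), Segment("tool", [21, 22, 23]), Segment("agent", [13])]
tokens, policy_mask, obs_mask = build_masks(rollout)
logprobs = [-0.5, -0.1, -2.0, -1.0, -3.0, -0.2]
print(observation_nll(logprobs, obs_mask))
```

The policy mask would still select the tokens that receive the RL update, while the observation mask adds a supervised signal from what the environment actually returned.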
This paper introduces PaW, a co-training framework that adds auxiliary world-modeling supervision to policy learning during on-policy RL rollouts, improving language-agent training without additional computational overhead.
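A co-training objective of this kind is commonly written as a weighted sum of a policy loss and an auxiliary world-modeling loss, both computed on the same on-policy rollout, which is why no extra environment interaction is required. The sketch below assumes a REINFORCE-style policy term with per-token advantages and a hypothetical weight `lam`; the function name `cotraining_loss` is illustrative, and PaW's actual loss, normalization, and weighting may differ.

```python
def cotraining_loss(token_logprobs, advantages, policy_mask, obs_mask, lam=0.1):
    """Policy-gradient term on agent tokens plus lam * NLL on observation tokens.

    Assumptions (hypothetical, not PaW's published formulation):
      token_logprobs[i]: log p(token_i | tokens_<i), precomputed on one rollout.
      advantages[i]:     advantage for agent token i (ignored where policy_mask is False).
      policy_mask[i]:    True for tokens the policy generated.
      obs_mask[i]:       True for environment/tool-output tokens (world-modeling targets).
    """
    n_pol = max(sum(policy_mask), 1)
    n_obs = max(sum(obs_mask), 1)
    # REINFORCE-style surrogate: minimize -A * log pi(a | s) averaged over agent tokens.
    policy_loss = -sum(a * lp for lp, a, m in zip(token_logprobs, advantages, policy_mask) if m) / n_pol
    # Auxiliary world-modeling loss: next-token NLL on observations from the same rollout.
    wm_loss = -sum(lp for lp, m in zip(token_logprobs, obs_mask) if m) / n_obs
    return policy_loss + lam * wm_loss


# Usage: one rollout, reused for both terms.
loss = cotraining_loss(
    token_logprobs=[-0.5, -0.1, -2.0, -1.0, -3.0, -0.2],
    advantages=[1.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    policy_mask=[True, True, False, False, False, True],
    obs_mask=[False, False, True, True, True, False],
    lam=0.5,
)
print(loss)
```

Setting `lam=0` recovers the plain policy-gradient loss, so the auxiliary term can be tuned or ablated without changing how rollouts are collected.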