This paper investigates what makes interaction trajectories effective for training terminal-based AI agents. It introduces the Terminal-Lego pipeline and reveals a pedagogical paradox: weaker agents can produce better training data. It finds that environment-grounded supervision, rather than teacher performance, is key to student generalization.
LiteCoder-Terminal-Gen introduces a zero-dependency synthetic pipeline that generates executable terminal training environments, producing SFT and RL datasets that enable language agents to achieve significant performance gains on Terminal Bench benchmarks.