LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents
Summary
LiteCoder-Terminal-Gen introduces a zero-dependency synthetic pipeline that generates executable terminal training environments, producing SFT and RL datasets that enable language agents to achieve significant performance gains on Terminal Bench benchmarks.
Source: https://huggingface.co/papers/2605.29559
Abstract
LiteCoder-Terminal-Gen enables scalable training of language agents for terminal environments through synthetic, executable environments that outperform traditional methods.
Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenecked by a reliance on scraped external repositories, which limits domain diversity, environment controllability, and the targeting of specific capability deficits. We introduce LiteCoder-Terminal-Gen, a zero-dependency synthesis pipeline that autonomously generates executable and verifiable terminal training environments directly from domain specifications. Using this framework, we construct two large-scale resources: LiteCoder-Terminal-SFT, comprising 11,255 expert trajectories across 10 domains, and LiteCoder-Terminal-RL, featuring 602 verifiable environments for trajectory-level preference optimization. Supervised fine-tuning of Qwen-family models on our SFT dataset yields agents that significantly outperform their base counterparts. Notably, our 32B variant achieves 29.06%, 18.54%, and 34.00% pass@1 on Terminal Bench 1.0, 2.0, and Pro, respectively. Furthermore, applying Direct Multi-turn Preference Optimization (DMPO) on our RL environments yields additional performance gains. These results systematically demonstrate that fully synthetic, executable environments offer a scalable and verifiable supervision signal for mastering complex, real-world command-line workflows.
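The abstract reports pass@1 scores but does not say how they were computed. As a rough illustration only, the sketch below uses the widely used unbiased pass@k estimator and averages it over tasks. The function names, the task names, and the result layout are hypothetical and are not taken from the paper or from Terminal Bench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: attempts sampled for the task, c: attempts that passed, k: budget.
    For k = 1 this reduces to the plain success rate c / n.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Average per-task pass@1 over a benchmark.

    `results` maps a (hypothetical) task name to the pass/fail outcome
    of each attempt on that task.
    """
    per_task = [pass_at_k(len(runs), sum(runs), 1) for runs in results.values()]
    return sum(per_task) / len(per_task)


# Hypothetical outcomes for two tasks, two attempts each.
example = {"fix-build": [True, False], "parse-logs": [False, False]}
score = benchmark_pass_at_1(example)
```

Here `score` is the mean of the per-task success rates, which is what a percentage such as those quoted above would correspond to under this assumption.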
Models citing this paper: 2
Lite-Coder/LiteCoder-Terminal-4b-sft (4B)
Lite-Coder/LiteCoder-Terminal-30b-a3b-sft (31B)
Datasets citing this paper: 2
Lite-Coder/LiteCoder-Terminal-RL-preview
Lite-Coder/LiteCoder-Terminal-SFT
Similar Articles
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
Terminal-World introduces a fully automated pipeline that uses agent skills to synthesize high-quality training data for terminal agents, enabling models to outperform baselines with only 1.2% of the training data. The method co-derives task instructions, environments, and teacher trajectories from skill primitives.
What Makes Interaction Trajectories Effective for Training Terminal Agents?
This paper investigates what makes interaction trajectories effective for training terminal-based AI agents, introducing the Terminal-Lego pipeline and revealing a pedagogical paradox where weaker agents can produce better training data. It finds that environment-grounded supervision, rather than teacher performance, is key for student generalization.
Turning local agents into self-optimizing agents
A self-optimizing agentic pipeline that improves benchmark performance from ~30% to ~90% on TerminalBench, and can be extended to everyday chats by logging interactions, reflecting with a local model, and injecting lessons into future system prompts.
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
TACO introduces a self-evolving compression framework that automatically learns to shrink redundant terminal interaction history, cutting token overhead ~10% while boosting accuracy 1-4% across TerminalBench and other code-agent benchmarks.
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
EndPrompt proposes a method for extending the context window of large language models using only short training sequences, by anchoring a terminal prompt with target-length positional indices. It achieves strong benchmark results with substantially less computation than full-length fine-tuning.