microsoft/FastContext-1.0-4B-SFT

Hugging Face Models Trending Models

Summary

Microsoft released FastContext-1.0, a lightweight repository-exploration subagent for LLM coding agents that reduces main-agent token consumption by up to 60% while improving resolution rates by up to 5.5%.

Task: text-generation Tags: transformers, safetensors, qwen3, text-generation, Explorer SubAgent, Repository Exploration, conversational, en, license:mit, text-generation-inference, endpoints_compatible, region:us
Original Article
View Cached Full Text

Cached at: 06/15/26, 08:59 PM

microsoft/FastContext-1.0-4B-SFT · Hugging Face

Source: https://huggingface.co/microsoft/FastContext-1.0-4B-SFT

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#1-model-introduction1. Model Introduction

FastContext-1.0is a lightweightrepository-exploration subagentfor LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issuesparallel read-only tool calls(READ, GLOB, GREP), and returnscompact file paths and line rangesas focused context.

Repository exploration is a major bottleneck in modern coding agents — locating relevant code consumes a large share of the token budget and pollutes the solver’s context with irrelevant snippets. In our analysis of GPT-5.4 trajectories, reading and searching account for56.2% of all tool-use turnsand46.5% of the main agent’s total tokens. FastContext moves this work into a dedicated subagent so the main agent receives clean, grounded evidence rather than the long trail of exploratory reads and searches.

The model family spans4B–30B parameters, bootstrapped from strong reference-model trajectories via supervised fine-tuning (SFT) and refined with task-grounded reinforcement learning (RL) for broad first-turn search, multi-turn evidence gathering, and precise citation generation.

  • **Backbones:**Qwen3-4B-Instruct (4B explorer) and Qwen3-Coder-30B-A3B (30B explorer)
  • Variants:FC\-4B\-SFT,FC\-4B\-RL(deployment targets),FC\-30B\-SFT(scaling reference)
  • **Context length:**up to 262K tokens
  • **Paper:**FastContext: Training Efficient Repository Explorer for Coding Agents
  • Code & data:https://github.com/microsoft/fastcontext

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#how-it-worksHow it works

Coding Agent ──query──▶  FastContext  ──read/search──▶  Repository
     ▲                       │
     └──── file-line ────────┘
          citations

Internally, FastContext runs an exploration loop:

  1. Query understanding— translate the issue into search intents.
  2. Parallel tool calling— issue multipleREAD/GLOB/GREPcalls in a single turn to cover complementary hypotheses.
  3. Observation-driven refinement— use tool outputs to guide the next search turn.
  4. Final citations— return a compact<final\_answer\>block of file paths and line ranges.

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#2-evaluation-results2. Evaluation Results

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#end-to-end-performance-mini-swe-agentEnd-to-end performance (Mini-SWE-Agent)

Integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates byup to 5.5%while reducing main-agent token consumption byup to 60%, with only marginal overhead. Scores, tokens, and turns are measured on the main-agent trajectory; deltas are relative tow/o Explorefor the same main agent.

Main AgentSubagentSWE-bench MultilingualSWE-bench ProSWE-QAGPT-5.4w/o Explore71.7 / 457k46.0 / 818k81.3 / 418kFC-30B-SFT75.0(↑3.3) / 356k (↓22.1%)49.0 (↑3.0) / 688k (↓15.9%)82.0(↑0.7) / 206k (↓50.7%)FC-4B-SFT73.3 (↑1.6) / 364k (↓20.4%)47.0 (↑1.0) / 689k (↓15.8%)81.9 (↑0.6) / 213k (↓49.0%)FC-4B-RL74.7 (↑3.0) / 338k (↓26.0%)48.5 (↑2.5) / 701k (↓14.3%)82.0(↑0.7) / 210k (↓49.8%)GLM-5.1w/o Explore72.3 / 2514k17.5 / 2692k72.7 / 401kFC-30B-SFT73.7 (↑1.4) / 1797k (↓28.5%)20.0 (↑2.5) / 2370k (↓12.0%)73.3 (↑0.6) / 292k (↓27.2%)FC-4B-SFT73.3 (↑1.0) / 1919k (↓23.7%)18.0 (↑0.5) / 2279k (↓15.3%)73.4 (↑0.7) / 306k (↓23.7%)FC-4B-RL73.7 (↑1.4) / 1971k (↓21.6%)22.5(↑5.0) / 2210k (↓17.9%)73.5 (↑0.8) / 302k (↓24.7%)Kimi-K2.6w/o Explore76.3 / 1553k31.0 / 2383k71.6 / 510kFC-30B-SFT76.7 (↑0.4) / 1360k (↓12.4%)33.0 (↑2.0) / 2150k (↓9.8%)72.8 (↑1.2) / 373k (↓26.9%)FC-4B-SFT75.3 (↓1.0) / 1306k (↓15.9%)32.5 (↑1.5) / 2159k (↓9.4%)72.6 (↑1.0) / 402k (↓21.2%)FC-4B-RL78.3(↑2.0) / 1384k (↓10.9%)33.5(↑2.5) / 2158k (↓9.4%)72.6 (↑1.0) / 378k (↓25.9%) Score / Tokens shown per cell. Best result per main-agent block in bold.

Highlights:

  • FastContext improves end-to-end accuracy forevery main agent and benchmark; the largest gains appear on SWE-bench Pro (e.g. GPT-5.4 +5.5, GLM-5.1 +5.0).
  • The biggest token savings reach60.3%(GPT-5.4 on SWE-QA).
  • The compact4B-RLexplorer can outperform the larger30B-SFTexplorer — e.g. on GLM-5.1 SWE-bench Pro it reaches 22.5 vs. 20.0 while using fewer tokens.

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#3-quick-start3. Quick Start

Launch the model with an OpenAI-compatible server (e.g. SGLang). The example below serves the 4B explorer:

python3 -m sglang.launch_server \
    --model-path FastContext-1.0-4B-SFT \
    --tool-call-parser qwen \
    --context-length 262144 \
    --trust-remote-code \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 30000 \
    --tp-size 1 \
    --mem-fraction-static 0.8

FastContext exposes only three read-only tools to the model:

ToolPurposeREADReturn line-numbered file contentsGLOBPath discovery by glob patternGREPRegex search over repository text (ripgrep-style) At each turn the explorer either issues one or more (parallel) tool calls or stops with a final<final\_answer\>evidence list. Wire FastContext into a coding agent (e.g. Mini-SWE-Agent) as an exploration subagent the main agent can invoke on demand.

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#4-training-recipe4. Training Recipe

FastContext is trained in two stages:

  • **Supervised fine-tuning (SFT):**The exploration traces, split into three sources matching the runtime behavior of the subagent —parallel\_toolcalls(broad first-turn search),multiturn\_traj(multi-turn evidence gathering), andlinerange(precise citation generation).
  • Reinforcement learning (RL):The model is rolled out as the actual subagent and optimized withGRPOusing a deterministic reward combining file- and line-level F1, a bonus for bounded parallel exploration, and format penalties.

https://huggingface.co/microsoft/FastContext-1.0-4B-SFT#licenseLicense

This project is licensed under the MIT License.

Similar Articles