@LightOnIO: Reason-ModernColBERT topped BrowseComp-Plus with just 149M parameters. Now, Agent-ModernColBERT adds ~10% on top. Reach…
Summary
LightOn released Agent-ModernColBERT, a 149M parameter open-source retrieval model that achieves performance comparable to GPT-5 combined with Qwen3-Embed-8B by integrating agent reasoning traces into queries.
View Cached Full Text
Cached at: 05/13/26, 10:20 AM
Reason-ModernColBERT topped BrowseComp-Plus with just 149M parameters. Now, Agent-ModernColBERT adds ~10% on top. Reaches GPT-5 + Qwen3-8B with GPT-OSS-120B. Still 149M parameters. Fully Open. Smaller. Cheaper. Kudos to @antoine_chaffin for the work Full benchmarks, methodology, model, data, and training code in the blog ↓ https://lighton.ai/lighton-blogs/deep-research-is-open-now?utm_source=x&utm_medium=organic_social&utm_campaign=blog_deep_research_is_open_now&utm_content=post_20260512…
Deep Research is now Open - LightOn
Source: https://lighton.ai/lighton-blogs/deep-research-is-open-now?utm_source=x&utm_medium=organic_social&utm_campaign=blog_deep_research_is_open_now&utm_content=post_20260512 Agentic retrieval is changing the shape of the query. In addition to the query, the agent is creating reasoning traces about what it is searching and why. Usual retrieval setup discard those clues, while AgentIR append them to the query. By applying AgentIR to late interaction models, we get a 10% increase over the already state-of-the-art Reason-ModernColBERT. Combined to GTP-OSS-120B, we can now reach the original performance of GPT-5 combined to Qwen3-Embed-8B, with a model 54× smaller and a fully open source agent.
.png)
Most notably, Agent-ModernColBERT is competitive with AgentIR-4B, a dense model 26× larger trained on the same data, highlighting once again the edge of late interaction in agentic search.
The pattern keeps holding
Two months ago, Reason-ModernColBERT reached87.59% accuracyon BrowseComp-Plus with GPT-5, a7.59-point jumpover the previous best while leading on recall and calibration error. A 149M late-interaction retriever was outperforming dense models up to54× larger. Full write-up:The Bloated Retriever Era Is Over.
BrowseComp-Plus is designed to test retrieval inside Deep Research-style workflows: hard questions, a fixed corpus, and a setup where the retriever has to surface evidence useful enough for an agent to answer. It does not only ask whether the retriever can rank documents well. It asks whether better retrieval actually helps the agent solve the task.
Reason-ModernColBERT was not trained for agentic search specifically. It was trained for reasoning-intensive retrieval in general. The obvious next question was simple:
What happens if you train a late-interaction model directly for retrieval inside reasoning loops?
AgentIR and DR-Synth
Zijian Chen, who created BrowseComp-Plus, also released AgentIR.
The core idea is straightforward: in agentic search, the query is rarely just a query. Before searching, the agent has usually decomposed the task, formed hypotheses, ruled out dead ends, and decided what evidence it needs next.
Most retrieval pipelines throw that context away. They search only on the rewritten query.
AgentIR does something different: it concatenates the agent’s reasoning trace directly into the retrieval query.
To support this setup, the team releasedDR-Synth, a dataset of synthetic agent trajectories, and trainedAgentIR-4B, a dense 4B retriever. AgentIR-4B outperformed the larger ReasonIR-8B and pipelines using rerankers.
This matters because Reason-ModernColBERT was itself built from the ReasonIR line of work. The recipe was already familiar: when a strong dense retriever and its training data are released, train a late-interaction model on the same signal with PyLate, then test whether multi-vector retrieval can do more with less.
Agent-ModernColBERT
Agent-ModernColBERT built on this now familiar recipe by fine tuning a late interaction model on the DR-Synth data.
It is a149M-parameter late-interaction retrievertrained to handle retrieval queries that include agent reasoning traces. Training takes aboutfive minutes.
The result is a small retriever that is directly adapted to the AgentIR setting, while preserving the core advantage of late interaction: token-level matching instead of compressing the entire query and document into a single vector.The open-source result is the important one: with GPT-OSS-120B and a simple get_document function, Agent-ModernColBERT reaches72.53accuracy. That is above the original GPT-5 + Qwen3-Embed-8B configuration from BrowseComp-Plus, although the two scaffolds are not identical because the original baseline did not expose get_document.
That caveat matters. So does the result.
A 149M retriever, paired with an open LLM and a lightweight document-reading tool, crosses the original closed-model benchmark configuration. Fully open. Smaller. Cheaper. More accurate in this setup.
The AgentIR-4B comparison is the other key result. Agent-ModernColBERT is trained on the same data and uses the same prompt as AgentIR, down to the same incorrectly escaped line break, for comparability, yet remains competitive while being26× smaller.
At that point, the dense-scaling argument starts to run out of room.

Why reasoning traces favor late interaction
Agentic retrieval changes the shape of the query.
A standard search query is often short: a few keywords, an entity, a question. An agentic query can be much richer. It may contain the current hypothesis, intermediate reasoning, constraints, and a description of the missing evidence.
That is a lot of information to compress into one vector.
Late interaction models do not need to make that compression as aggressively. They keep token-level representations and compare query tokens against document tokens at retrieval time. When the query contains a reasoning trace, this becomes especially useful: the retriever can match different parts of the trace to different pieces of evidence in the document.
That is why Reason-ModernColBERT was already strong on BrowseComp-Plus without being trained for agentic search. Agent-ModernColBERT shows what happens when the training data is aligned with the agentic setting directly.
Why this changes the cost equation
Every retrieval call inside a Deep Research loop costs tokens, latency, and money.
A better retriever reduces the number of search iterations needed to reach an answer. A smaller retriever reduces the cost of each one. Late interaction helps on both sides.
For teams building Deep Research-style agents over enterprise knowledge bases, this is not just a benchmark detail. Retrieval runs inside the loop. It has to be fast, cheap, and accurate enough for the agent to decide what to read next.
An 8B dense embedder is expensive to serve at scale and difficult to keep responsive inside long reasoning loops. Agent-ModernColBERT changes the equation on both sides: smaller retrieval steps, and fewer of them.
In the GPT-OSS-120B setup, Agent-ModernColBERT reaches higher accuracy than AgentIR-4B while using fewer search calls. That matters because every search inside a Deep Research loop adds latency, token usage, and cost. A 149M late-interaction retriever does not just make retrieval cheaper to serve. It also helps the agent get to the right evidence faster.
As retrieval becomes more agentic, the bottleneck does not look like model scale.
It looks like dense compression itself.
Open and reproducible
Models
- Agent-ModernColBERT: agentic retrieval, 149M parameters
- Reason-ModernColBERT: reasoning-intensive retrieval, 149M parameters
- LateOn-Code: code retrieval, 17M / 149M parameters
Tools & infrastructure
- PyLate: train late-interaction models in a few lines of code - The exact training script used for Agent-ModernColBERT is available[here]
- ColGrep: semantic code search for terminals and coding agents
- NextPlaid: local-first multi-vector database
Datasets & benchmarks
Pull the weights fromHugging Face. Run late-interaction retrieval against your own corpus through /search inLightOn Console. Building Deep Research-style agents over enterprise knowledge bases?Talk to us.
Agent-ModernColBERT was developed by Antoine Chaffin, Research Engineer at LightOn.
Similar Articles
@antoine_chaffin: Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models models 54× bigger Not bad fo…
Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing SOTA and models 54× larger, then Agent-ModernColBERT further improves with minimal training.
@raphaelsrty: We're releasing LateOn and DenseOn today. Two open retrieval models, 149M parameters each. LateOn (ColBERT, multi-vecto…
Raphael released two open-source retrieval models, LateOn (ColBERT multi-vector) and DenseOn (single-vector), each 149M parameters and outperforming 4× larger models on BEIR.
@AmelieTabatta: ColBERT models continue to embarrass models 54× their sizes , this is why we trust late interaction @LightOnIO . A 1-ye…
The article highlights how ColBERT models, despite being smaller and older, outperform larger models like Qwen3-embed-8B when coupled with late interaction techniques and minimal fine-tuning.
@bo_wangbo: okay maybe it's a good time? We have a small colbert model trained at pplx, it is a continue-training of pplx-embed-0.6…
Perplexity AI releases pplx-embed-v1-late-0.6b, a small ColBERT late-interaction embedding model for retrieval, fine-tuned from their existing embedding model and optimized for MaxSim scoring, now open-source on HuggingFace.
@eliebakouch: very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion sc…
OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.