Apodex is a self-evolving, heavy-duty solver built on a verification-centric agent-team architecture for in-depth research. It supports self-solving, evidence-chain verification, and more. It is currently in early access and free to use.
ScaffoldAgent introduces a utility-guided dynamic outline optimization framework for open-ended deep research, using expansion, contraction, and revision operations to improve long-form report generation and factual grounding.
MetaResearcher proposes a framework for training deep research agents using self-reflective reinforcement learning in adversarial virtual environments, addressing limitations of static environments and fact-retrieval-only tasks.
Researchers trained a Deep Research agent using 32 H100 GPUs and open-sourced all components, enabling community access and further development.
MosaicLeaks introduces a new benchmark for measuring privacy leakage in deep-research AI agents, showing that agents often leak private information through external queries and proposing a training method (PA-DR) to reduce leakage while improving task performance.
Researchers from Boston Children's Hospital, Harvard, and OpenAI used the OpenAI o3 Deep Research reasoning model to reanalyze 376 unsolved rare disease cases, leading to diagnoses in 18 additional cases (4.8% yield) after expert review and clinical confirmation. The study, published in NEJM AI, demonstrates how AI-assisted workflows can help experts revisit difficult cases as scientific knowledge evolves.
OpenAI highlights how o3 Deep Research can aid rare disease diagnosis by integrating clinical features, inheritance patterns, variant evidence, and scientific literature into actionable hypotheses for specialists.
Apodex releases Apodex-1.0, a deep-research model that uses a heavy-duty agent team with global verification, achieving state-of-the-art results on multiple benchmarks including BrowseComp, DeepSearchQA, and HLE.
The article examines AI-generated writing that appears correct but contains errors, and introduces a workflow that uses Deep Research tools (such as Apodex) to break down the problem, find evidence, check risks, and only then write, helping creators improve content quality.
A small team trained a frontier-level Deep Research Agent on an academic budget using only 32 H100s and 8K synthetic samples, releasing fully open weights, code, and paper for models from 2B to 35B that match or beat closed frontier agents on key benchmarks.
Yu Su's team trained a frontier Deep Research Agent on an academic budget using 8K synthetic samples and RL, releasing fully open training infrastructure and models from 2B to 35B parameters.
Apodex 1.0 is a self-evolving AI system post-trained on Qwen3.5, achieving SOTA on BrowseComp, DeepSearchQA, and HLE-text. Its 4B mini model outperforms 30B-class models, and an AgentOS runtime handles task orchestration. Open weights are available.
This paper introduces PhySciBench, a benchmark of 200 expert-curated questions for physical sciences, and DelveAgent, a multi-agent framework that improves accuracy and reduces inference costs compared to baselines like Gemini Deep Research.
XBCP (Cross-lingual BrowseComp-Plus) is a benchmark for evaluating deep research agents and retrievers in cross-lingual and multilingual settings. Results show significant performance degradation when the evidence is in a different language from the query, pointing to both retrieval failures and agent-side difficulty in integrating language-mismatched evidence.
This paper introduces S1-DeepResearch-32B, an open-source model and 15K trajectory dataset for deep research agents, achieving state-of-the-art performance across 20 benchmarks by jointly modeling information acquisition, knowledge synthesis, and planning.
This paper proposes the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which uses hybrid-mode reinforcement learning to evolve a proposer, solver, and judge collaboratively for deep research tasks, achieving state-of-the-art results with an 8B model surpassing larger static models.
The author shares an approach to vetting library health with a deep research agent, finding that the most valuable signal comes when the agent flags disagreements among its sources rather than producing polished but falsely confident summaries. Apodex notably surfaced contradictions clearly, making it easier to decide which sources to trust.
Tavily announces its Deep Research API, a single endpoint that performs multi-step research end-to-end and returns structured, source-cited reports. The API supports custom files, output schemas, and configurable research modes.
ApodexAI releases Apodex-1.0, a deep-research model that operates as a tool-using ReAct agent. Its heavy-duty mode, Apodex-1.0-H, uses an asynchronous agent team with up to 150 sub-agents and achieves new state-of-the-art results on deep-research benchmarks including BrowseComp, DeepSearchQA, HLE, and FrontierScience, surpassing models like GPT-5.5-pro and Claude-Opus-4.8.
Apodex 1.0 is a heavy-duty AI agent team for deep research that achieves state-of-the-art performance by searching the web, reasoning over evidence, and producing reports with verifiable evidence chains.