research-agents

@omarsar0: NEW favorite artifact. I read this every morning to catch up on AI news from high-signal X accounts. It's an HTML artif…

X AI KOLs Following ↗ · 3d ago Cached

A researcher shares a daily automation that curates AI news from high-signal X accounts into an HTML artifact using X MCP tools and research agents.

WANDR Benchmark: Evaluating Research Agents That Must Search Wide and Deep (15 minute read)

TLDR AI ↗ · 6d ago Cached

Perplexity releases WANDR, an open benchmark and evaluation harness for research agents, consisting of 500 realistic data-collection tasks that require both wide discovery and deep verification. Initial results show even the strongest systems achieve low scores, highlighting that wide-and-deep research remains a challenging open problem.

IdeaTrail: Full-Process Agent Trajectories for Scientific Ideation

arXiv cs.AI ↗ · 2026-07-14 Cached

IdeaTrail is a dataset of multi-turn process trajectories for scientific ideation, synthesizing research processes from evidence gathering to proposal construction using a Generator–Advisor loop to ensure grounding.

From Solvers to Research: Large Language Model-Driven Formal Mathematics at the Research Frontier

arXiv cs.CL ↗ · 2026-07-10 Cached

This position paper reviews the current state of LLM-driven formal mathematics, identifies key limitations in applying these systems to open-ended research mathematics, and proposes a strategic roadmap for developing AI agents capable of advancing mathematical frontiers.

@Xudong07452910: Nowadays, many people talk about Research Agents, with the default expectation being: read papers, find gaps, come up with ideas, run experiments, write papers. But this paper from Yale University asks a deeper question: How far apart are LLM-generated research ideas from the paper ideas that human researchers actually produce…

X AI KOLs Timeline ↗ · 2026-07-09 Cached

A Yale University paper builds a large-scale evaluation framework to measure the distributional gap between research ideas generated by LLMs and those produced by human researchers. It finds that LLM ideas concentrate heavily in bridge and synthesis types, while human ideas are more broadly distributed. The result points to differences in 'research taste' and raises a diversity challenge for Research Agents.

Search Discipline for Long-Horizon Research Agents

arXiv cs.AI ↗ · 2026-06-11 Cached

This paper identifies a failure mode in long-horizon research agents (inversion), in which optimizing an aggregate metric can select candidates that improve the headline number but break critical subgroups. It proposes a search-discipline protocol with an external control loop that audits candidates on their disaggregated behavior rather than on the aggregate score.

@k_dense_ai: Introducing Science Superpowers — a complete computational-science methodology for AI research agents. It makes your ag…

X AI KOLs Timeline ↗ · 2026-05-28 Cached

Science Superpowers is an open-source computational-science methodology for AI research agents, enforcing pre-registration and reproducible workflows to prevent p-hacking and HARKing.

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

arXiv cs.AI ↗ · 2026-05-27 Cached

ScientistOne introduces Chain-of-Evidence, a verifiability framework for autonomous research agents that makes every claim traceable to evidence. It reports zero hallucinated references, perfect score verification, and the highest method-code alignment across 75 papers, while matching or exceeding human expert performance on frontier research tasks.

@_avichawla: The No. 1 deep researcher beats Claude and ChatGPT with a trick neither uses. I studied the open-source architecture be…

X AI KOLs Timeline ↗ · 2026-05-25 Cached

The Onyx open-source deep research system achieves top ranking by stripping search access from its orchestrator agent, forcing it to decompose queries into focused research threads. Its three-phase pipeline and two-level architecture prevent information distortion and premature answering, outperforming proprietary solutions from OpenAI, Anthropic, and Google.

Product Integrations

Reddit r/AI_Agents ↗ · 2026-05-24

NineLayer, an MCP-based search engine for coding and research agents, has cut latency from 40 s to 1.5 s and is asking users which platform integrations to prioritize.

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

arXiv cs.CL ↗ · 2026-05-20 Cached

This paper introduces REFLECT, a meta-evaluation benchmark for assessing the reliability of LLM judges in evaluating deep research agents. Experiments show current LLM judges remain unreliable, with overall accuracies below 55% across reasoning, tool-use, and report-quality failures.
