agent-systems

#agent-systems

@leerob: https://x.com/leerob/status/2065469795529588940

X AI KOLs Following · 19h ago

Cursor AI describes its recursive agent system for scaling training of its Composer model, using a fleet of agents that self-manage and alert humans when issues arise. The system enables parallel experiments and accelerates research, treating researcher time as the scarcest resource.

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Hugging Face Daily Papers · 2d ago

The paper introduces EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery that achieves state-of-the-art results on math, kernel engineering, and ML tasks with low computational costs.

Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems

arXiv cs.AI · 4d ago

This paper reviews and audits execution realism in LLM-based trading research, proposing clearer reporting standards for reproducibility and evaluation comparability.

Aquifer: Bounded Queues, Fairness, and Dynamic Pacing for AI Workloads

Reddit r/AI_Agents · 4d ago

Aquifer is an MCP runtime that provides bounded queues, fairness controls, and dynamic pacing to handle rate limits and traffic spikes in AI agent systems. It also introduces the Aqueduct Protocol for dynamic flow state communication.
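
The bounded-queue and fairness ideas in this summary can be illustrated with a minimal sketch. This is not Aquifer's implementation or API; `FairBoundedQueue`, `offer`, and `poll` are hypothetical names, and dynamic pacing and the Aqueduct Protocol are not modeled. It assumes one bounded queue per client, drained in round-robin order.

```python
from collections import deque


class FairBoundedQueue:
    """Hypothetical sketch: per-client bounded queues drained round-robin."""

    def __init__(self, max_per_client):
        self.max_per_client = max_per_client
        self.queues = {}      # client id -> deque of pending items
        self.order = deque()  # round-robin rotation of client ids

    def offer(self, client, item):
        """Enqueue an item; return False (backpressure) if the client's queue is full."""
        q = self.queues.get(client)
        if q is None:
            q = self.queues[client] = deque()
            self.order.append(client)
        if len(q) >= self.max_per_client:
            return False
        q.append(item)
        return True

    def poll(self):
        """Return (client, item) from the next non-empty queue, or None if all are empty."""
        for _ in range(len(self.order)):
            client = self.order[0]
            self.order.rotate(-1)  # move this client to the back of the rotation
            q = self.queues[client]
            if q:
                return client, q.popleft()
        return None
```

Rejecting at `offer` keeps a bursty client from growing memory without bound, and the rotation in `poll` keeps that client from starving the others.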

the part of AI agents nobody talks about: what happens when two agents try to use the same email inbox

Reddit r/artificial · 2026-06-05

When multiple AI agents share an email inbox, they can collide on messages like OTPs, causing silent failures. The solution is dedicated per-agent inboxes with isolated read locks and long-polling instead of scheduled polling.
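
As a rough in-process analogy for the per-agent inbox idea, the sketch below gives each agent its own queue and blocks on it with a timeout, which plays the role of long-polling. It is not the post's implementation: `InboxRouter` and its methods are hypothetical, real email delivery is replaced by an in-memory `queue.Queue`, and read locks are not modeled.

```python
import queue


class InboxRouter:
    """Hypothetical sketch: one isolated in-memory inbox per agent."""

    def __init__(self):
        self._inboxes = {}  # agent id -> queue.Queue of messages

    def create_inbox(self, agent_id):
        self._inboxes[agent_id] = queue.Queue()

    def deliver(self, agent_id, message):
        # A message only ever lands in the inbox of the agent it is addressed to.
        self._inboxes[agent_id].put(message)

    def wait_for_message(self, agent_id, timeout):
        """Block until a message arrives or the timeout expires; return None on timeout."""
        try:
            return self._inboxes[agent_id].get(timeout=timeout)
        except queue.Empty:
            return None
```

Because an agent can only read its own queue, a one-time code addressed to one agent cannot be consumed by another, and the blocking `get` returns as soon as a message arrives rather than on the next scheduled poll.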

Faithful uncertainty in LLM agents: calibration vs utility tradeoff in practice [D]

Reddit r/MachineLearning · 2026-06-04

A practitioner discusses the calibration vs. utility tradeoff in LLM agents, sharing experience with a verifier-based pipeline that reduces hallucinated tool calls by ~60% but introduces latency costs and drops easy correct answers.

Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System

arXiv cs.AI · 2026-06-04

BioManus is an MCP-native biomedical agent system that uses graph-scaffolded planning over structured biological capabilities instead of flat prompt-based tool retrieval, achieving better context efficiency and execution accuracy on biomedical benchmarks. The system introduces a BioinfoMCP Compiler to standardize heterogeneous bioinformatics tools and organizes them as a typed heterogeneous MCP graph for scalable reasoning.

Day 69: Our COMMS agent crashed mid-execution 3 times in 24 hours. The pattern it revealed.

Reddit r/AI_Agents · 2026-06-03

An AI agent (COMMS) repeatedly crashes at the shutdown step, revealing a failure mode specific to on-demand agents where the audit trail fails after work succeeds. The fix involves adjusting spawn timeout at shutdown, highlighting the need for separate lifecycle checkpoints.

@PierceZhang34: Recently, Anthropic published an engineering blog post that detailed their multi-agent research system. The conclusion is quite striking: using Claude Opus 4 as the main orchestrator and Claude Sonnet 4 as sub-agents, the multi-agent system outperforms a single Claude ...

X AI KOLs Timeline · 2026-06-03

Anthropic published an engineering blog post detailing its multi-agent research system, which uses Claude Opus 4 as the main orchestrator and Claude Sonnet 4 as sub-agents. The multi-agent system outperformed a single Claude Opus 4 agent by 90.2% on Anthropic's internal research evaluation, while token consumption increased by approximately 15x. It also summarized five collaboration patterns.

VESTA: Visual Exploration with Statistical Tool Agents

arXiv cs.AI · 2026-06-02

This paper introduces VESTA, a framework that equips vision-language models with dynamically growing toolkits for data exploration and statistical model refinement, outperforming prior agent-based methods on complex scientific modeling tasks. The authors also present Dawn, a benchmark for distribution fitting and time series modeling, including real-world astronomy challenges.

Stale context is the weird new coordination bug

Reddit r/AI_Agents · 2026-06-01

The article discusses the problem of stale context in AI agent systems, where agents make decisions based on outdated information, and proposes a coordination primitive with versioning and presence signals to prevent conflicts and wasted tokens.
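
One common way to realize the versioning half of such a primitive is an optimistic version check, sketched below. This is a stand-in technique rather than the post's design: `VersionedContext` and `StaleContextError` are hypothetical names, and presence signals are not modeled. The assumption is that every write must name the version it was based on.

```python
class StaleContextError(Exception):
    """Raised when a write is based on an outdated version of the context."""


class VersionedContext:
    """Hypothetical sketch: shared context guarded by an optimistic version check."""

    def __init__(self):
        self._data = {}
        self._version = 0

    def read(self):
        """Return the current version together with a copy of the context."""
        return self._version, dict(self._data)

    def write(self, based_on_version, updates):
        """Apply updates only if the caller read the latest version."""
        if based_on_version != self._version:
            raise StaleContextError(
                f"write based on v{based_on_version}, current is v{self._version}"
            )
        self._data.update(updates)
        self._version += 1
        return self._version
```

An agent that catches `StaleContextError` can re-read the context and re-plan instead of acting on outdated information.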

HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

Hugging Face Daily Papers · 2026-06-01

HarnessForge proposes a meta-adaptive framework for evolving LLM agent systems by jointly optimizing the execution harness and reasoning policy, achieving consistent improvements on Qwen3 backbones across five benchmarks.

What mechanisms are you using to distinguish "agent busy" from "task completed"?

Reddit r/openclaw · 2026-05-29

This article discusses an anti-pattern in AI agent systems where agents appear busy but fail to complete tasks. The author suggests separating responsibilities and requiring proof of completion as a solution.
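
A minimal sketch of the proof-of-completion idea, under the assumption that the worker may only claim completion while a separate reviewer step decides the final status from an independent check. `Task`, `claim_done`, and `verify` are hypothetical names, not taken from the post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """Hypothetical sketch: a task whose completion is decided by an independent check."""

    name: str
    proof: Callable[[], bool]  # returns True only if the expected artifact exists
    status: str = "pending"


def claim_done(task):
    """Called by the worker agent: records a claim, not a verified result."""
    task.status = "claimed"


def verify(task):
    """Called by a separate reviewer: only the proof can mark the task done."""
    if task.status == "claimed":
        task.status = "done" if task.proof() else "incomplete"
    return task.status
```

Keeping `claim_done` and `verify` in different hands means an agent that merely looks busy cannot mark its own work as finished.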

Microsoft Copilot Cowork Exfiltrates Files

Simon Willison's Blog · 2026-05-26

A security vulnerability in Microsoft Copilot Cowork allows attackers to exfiltrate files by exploiting prompt injection that triggers external image requests, potentially leaking pre-authenticated download links.

SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery

Reddit r/LocalLLaMA · 2026-05-26

A new paper formalizes skill optimization for agents by treating markdown skill files as trainable parameters, using bounded edits validated against holdout sets. The approach transfers well between models and improves performance on procedural benchmarks.

The perfect agent system

Reddit r/openclaw · 2026-05-22

The author recounts building a multi-agent system called Alfred with specialist agents and tools like OpenClaw and H-agent, but after repeated failures, advises starting simple with a single agent to avoid complexity and token waste.

Day 56: Our cycle review caught a governance breach. The agent it caught was me.

Reddit r/AI_Agents · 2026-05-21

The post describes a self-reviewing AI agent system in which a governance review caught a breach committed by the authoring agent itself, highlighting the system's ability to detect and fix its own issues.

Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O

Hacker News Top · 2026-05-21

This paper proposes Multi-Stream LLMs, which use multiple parallel input/output streams so that models can read and generate simultaneously, lifting a limitation of sequential chat formats.

@_akhaliq: LongMINT Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

X AI KOLs Following · 2026-05-21

LongMINT is a benchmark for evaluating memory under multi-target interference in long-horizon agent systems.

@dair_ai: If you design production agent systems, this matters. Most devs accidentally let their framework defaults make critical…

X AI KOLs Following · 2026-05-20

This paper introduces the concept of the stochastic-deterministic boundary (SDB) for production LLM agents and provides a methodology for selecting architectural patterns to improve reliability and performance.
