llm-agents

Tag

Cards List
#llm-agents

@mdancho84: The Fundamentals Of Building Autonomous LLM Agents A 38-page PDF that uncovers the secrets of building AI agents that a…

X AI KOLs Timeline · 6h ago Cached

A tweet promoting a 38-page PDF guide on building autonomous LLM agents, offering a free resource for learning about agentic AI systems.

@RitOnchain: https://x.com/RitOnchain/status/2069693848478269730

X AI KOLs Timeline · 9h ago Cached

This article details how a systematic fund replaced its traditional NLP pipeline with a RAG-based LLM agent architecture, achieving a 340% improvement in alpha generation from unstructured data. It cites recent research (Alpha-GPT 2.0, FinCon, FinAgent) showing significant gains in automated factor discovery and trading performance.

@wquguru: https://x.com/wquguru/status/2069641926752780384

X AI KOLs Timeline · 13h ago Cached

This article comprehensively reviews the complete architectural layering of AI Agent Memory as of mid-2026, including rule files, persistent profiles, historical recall, and evidence chains. It explains the storage methods, loading timings, and governance principles of different memory layers, emphasizing the key role of memory in helping agents achieve cross-session compounding work.

MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

arXiv cs.CL · 13h ago Cached

MEMPROBE is a benchmark that evaluates long-term memory in LLM agents by reconstructing hidden user states from the agent's memory after interaction.

ReM-MoA: Reasoning Memory Sustains Mixture-of-Agents Scaling

arXiv cs.AI · 13h ago Cached

ReM-MoA introduces a memory-augmented Mixture-of-Agents framework that sustains scaling through ranked reasoning memory and curated diversified memory routing, outperforming prior MoA variants across five reasoning benchmarks.

LemonHarness Technical Report

arXiv cs.AI · 13h ago Cached

Presents LemonHarness, an integrated execution framework for long-horizon LLM agents that confines state-changing operations to a clearly defined workspace, introduces a reusable rule knowledge base, and adds time-aware execution. It reports 84-86% accuracy on Terminal-Bench 2.0.

Metis: Bridging Text and Code Memory for Self-Evolving Agents

arXiv cs.CL · 13h ago Cached

Metis presents a controlled study comparing text and code memory for self-evolving agents, finding they have complementary trade-offs. It proposes a hierarchical dual-representation memory system that improves task accuracy by up to 20.6% and reduces execution cost by up to 22.8% on the AppWorld benchmark.

The most reliable data agent I've shipped is ~90% deterministic code. The LLM just parses intent and talks. Change my mind.

Reddit r/AI_Agents · 23h ago

The author argues that the reliability of AI agents comes from deterministic code, not the LLM, and shares five key practices for building trustworthy agents on messy real-world data.

Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning

Hugging Face Daily Papers · yesterday Cached

This paper proposes the EDV framework, which uses multiple heterogeneous agents in execute-distill-verify stages to build reliable experiences for LLM agents, preventing self-confirmatory errors and improving performance on long-horizon benchmarks.

@cwolferesearch: I just published a blog on agentic RL that covers 10+ recent frameworks in the space. Here are the key takeaways… Link …

X AI KOLs Timeline · 2d ago Cached

A blog post surveying more than ten recent agentic RL frameworks and their best practices, covering modular interfaces, trajectory structure, action masks, process rewards, advantage normalization, scalable rollouts, stability and exploration, and task curricula.

Same model, same prompt, 4 different agents

Reddit r/LocalLLaMA · 2d ago

Explores how different agent architectures yield varying outputs from the same underlying model and prompt, highlighting the impact of agent design on LLM behavior.

@rohanpaul_ai: Can LLM agents actually discover hidden rules by interacting? The answer is uncomfortable. The more complicated the hid…

X AI KOLs Following · 2d ago Cached

This paper investigates whether LLM agents can infer hidden world models through interaction, finding that they struggle to build stable internal models as complexity increases.

When Agents Commit Too Soon: Diagnosing Premature Commitment in LLM Agents

Hugging Face Daily Papers · 2d ago Cached

This paper introduces representational commitment, the convergence of hidden states across runs, as a diagnostic for when an LLM agent has locked onto a trajectory prematurely. It shows that commitment predicts trajectory consistency but not correctness, and proposes monitoring for cases where an agent is confidently settled instead of assuming that consistency implies trustworthiness.

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

Hugging Face Daily Papers · 2d ago Cached

CLI-Universe is a synthesis engine that generates verifiable terminal-agent tasks via multi-dimensional capability taxonomy and evidence-guided research, producing a distilled dataset of 6,000 trajectories. Fine-tuning Qwen3-32B on this dataset achieves 33.4% on Terminal-Bench 2.0, setting a new state-of-the-art for open-source models at or below 32B parameters.

Libretto: Giving LLM Agents a Sense of Musical Structure

Hugging Face Daily Papers · 3d ago Cached

Libretto introduces a structured framework for symbolic music generation and revision using an LLM-native grammar and corpus-calibrated statistical evaluation across musical dimensions, enabling LLM agents to treat music as a measurable and editable object.

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Hugging Face Daily Papers · 3d ago Cached

PlanBench-XL is a new benchmark that evaluates LLM agents' ability to plan and adapt in large tool ecosystems with limited visibility and dynamic disruptions. Experiments show GPT-5.4 achieves only 51.9% accuracy in block-free settings and collapses to 11.36% under severe blocking, highlighting significant challenges in long-horizon planning.

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization for Open-Ended Deep Research

arXiv cs.AI · 4d ago Cached

ScaffoldAgent introduces a utility-guided dynamic outline optimization framework for open-ended deep research, using expansion, contraction, and revision operations to improve long-form report generation and factual grounding.

The Tao of Agency: Autotelic AI, Embedded Agency and Dissolution of the Self

arXiv cs.AI · 4d ago Cached

This paper explores autotelic AI, where agents generate their own goals, and discusses implications for intrinsic motivation, embeddedness, and the dissolution of the self boundary. It proposes a framework extending to quantum formulation, non-dual philosophy, and LLM-based instantiation.

Multi-Agent Transactive Memory

arXiv cs.AI · 4d ago Cached

Proposes Multi-Agent Transactive Memory (MATM), a framework for population-level storage and retrieval of agent-generated trajectories to improve task performance and reduce interaction steps in interactive environments like ALFWorld and WebArena.

Human-on-the-Loop Orchestration for AI-Assisted Legal Discovery

arXiv cs.AI · 4d ago Cached

This paper proposes a human-on-the-loop orchestration framework for AI-assisted legal discovery, introducing a taxonomy of agentic failures and a four-layer verification architecture to reduce privilege-waiver risk.
