The author introduces Tiro, an open-source agentic memory and retrieval framework designed to solve long-term context drift in LLM agents by providing modular, inspectable memory lanes for sessions, documents, and operational state.
This research blog post demonstrates that repeatedly rewriting LLM agent experiences into textual 'lessons' often degrades performance rather than improving it. The author finds that episodic memory retention performs better than abstract consolidation across various benchmarks like ARC-AGI and ALFWorld.
The Perplexity team has published guidelines for designing, iterating on, and maintaining Agent Skills, emphasizing that writing Skills is not traditional coding but constructing context for the model. The article advocates a counter-intuitive methodology centered on evaluation-first development, progressive loading, and documenting edge cases ("gotchas") to steer agent behavior.
The article outlines a method for creating a personalized automated content engine using a single Markdown file for data and an HTML dashboard powered by Claude agents to replace paid SaaS tools.
RAO (Recursive Agent Optimization) is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves, turning recursive inference into a learned capability.
A GitHub repository containing 30 runnable Jupyter notebooks that comprehensively explain LLM agent memory technologies, from short-term context to production-grade patterns, covering methods like MemGPT, Zep, and Graphiti, along with decision trees and comparison tables.
This paper introduces SkillRet, a large-scale benchmark for evaluating skill retrieval in LLM agents, addressing the challenge of selecting relevant skills from large libraries. It provides a dataset of over 17,000 skills and demonstrates that task-specific fine-tuning significantly improves retrieval performance.
This paper challenges the assumption that adding more scaffolding components to LLM agents always improves performance, demonstrating through systematic experiments that cross-component interference often leads to degradation. The study finds that simpler, task-specific subsets of components frequently outperform fully equipped 'all-in' agents across various model scales.
This paper introduces BeliefMem, a novel memory paradigm for LLM agents that stores multiple candidate conclusions with probabilities to handle partial observability and reduce self-reinforcing errors. Empirical evaluations show it outperforms deterministic baselines on LoCoMo and ALFWorld benchmarks.
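The core idea of keeping several weighted hypotheses alive instead of committing to a single stored conclusion can be sketched roughly as follows (the class name, update rule, and example strings are illustrative assumptions, not taken from the paper):

```python
from dataclasses import dataclass, field


@dataclass
class BeliefEntry:
    """One memory slot holding competing candidate conclusions with probabilities."""
    candidates: dict[str, float] = field(default_factory=dict)

    def observe(self, candidate: str, likelihood: float) -> None:
        # Reweight the candidate by the evidence likelihood, then renormalize,
        # so no single early conclusion can lock in irreversibly.
        self.candidates[candidate] = self.candidates.get(candidate, 1.0) * likelihood
        total = sum(self.candidates.values())
        self.candidates = {c: p / total for c, p in self.candidates.items()}

    def best(self) -> tuple[str, float]:
        # The agent acts on the most probable conclusion but keeps the rest.
        return max(self.candidates.items(), key=lambda kv: kv[1])


entry = BeliefEntry()
entry.observe("keys are in the drawer", 0.6)
entry.observe("keys are on the table", 0.4)
entry.observe("keys are in the drawer", 0.9)  # new supporting evidence
print(entry.best())  # the drawer hypothesis now leads, but the alternative survives
```

Under partial observability, a weaker hypothesis can later overtake the leader as evidence accumulates, which is what a deterministic single-conclusion memory cannot do.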
This paper presents an agentic system using Large Language Models to automate the discovery of exchange-correlation functionals in Density Functional Theory, achieving improvements over human-designed baselines while highlighting challenges with benchmark overfitting.
This paper introduces 'constant-context skill learning,' a framework that moves procedural knowledge from prompts into model weights to reduce token usage and improve privacy for LLM agents. The method achieves strong performance on benchmarks like ALFWorld and WebShop while significantly reducing inference costs.
The article introduces MANTRA, a framework for automatically synthesizing SMT-validated compliance benchmarks for tool-using LLM agents from natural language manuals. It demonstrates that this approach enables scalable and reliable evaluation of agent adherence to complex procedural rules.
StraTA proposes strategic trajectory abstraction for long-horizon LLM agents, using hierarchical GRPO-style rollout with diverse strategy sampling and critical self-judgment to improve sample efficiency and final performance over frontier models and prior RL baselines.
Skill1 is a unified framework that trains a single policy to co-evolve skill selection, utilization, and distillation using a shared task-outcome objective. Experiments on ALFWorld and WebShop show it outperforms existing baselines in complex task environments.
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.
Simon Willison reflects on how vibe coding and agentic engineering are converging in his own workflow, raising concerns about code review responsibilities as AI coding agents like Claude Code become increasingly reliable. He explores the ethical tension between trusting AI-generated code in production and maintaining software engineering standards.
ARIS is an open-source research harness that uses cross-model adversarial collaboration to improve the reliability of long-term research outcomes, organized into coordinated execution, orchestration, and assurance layers.
The paper introduces Direct Corpus Interaction (DCI), a novel approach allowing AI agents to query raw text directly using standard terminal tools instead of traditional embedding-based retrieval. By bypassing fixed similarity interfaces and offline indexing, DCI significantly outperforms conventional sparse, dense, and reranking baselines across multiple IR and agentic search benchmarks.
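As a rough illustration of the idea (not the paper's actual harness), an agent tool can shell out to `grep` over raw files instead of querying a pre-built vector index; the function name and corpus layout here are hypothetical:

```python
import pathlib
import subprocess
import tempfile


def grep_corpus(pattern: str, corpus_dir: str, max_hits: int = 5) -> list[str]:
    """Search raw text with grep instead of embeddings: no offline indexing,
    and the agent can reformulate the pattern on every call."""
    proc = subprocess.run(
        ["grep", "-rni", pattern, corpus_dir],  # recursive, line numbers, case-insensitive
        capture_output=True,
        text=True,
    )
    # Each hit comes back as "file:line:text", which the agent reads directly.
    return proc.stdout.splitlines()[:max_hits]


# Tiny demo corpus in a temporary directory.
corpus = tempfile.mkdtemp()
pathlib.Path(corpus, "notes.txt").write_text("Episodic memory beats consolidation.\n")
hits = grep_corpus("episodic", corpus)
```

Because the query interface is just a terminal command, the agent can iterate with regexes, anchors, or context flags rather than being limited to a fixed similarity function.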
A large-scale study finds that LLM agents can predict individual social-media reactions with 70.7% accuracy but still lag behind simple TF-IDF classifiers, highlighting both manipulation risks and policy-simulation utility.
An open-source, local-first memory layer for LLM agents on macOS that captures user activity and saves it as Markdown files.
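A minimal sketch of the append-to-Markdown pattern such a memory layer might use (the file layout and function name are assumptions, not the project's actual API):

```python
import tempfile
from datetime import datetime
from pathlib import Path


def log_activity(event: str, memory_dir: Path) -> Path:
    """Append one observed activity as a timestamped Markdown bullet,
    one file per day, so the memory stays greppable and human-editable."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now()
    path = memory_dir / f"{now:%Y-%m-%d}.md"
    if not path.exists():
        path.write_text(f"# Activity log {now:%Y-%m-%d}\n\n")
    with path.open("a") as f:
        f.write(f"- {now:%H:%M} {event}\n")
    return path


# Demo: two captured events land in the same daily file.
memory = Path(tempfile.mkdtemp())
log_path = log_activity("opened project README in editor", memory)
log_activity("ran test suite (all green)", memory)
```

Storing memory as plain Markdown keeps it local, diffable, and directly readable by both the user and any agent with file access.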