Tag
ContextSniper is a token-efficient code memory layer for repository-level program repair using LLM agents. It reduces token usage by up to 51.5% and cost by up to 36.4% while maintaining similar resolution rates on SWE-bench Lite.
This paper proposes a retrieval-grounded small language model framework that uses formal concept analysis as a symbolic verification loop for ontology construction, demonstrating its effectiveness in a rare ataxia setting.
This paper proposes a causal auditing framework that evaluates forgetting in Limited Memory Language Models by varying the database state during inference. It finds that parametric leakage is negligible and that post-deletion correctness arises primarily from retrieval artifacts rather than residual parametric memory.
This paper presents HistoriQA-ThirdRepublic, a French-language multi-hop question answering dataset derived from historical documents of the French Third Republic, designed to evaluate retrieval-augmented and LLM systems in historical research contexts.
This paper presents a five-arm ablation methodology for diagnosing which component of retrieval-warmed energy-based reasoning (RW-EBR) drives performance gains, applied to structured reasoning tasks like graph reachability and Sudoku. The method separates effects of class-prior bias, stochastic warm-starting, and graph-aligned value reuse.
This paper from SJTU and Tsinghua systematically evaluates 12 agent memory systems from a data management perspective, decomposing memory into four modules and providing guidelines on when to use RAG, vector databases, or knowledge graphs for long-term agent memory.
This paper introduces a retrieval-augmented personalization method for wearable stress detection using frozen foundation models, achieving near-supervised fine-tuning performance without requiring labeled user data.
This paper introduces RASC+, a retrieval-constrained LLM adjudication method for clinical value set authoring that improves candidate-pool recall and selection precision over prior RASC baselines, demonstrating that blinded LLM adjudication with Qwen3-based retrieval significantly outperforms direct generation.
ScaffoldAgent introduces a utility-guided dynamic outline optimization framework for open-ended deep research, using expansion, contraction, and revision operations to improve long-form report generation and factual grounding.
This paper proposes Multi-Agent Transactive Memory (MATM), a framework for population-level storage and retrieval of agent-generated trajectories to improve task performance and reduce interaction steps in interactive environments like ALFWorld and WebArena.
This paper introduces SkillWeaver, a decompose-retrieve-compose framework for routing multiple skills to LLM agents, along with CompSkillBench, a benchmark of 300 compositional queries over 2,209 real MCP server skills.
This paper proposes a unified agentic-retrieval framework for autonomous, context-aware data quality assessment. The framework interprets natural-language usage descriptions, generates executable validation logic via a multi-agent workflow, and uses feasibility validation to ensure reliability.
This paper introduces DRIVE, a unified Transformer-based framework for offline auto-bidding that decouples candidate action generation from decision making, combining distributional action modeling, retrieval-augmented candidate generation, and value-based evaluation to improve bidding performance under budget and cost constraints.
This paper introduces a retrieval-augmented vision-language-action policy that eliminates per-task fine-tuning by using pre-trained models with indexed demonstrations, enabling efficient cross-embodiment generalization and task adaptation at test time.
Describes how adding grep-based exact matching alongside vector embeddings improved agentic memory search. The approach was inspired by a paper, and the authors report significant recall gains in their memory layer.
This paper introduces PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, enabling style-diverse non-ego agents for closed-loop simulation and improving driving scores on Bench2Drive.
This paper introduces Engram, an open-source bi-temporal memory engine for LLM agents. Using a hybrid read path that fuses dense, lexical, graph, and temporal signals, it retrieves a compact context slice (∼9.6k tokens) and outperforms the full-history baseline (79k tokens) by 10.4 accuracy points on LongMemEval.
A community discussion on agent memory reveals that while various patches exist for what to write down (e.g., plain files, layered memory, post-mortems), the unsolved problem is what to keep: detecting failures is tractable, but deciding which lessons persist still needs human judgment.
This paper introduces a four-condition diagnostic protocol to separate no-evidence answerability, oracle-evidence recoverability, full-context utilization, and retrieval-conditioned utilization in long-context and retrieval-augmented language models, tested on five open-weight models across multiple datasets.
QueryAgent-R1 is an agentic framework that bridges query generation and product retrieval in e-commerce using reinforcement learning and memory abstraction, improving query CTR by 2.9% and CVR by 3.1% in online tests.