Tag
TabClaw is an open-source interactive AI agent for spreadsheet manipulation and table reasoning that uses LLMs to automate data analysis, support multi-table reasoning, and adapt to user preferences through memory and skill extraction.
FLaG is a lightweight framework for hallucination detection in LLMs that models correctness via latent evidence groups and energy-based routing, achieving state-of-the-art performance across benchmarks.
This academic paper addresses the current critical state of cyberspace, likely discussing vulnerabilities, threats, and governance challenges.
This survey reframes the alignment tuning of large language models as a data pipeline design problem, decomposing it into three stages: response synthesis, preference evaluation, and preference instantiation. It identifies design trade-offs and failure modes, and outlines open challenges such as prompt-level alignment and agentic settings.
The user describes writing academic papers entirely with AI (DeepSeek R1 and V4), emphasizing that an outline written in Chinese and careful prompt tuning are key, and noting that manually editing AI-generated text is more tiring than writing it from scratch.
This paper introduces AgentAtlas, a framework that goes beyond outcome-only leaderboards for LLM agents by proposing a six-state control-decision taxonomy and a nine-category trajectory-failure taxonomy to evaluate agent behavior more comprehensively.
This paper evaluates four text chunking strategies for Retrieval-Augmented Generation on Khmer agricultural documents, finding that character-based recursive chunking with a 300-character chunk size yields the best retrieval and relevance performance.
DeepSlide is a human-in-the-loop multi-agent system for the full presentation process, from requirement elicitation and time-budgeted narrative planning to evidence-grounded slide-script generation and rehearsal support. It introduces a dual-scoreboard benchmark separating static artifact quality from dynamic delivery excellence, and achieves gains in narrative flow, pacing precision, and slide-script synergy.
This paper introduces Sequential Agent Tuning (SAT), a coordinator-free training paradigm for multi-LLM teams that provides monotonic improvement guarantees and plug-and-play invariance, enabling smaller models to outperform larger ones.
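For readers unfamiliar with the character-based recursive chunking named in the Khmer RAG entry above, the general idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `recursive_chunk`, the separator order, and the hard-split fallback are assumptions, and only the 300-character limit comes from the summary.

```python
def recursive_chunk(text, chunk_size=300, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer ones
    only for pieces that are still too long (hypothetical sketch).
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # No separators left (or the empty separator): fall back to a hard split.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits: keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_chunk(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "x" * 200 + "\n\n" + "y" * 200
    for chunk in recursive_chunk(sample):
        print(len(chunk))
```

Because the limit is counted in characters rather than whitespace-delimited words, the hard-split fallback still produces bounded chunks for text with few or no spaces.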