Tag
TextGen, formerly text-generation-webui, has been updated to a native, no-install desktop application for Windows, Linux, and macOS, offering enhanced privacy, ik_llama.cpp support, and native tool-calling capabilities as an open-source alternative to LM Studio.
The article introduces Needle, a 26M-parameter model by Cactus-Compute designed for single-shot tool calling. It argues that tool routing is a structured prediction task that should be separated from reasoning, which would improve agent efficiency and reduce latency.
This paper argues that full-horizon planning with lazy replanning is more efficient than step-by-step execution for data-centric LLM agent tasks, using fewer tokens while maintaining accuracy.
This paper introduces Switchcraft, the first AI model router specifically optimized for agentic tool calling to reduce inference costs. By using a lightweight DistilBERT classifier, it achieves significant cost savings while maintaining high accuracy in tool-use tasks.
The paper introduces MIST, a synthetic dataset and framework for training multimodal voice assistants to control IoT devices in smart homes. It highlights significant performance gaps between open and closed-weight models in handling complex, speech-based tool-calling tasks.
This Apple research paper introduces 'Reinforced Agent,' a method that moves evaluation into the execution loop, using a specialized reviewer agent to correct tool-calling errors in real time. It demonstrates significant accuracy improvements on benchmarks like BFCL and τ²-Bench without retraining the base agent.
Rhys Sullivan is building Executor, an open-source integration layer for AI agents that provides a unified tool catalog with access controls, approval flows for destructive actions, and support for MCP, OpenAPI, GraphQL, and more. It aims to standardize tool calling across different agents like Cursor and Claude Code.
BioTool introduces a comprehensive biomedical tool-calling dataset with 34 tools and 7,040 human-verified query-API pairs, enabling fine-tuned LLMs to outperform GPT-5.1 on biomedical tool use and significantly enhance answer quality.
A Stanford professor released a free one-hour lecture covering the fundamentals of AI agents, tool calling, multi-step workflows, planning, and reflection.
IBM releases Granite-4.1-8B, an Apache 2.0-licensed, 8B-parameter, long-context instruct model with enhanced tool-calling and multilingual support.
Moonshot has open-sourced the Kimi K2.6 model, supporting 4,000 tool calls in a single session and 300 parallel sub-agents, achieving SOTA on benchmarks like SWE-Bench Pro and claiming performance on par with Claude Opus 4.6 and GPT-5.4.
PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.
OpenAI announced new tools and features for the Responses API, including support for remote Model Context Protocol (MCP) servers, image generation, Code Interpreter, and improved file search capabilities. The update also enables o3 and o4-mini models to call tools directly within their chain-of-thought, with new enterprise features like background mode and encrypted reasoning items.