tool-calling

@RhysSullivan: I'm now building Executor full time as a startup! The state of tool calling is a mess: - Everyone is using different ag…

X AI KOLs Timeline · 14h ago

Rhys Sullivan is building Executor, an open-source integration layer for AI agents that provides a unified tool catalog with access controls, approval flows for destructive actions, and support for MCP, OpenAPI, GraphQL, and more. It aims to standardize tool calling across different agents like Cursor and Claude Code.

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

arXiv cs.CL · yesterday

BioTool introduces a comprehensive biomedical tool-calling dataset with 34 tools and 7,040 human-verified query-API pairs, enabling fine-tuned LLMs to outperform GPT-5.1 on biomedical tool use and significantly enhance answer quality.

@codewithimanshu: Stanford professor just gave away the entire foundation of how AI Agents & automation actually works. 1-hour lecture. T…

X AI KOLs Timeline · 2026-04-22

A Stanford professor released a free 1-hour lecture covering the fundamentals of AI agents: tool calling, multi-step workflows, planning, and reflection.

ibm-granite/granite-4.1-8b · Hugging Face

Reddit r/LocalLLaMA · 2026-04-21

IBM has released Granite-4.1-8B, an Apache 2.0-licensed, 8B-parameter, long-context instruct model with enhanced tool-calling and multilingual support.

@KKaWSB: Moonshot just open-sourced Kimi K2.6—4,000 tool calls in one 12-hour session, 300 sub-agents in parallel building a full codebase. SOTA on SWE-Bench Pro, BrowseComp, HLE and more, ties Claude Opus 4.6 and G…

X AI KOLs Timeline · 2026-04-20

Moonshot has open-sourced the Kimi K2.6 model, supporting 4,000 tool calls in a single session and 300 parallel sub-agents, achieving SOTA on benchmarks like SWE-Bench Pro and claiming performance on par with Claude Opus 4.6 and GPT-5.4.

PolicyBank: Evolving Policy Understanding for LLM Agents

arXiv cs.CL · 2026-04-20

PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.

New tools and features in the Responses API

OpenAI Blog · 2025-05-21

OpenAI announced new tools and features for the Responses API, including support for remote Model Context Protocol (MCP) servers, image generation, Code Interpreter, and improved file search capabilities. The update also enables o3 and o4-mini models to call tools directly within their chain-of-thought, with new enterprise features like background mode and encrypted reasoning items.
