Needle: We Distilled Gemini Tool Calling Into a 26M Model
Summary
Cactus-Compute released Needle, a 26M-parameter open-source model distilled from Gemini for efficient on-device function calling. It uses a novel Simple Attention Network architecture that contains no MLP layers.
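The summary does not describe the Simple Attention Network in detail. To illustrate what "without MLPs" can mean, the sketch below shows a hypothetical residual block made only of single-head self-attention. The position-wise feed-forward (MLP) sublayer that a standard Transformer block applies next is simply left out. The function names, the single head, the square value projection, and the absence of normalization are all assumptions for illustration, not Needle's actual design.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_only_block(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Hypothetical block computing x + SelfAttention(x) with no MLP sublayer.

    x is (seq_len x d_model); wv must be (d_model x d_model) so the residual
    addition lines up. wq and wk may project to any shared width d.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    # Scaled dot-product scores between every pair of positions.
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, v)
    # Residual connection. A standard Transformer block would apply an MLP here;
    # this sketch deliberately does not.
    return [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attended)]
```

For example, `attention_only_block(x, identity, identity, identity)` on a 3 x 2 input returns another 3 x 2 matrix, and with an all-zero value projection the block reduces to the identity. Dropping the MLP removes a large share of a typical Transformer's parameters, which is one plausible reason such a design suits a model this small.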
Similar Articles
Cactus-Compute/needle
Cactus-Compute releases Needle, a 26M-parameter model distilled from Gemini 3.1, built on a pure attention architecture optimized for on-device inference and local fine-tuning.
@sitinme: A 26M-parameter model can do function calling, and it even beats Qwen-0.6B? This team's unconventional approach is wild! Large models keep growing in parameter count, yet one question has never been seriously considered: does calling a tool really need hundreds of billions of parameters? Think about it: when you say 'Check today's...'
The Cactus team distilled Gemini 3.1 into Needle, a specialized 26M-parameter model built specifically for function calling. Its performance surpasses Qwen-0.6B, demonstrating the potential of small models for tool calling.
A 26M tool-router suggests tool calling should be split from reasoning
The article introduces Needle, a 26M-parameter model by Cactus-Compute designed for single-shot tool calling, and argues that tool routing should be separated from reasoning and treated as a structured prediction task to improve agent efficiency and reduce latency.
Introducing the Gemini 2.5 Computer Use model
Google releases Gemini 2.5 Computer Use model via the Gemini API, enabling developers to build AI agents that can interact with user interfaces through clicking, typing, and scrolling. The model outperforms alternatives on web and mobile control benchmarks with lower latency and is available in preview on Google AI Studio and Vertex AI.
Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster.
A benchmark comparing Needle 26M and Qwen3-0.6B on CPU function calling shows that the smaller Needle model wins on both accuracy and speed, but the two models fail in distinct ways: Needle picks the wrong tool, while Qwen3 often fails to emit a tool call at all.
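The two failure modes described above (choosing the wrong tool versus emitting no tool call at all) can be told apart with a small scoring function. It treats each prediction as a structured output, a tool name plus arguments, in line with framing tool routing as structured prediction. This is a hypothetical sketch: the `ToolCall` type, the outcome labels, and exact-match argument comparison are assumptions, not the benchmark's actual harness.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def classify(expected: ToolCall, predicted: Optional[ToolCall]) -> str:
    """Label one query's outcome; None means the model emitted no tool call."""
    if predicted is None:
        return "no_call"
    if predicted.name != expected.name:
        return "wrong_tool"
    if predicted.args != expected.args:  # assumption: exact-match arguments
        return "wrong_args"
    return "correct"


def score(expected: List[ToolCall], predicted: List[Optional[ToolCall]]) -> Dict[str, Any]:
    """Compute accuracy plus a per-outcome breakdown over a query set."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("need a non-empty, equal-length list of predictions")
    outcomes = Counter(classify(e, p) for e, p in zip(expected, predicted))
    return {"accuracy": outcomes["correct"] / len(expected), "outcomes": dict(outcomes)}


# Usage with hypothetical tools: one correct call, one missing call.
expected = [ToolCall("get_weather", {"city": "Paris"}), ToolCall("set_timer", {"minutes": 5})]
predicted = [ToolCall("get_weather", {"city": "Paris"}), None]
print(score(expected, predicted))  # {'accuracy': 0.5, 'outcomes': {'correct': 1, 'no_call': 1}}
```

Reporting the outcome breakdown alongside accuracy matters here: two models with similar accuracy can still differ sharply in whether their errors are `wrong_tool` or `no_call`, and those call for different fixes.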