@stevibe: Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model …
Summary
A demonstration shows that Qwen3.6 35B A3B, given NVIDIA's LocateAnything-3B as a vision tool for detecting field positions, can accurately fill out a paper form that it cannot complete on its own, suggesting that a combination of small models can do the work of a single large one.
Cached at: 06/02/26, 09:37 PM
Qwen3.6 35B A3B can’t fill out a paper form on its own. But give it NVIDIA’s LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool). I gave Qwen a new tool: ask “where’s the email field?” and LocateAnything returns the exact x, y, width, height. The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
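A minimal sketch of how such a "locate" tool could be wired up, not the setup actually used in the demo. The tool name `locate`, the JSON-schema style tool description, the `Box` type, and the hard-coded detection table are all assumptions for illustration; a real version would send the form image and the query to LocateAnything-3B instead of reading from a dictionary.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Box:
    """Pixel bounding box: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical tool description in the common JSON-schema
# "function calling" style; the main model sees this and decides
# when to call it.
LOCATE_TOOL = {
    "type": "function",
    "function": {
        "name": "locate",
        "description": "Return the pixel bounding box of a described "
                       "element in the form image.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# Stand-in for the helper model. These coordinates are made up;
# a real implementation would run the locator on the image.
_FAKE_DETECTIONS = {"email field": Box(120, 640, 480, 36)}


def locate(query: str) -> Optional[Box]:
    """Look up a box for the query (stub for the helper model)."""
    return _FAKE_DETECTIONS.get(query.lower().strip())


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call from the main model and return a JSON string."""
    if name != "locate":
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    box = locate(args["query"])
    if box is None:
        return json.dumps({"error": "not found"})
    return json.dumps(asdict(box))
```

In an agent loop, the string returned by `handle_tool_call` would be appended to the conversation as the tool result, and the main model would then choose where to write the value.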
Result:
Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct. Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas. Character-box alignment still a touch loose, but every value is where it belongs. 9m10s, 224.5k input, 24.3k output, 21 turns.
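On the loose character-box alignment: one possible way to tighten it, sketched under the assumption that a field is a row of equal-width character cells and that the number of cells is known. Neither assumption comes from the demo, and the function names below are hypothetical. The idea is to split the detected box into cells and aim each character at a cell center rather than writing the whole string from the left edge.

```python
def char_centers(x, y, width, height, n_cells):
    """Return the (cx, cy) center of each equal-width cell in the box."""
    cell_w = width / n_cells
    cy = y + height / 2
    return [(x + (i + 0.5) * cell_w, cy) for i in range(n_cells)]


def place_value(value, x, y, width, height, n_cells):
    """Pair each character with the pixel center of its cell."""
    if len(value) > n_cells:
        raise ValueError("value has more characters than cells")
    centers = char_centers(x, y, width, height, n_cells)
    return [(ch, round(cx), round(cy)) for ch, (cx, cy) in zip(value, centers)]


# Example: a 4-cell field detected at x=100, y=200, 80 px wide, 20 px tall.
placements = place_value("AB", 100, 200, 80, 20, 4)
# placements == [("A", 110, 210), ("B", 130, 210)]
```

Each tuple can then be drawn with whatever image library is in use, anchoring the glyph at its center point.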
Why it matters:
Qwen alone can’t finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can. A combination of small models can do the work of a single large one.
Kept the vision, as it took screen captures to verify the results too.
Yeah, I actually started with the 9B, but its tool calling capabilities weren’t doing great, and the reasoning to put data on the fields was a bit off, so I finally switched to the 35B A3B.
I’d say it was more stable. The 35B A3B still has some repeated tool calls sometimes, but the 27B was just solid. The only problem is it’s slower than the 35B A3B on the same hardware.