Can Qwen3.6-35B-A3B on an RTX 3060 Replace Google Vision for Receipt-to-JSON Extraction?

Reddit r/LocalLLaMA 06/26/26, 09:14 PM News

local-ai vision-language-model receipt-extraction qwen rtx-3060 document-extraction self-hosted

Summary

A developer shares their experience using a local Qwen VL model on an RTX 3060 to parse Japanese receipts into JSON, replacing Google Vision, with results showing accurate extraction of key fields at ~31 seconds per receipt.

I tried replacing Google Vision in my receipt pipeline with a local Qwen model. I had an old LINE message bot where I could send a receipt photo, it would go to Google Vision, get parsed into JSON, and saved in SQLite. Recently I tried again, but locally. Setup: RTX 3060 12GB llama.cpp Qwen3.6-35B-A3B 12GB-target GGUF quant Paperless-ngx for uploading receipt images output goes to JSON / SQLite It worked pretty well. On around 30 Japanese receipts, the fields I actually care about were consistently right: store date subtotal tax total Speed was not great, but fine for this use case: ~31.75s per receipt ~11.06 GiB peak VRAM I wrote the details here: https://rafaelviana.com/article/qwen-receipt Is anyone else using local VLMs for boring document extraction stuff? Receipts, invoices, forms, etc.

Original Article

Similar Articles

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Reddit r/artificial

A user shares impressive results running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, achieving 160 tokens per second output after fitting the model into VRAM, and demonstrates vision capabilities with a 75-second video processing time.

Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?

Reddit r/LocalLLaMA

The author claims Qwen3-VL-2B is the only viable vision-language model for JSON extraction on low-end hardware, outperforming larger models like Qwen3-VL-4B, yet it is absent from major benchmarks.

Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with CloudeCode, well I can't believe but it is usable

Reddit r/LocalLLaMA

User reports surprisingly usable coding performance from Qwen3-27B-UD-Q6_K_XL.gguf running locally on RTX 5090 at ~50 tok/s with 200K context, marking a significant leap in local model quality.

@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

X AI KOLs Timeline

A user reports achieving over 90 tokens per second inference speed with Qwen 3.6-35b-a3b MoE model on an RTX 3090 using llama.cpp, with prefill speeds exceeding 1000 t/s, indicating practical local deployment of large language models on consumer hardware.

Running Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB) — what worked, what didn't, and a surprising speculative-decoding result

Reddit r/LocalLLaMA

A detailed account of running the Qwen3.6-35B-A3B MoE model on an 8GB laptop GPU, covering effective optimizations like --no-mmap and VRAM headroom, unexpected findings where speculative decoding improved speed by 26% contrary to benchmarks, and pitfalls with Windows and CPU bottlenecks.

Similar Articles

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?

Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with CloudeCode, well I can't believe but it is usable

@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

Running Qwen3.6-35B-A3B on a laptop RTX 4060 (8GB) — what worked, what didn't, and a surprising speculative-decoding result

Submit Feedback