Can Qwen3.6-35B-A3B on an RTX 3060 Replace Google Vision for Receipt-to-JSON Extraction?

Reddit r/LocalLLaMA News

Summary

A developer shares their experience using a local Qwen VL model on an RTX 3060 to parse Japanese receipts into JSON, replacing Google Vision, with results showing accurate extraction of key fields at ~31 seconds per receipt.

I tried replacing Google Vision in my receipt pipeline with a local Qwen model. I had an old LINE message bot where I could send a receipt photo, it would go to Google Vision, get parsed into JSON, and saved in SQLite. Recently I tried again, but locally. Setup: RTX 3060 12GB llama.cpp Qwen3.6-35B-A3B 12GB-target GGUF quant Paperless-ngx for uploading receipt images output goes to JSON / SQLite It worked pretty well. On around 30 Japanese receipts, the fields I actually care about were consistently right: store date subtotal tax total Speed was not great, but fine for this use case: ~31.75s per receipt ~11.06 GiB peak VRAM I wrote the details here: https://rafaelviana.com/article/qwen-receipt Is anyone else using local VLMs for boring document extraction stuff? Receipts, invoices, forms, etc.
Original Article

Similar Articles

Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

Reddit r/artificial

A user shares impressive results running a quantized Qwen 3.6:35b-a3b model on a used RTX 3090, achieving 160 tokens per second output after fitting the model into VRAM, and demonstrates vision capabilities with a 75-second video processing time.