Qwen3.6 35B-A3B on a Laptop: My Zero to One Moment

Reddit r/LocalLLaMA 06/07/26, 03:13 PM News

local-ai qwen3-5 laptop inference personal-experience privacy open-source

Summary

The author shares their experience running Qwen3.6 35B-A3B locally on an ASUS Zenbook Pro 14, achieving 27 TPS at 32k context, marking a personal milestone towards fully local AI for privacy.

Hi everyone, I'm new here - because I only have a laptop and I only just realized local models are actually good enough now. So I'd like to share my experience, in case it helps others, and also to learn from the more experienced people here. This is the first model that works for me on my ASUS Zenbook Pro 14 (RTX 4060 8GB VRAM, 64GB RAM): * fast enough: \~27TPS generation speed at 32k context, or \~18TPS at 256k context * smart enough: it can read and write files, use skills, execute CLI commands, use git, follow instructions, and act as a useful thinking partner. **Why it's important to me** For me this is important because it's where I unconsciously decided to draw the line - that I didn't want to share private information or more personal thoughts with cloud models (even TEE ones). I know I can still get hacked and my data leaked, but for me that's different than giving it up from the first prompt. So for the first time, I now have this fully local, second brain. For me, it's a game changer. **I still use cloud models for public stuff** I'm still using cloud models for public projects, but for brainstorming and simple personal projects, local is now good enough for me. I'm also now looking into a more powerful desktop machine where maybe I can do some more serious coding. I have had a taste and I want more 😄 Now whenever I see Claude's black box "✽ Envisioning… (41s · ↓ 2.9k tokens · thinking some more with high effort)" it's so frustrating. I have no idea if it's going in the right direction. (whether this is an "efficient" way to do things is another story) **My issues so far with Qwen3.6** Qwen3.6 35B A3B is not perfect, here are some minor issues I observed, which I can work around: * It makes some mistakes, but normally recovers on its own. * Very occasionally it does get stuck in a loop. It does need some human monitoring, which is fine for me. * It sometimes doesn't read a skill in full or make the best decision even when it can fit it in context. It seems to sometimes be "lazy". * It is very non-deterministic. I didn't do any tweaks here though (because normally it ends up with the result I need). I guess some of these could be improved if I used a larger quantization. **My setup** For inference I use llama.cpp, with unsloth's Qwen3.6-35B-A3B-UD-IQ3\_XXS.gguf. For my harness, I use Pi with pi-llama-cpp extension. The harness runs in multipass and connects to the host running llama.cpp. I've also connected it to my phone through an E2EE Matrix chat (a custom one I built off of pi-messenger-bridge) - although it means I have to keep my laptop on all the time, which is annoying. Another reason for buying another machine which I'm more comfortable to run 24/7. **llama.cpp flags for 256k context(18tps):** `./build/bin/llama-server -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -ngl 24 -np 1 -fa on -ctk q4_0 -ctv q4_0 -c 262144 --host` [`0.0.0.0`](http://0.0.0.0) `--port 8088 -ncmoe 32 --no-mmap --jinja` **llama.cpp flags for the 32k context (27tps):** `./build/bin/llama-server -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -ngl 99 -np 1 -fa on -ctk q4_0 -ctv q4_0 -c 32000 --host` [`0.0.0.0`](http://0.0.0.0) `--port 8088 -ncmoe 32 --no-mmap --jinja` *What was your Zero to One moment?*

Original Article

Qwen3.6 35B-A3B on a Laptop: My Zero to One Moment

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

@rohanpaul_ai: Qwen 3.6 27B on a MacBook Pro M5 Max 64GB hitting 34tokens per sec, locally with atomic[.]chat 90% acceptance rate, i.e…

@remilouf: Following @julien_c’s tweet I bought a MacBook Pro with 128B unified memory, and started running Qwen3.6 as my daily dr…

Running Qwen 3.6 35B MoE (Q4_K_M) on a Zeus (Xiaomi 12 Pro, 12GB RAM)

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

Submit Feedback

Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

@rohanpaul_ai: Qwen 3.6 27B on a MacBook Pro M5 Max 64GB hitting 34tokens per sec, locally with atomic[.]chat 90% acceptance rate, i.e…

@remilouf: Following @julien_c’s tweet I bought a MacBook Pro with 128B unified memory, and started running Qwen3.6 as my daily dr…

Running Qwen 3.6 35B MoE (Q4_K_M) on a Zeus (Xiaomi 12 Pro, 12GB RAM)

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer