@pcuenq: My data point: working on two projects in parallel with Pi + llama.cpp + Qwen-3.6-35B-A3B (I prefer the MoE) This work…
Summary
A user reports successfully running parallel projects using Pi and llama.cpp with the Qwen-3.6-35B-A3B model on an older M1 Max machine, showing the setup holds up for real day-to-day work on older hardware.
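For readers who want to reproduce this kind of parallel-session setup, a minimal sketch: llama-server can serve multiple concurrent sessions through its --parallel flag. The model path, context size, and port below are placeholder assumptions, not pcuenq's actual configuration.

```sh
# Hypothetical invocation serving two concurrent agent sessions.
# -c   total context window, divided across parallel slots
# -np  number of server slots, i.e. concurrent sessions
# -ngl layers to offload to the GPU (Metal on Apple Silicon)
llama-server -m ~/models/qwen3.6-35b-a3b-q4_k_m.gguf \
  -c 65536 -np 2 -ngl 99 --port 8080
```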
Similar Articles
Running Qwen3.6-35B-A3B Locally for Coding Agent: My Setup & Working Config
A detailed guide for running the 35B-parameter Qwen3.6 model locally on Apple Silicon with llama.cpp to power the pi coding agent, including optimized configuration flags and sampling parameters.
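As a rough illustration of what such a config tends to look like, llama-server exposes sampling knobs like temperature, top-k, top-p, and min-p as command-line flags. The GGUF path and the specific values here are generic placeholders, not the article's tuned settings.

```sh
# Sketch of a llama-server launch with explicit sampling parameters.
# --jinja applies the model's chat template; values are illustrative.
llama-server -m ~/models/qwen3.6-35b-a3b-q4_k_m.gguf \
  -ngl 99 -c 32768 --jinja \
  --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0.05 \
  --port 8080
```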
@mitsuhiko: If you don't have a 128GB mac, I also have a pi-llamacpp extension that just configures 4 versions of Qwen 3.6. https:/…
mitsuhiko releases a pi-llamacpp extension that automates the setup and management of local LLM inference using llama.cpp, specifically supporting various quantized versions of the Qwen 3.6 model.
@_lewtun: You can now have an AI researcher running on your laptop 24/7 for free! Running Qwen3-35B-A3B with llama.cpp and a 4-bi…
The article highlights the ability to run Qwen3-35B-A3B locally on a laptop for free using llama.cpp and Unsloth 4-bit quantization.
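A minimal way to try this yourself: llama.cpp can pull GGUF quantizations directly from Hugging Face with the -hf flag. The repo name below is an assumption for illustration; check Unsloth's Hugging Face page for the actual one.

```sh
# Fetch a 4-bit Unsloth GGUF from Hugging Face and chat locally.
# Repo name and quant tag are illustrative assumptions.
llama-cli -hf unsloth/Qwen3-35B-A3B-GGUF:Q4_K_M -ngl 99 -c 16384
```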
More Qwen3.6-27B MTP success but on dual Mi50s
The article benchmarks the Qwen3.6-27B model in llama.cpp with Multi-Token Prediction (MTP) and tensor parallelism across dual Mi50 GPUs, demonstrating significant decode speedups.
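For the dual-GPU part, llama.cpp splits a model across cards with --tensor-split. A minimal sketch is below; the model path and split ratio are placeholders, and the MTP-specific flags depend on the build, so they are omitted here.

```sh
# Split the model evenly across two GPUs (e.g. two Mi50s under ROCm).
# -sm row enables row-wise tensor parallelism; layer split is the default.
# Model path and split ratio are illustrative assumptions.
llama-server -m ~/models/qwen3.6-27b-q4_k_m.gguf \
  --tensor-split 1,1 -sm row -ngl 99 -c 16384
```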
@leopardracer: https://x.com/leopardracer/status/2055341758523883631
A user shares their experience setting up a dual-GPU local AI lab with an RTX 4080 Super and a 5060 Ti, running Qwen 3.6 models via llama.cpp and llama-swap to reduce API costs and enable unrestricted experimentation.
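llama-swap sits in front of llama.cpp as a proxy and starts or stops model servers on demand. A config sketch along the lines of its documented YAML format is below; the model names, paths, and flags are placeholder assumptions, not the user's actual setup.

```yaml
# llama-swap config sketch: one entry per model; llama-swap launches the
# cmd when that model is requested and swaps it out for the next one.
# Names, paths, and flags are illustrative assumptions.
models:
  "qwen3.6-35b-a3b":
    cmd: llama-server --port ${PORT} -m /models/qwen3.6-35b-a3b-q4_k_m.gguf -ngl 99
  "qwen3.6-coder":
    cmd: llama-server --port ${PORT} -m /models/qwen3.6-coder-q4_k_m.gguf -ngl 99
```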