@cryptoresetlife: Models without restrictions are so much fun, haha. Among local LLMs, my current favorite is Qwen3.6 35B A3B, distilled with Opus 4.7 and uncensored.
Summary
The user shares their current favorite local LLM: Qwen3.6 35B A3B, distilled with Opus 4.7 and uncensored.
Cached Full Text
Cached at: 06/05/26, 09:10 AM
Unrestricted models are way more fun haha 😁
Currently my favorite among local LLMs is this one: Qwen3.6 35B A3B distilled with Opus 4.7, no censorship https://t.co/j29SCkIsdD
Similar Articles
@zhixianio: After receiving the new machine, I began an 'ascetic' practice of forcing myself to use local models for common tasks. I thought it would be painful, but both speed and quality greatly exceeded my expectations: Model: Qwen3.6-35B-A3B-oQ6-fp16-mtp, Running: oMLX, with N…
The author runs Qwen3.6-35B-A3B with oMLX on a new local machine for daily tasks and finds that both speed and quality far exceed expectations, even outperforming remote LLMs in PA and coding scenarios. They take this as a sign of significant progress in on-device AI.
@KtAIFeed: Straight to the point, no fluff. The recently popular Qwen 3.6 (35B/43B), the latest open-source 'uncensored' model on Hugging Face (over a million downloads per month), can run locally on a single GPU with just 6GB VRAM. It completely strips out the original model's moral preaching and safety restrictions: no censorship, it will answer whatever you ask...
Introduces an open-source uncensored variant of Qwen 3.6 (35B/43B) that removes the official model's moral and safety restrictions. It reportedly runs locally on 6GB of VRAM and has over a million downloads per month on Hugging Face.
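As a rough sanity check on the VRAM figure, weight storage can be estimated from parameter count and bits per weight. The sketch below rests on two assumptions that the post does not state: 4-bit quantized weights, and "A3B" meaning roughly 3B active parameters per token (the usual reading of Qwen's mixture-of-experts naming). It ignores KV cache, activations, and runtime overhead, and `weight_gib` is a hypothetical helper name.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (weights only, no cache or overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Assumed figures, not taken from the post: 4-bit weights, ~3B active parameters.
total_4bit = weight_gib(35, 4)    # all 35B parameters
active_4bit = weight_gib(3, 4)    # only the parameters used for one token

print(f"all weights:    {total_4bit:.1f} GiB")
print(f"active weights: {active_4bit:.1f} GiB")
```

Under these assumptions the full set of weights is well above 6 GB while the per-token active subset is well below it, so a 6GB setup would presumably keep most of the weights in system RAM rather than on the GPU.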
@NFTCPS: Running a 70B model on 4GB of VRAM? It actually works! AirLLM uses a clever trick, layered inference: instead of loading the whole model into VRAM at once, it goes layer by layer, computing and discarding, to squeeze the giant into a small GPU. The best part: 100% open source, freebie alert https://github.com/0xSo…
AirLLM is a fully open-source tool that uses layered inference (loading one layer into VRAM at a time and releasing it after use) to run 70B large language models on GPUs with only 4GB VRAM, without quantization, distillation, or pruning. It already supports running Llama3.1 405B on 8GB VRAM.
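The layer-by-layer idea described above can be sketched in plain Python. This is an illustration of the general technique only, not AirLLM's API: the toy model, file layout, and function names (`save_layers`, `apply_layer`, `layered_forward`) are all hypothetical, and nested lists stand in for GPU tensors.

```python
import os
import pickle
import tempfile

def save_layers(layers, directory):
    """Write each layer's weights to its own file and return the file paths."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths

def apply_layer(weights, x):
    """Toy layer: matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def layered_forward(paths, x):
    """Run the forward pass while holding only one layer's weights at a time."""
    for path in paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)   # load this layer only
        x = apply_layer(weights, x)    # compute its output
        del weights                    # discard before loading the next layer
    return x

# Hypothetical 3-layer toy model with 2x2 weight matrices.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 1.0], [0.0, -1.0]],
    [[1.0, 1.0], [1.0, -1.0]],
]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_layers(layers, tmp)
    result = layered_forward(paths, [1.0, 2.0])
print(result)
```

Peak memory is bounded by the largest single layer rather than the whole model. The cost is that every forward pass rereads every layer from storage, so generation with this approach is likely to be much slower than with a fully resident model.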
@hank_aibtc: https://x.com/ClementDelangue/status/2058672394865111544/video/1… Local LLM speed ceiling broken again! llama.cpp natively supports MTP (Multi-Token Prediction): - No extra draft model needed…
llama.cpp natively supports Multi-Token Prediction (MTP) without requiring an extra draft model. By leveraging the model's built-in prediction head, local models like Qwen3.6-27B achieve 1.7x+ speedup, making 27B models run smoothly on consumer GPUs.
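A minimal sketch of the draft-and-verify loop behind this kind of speedup follows. It is a generic toy, not llama.cpp's implementation: `next_token` stands in for the main model, `draft_tokens` stands in for the built-in prediction head (with mistakes injected on purpose), and all names and the token arithmetic are hypothetical.

```python
def next_token(context):
    """Toy stand-in for the main model: deterministic greedy next token."""
    return (context[-1] * 3 + len(context)) % 7

def draft_tokens(context, k):
    """Toy stand-in for an MTP head: guesses k tokens, with deliberate errors."""
    ctx, out = list(context), []
    for _ in range(k):
        tok = next_token(ctx)
        if len(ctx) % 4 == 0:          # inject a wrong guess at some positions
            tok = (tok + 1) % 7
        out.append(tok)
        ctx.append(tok)
    return out

def greedy(prompt, n):
    """Baseline: one main-model pass per generated token."""
    ctx = list(prompt)
    for _ in range(n):
        ctx.append(next_token(ctx))
    return ctx[len(prompt):]

def mtp_decode(prompt, n, k=3):
    """Draft k tokens, keep the verified prefix, and replace the first
    mismatch with the main model's token. Returns (tokens, passes)."""
    ctx, passes = list(prompt), 0
    while len(ctx) - len(prompt) < n:
        drafted = draft_tokens(ctx, k)
        passes += 1                    # a real engine checks all k positions in one batched pass
        for tok in drafted:
            expected = next_token(ctx) # the toy checks positions one by one instead
            ctx.append(expected)       # always emit what the main model would emit
            if tok != expected:
                break                  # drop the rest of the draft after a mismatch
    return ctx[len(prompt):len(prompt) + n], passes

tokens, passes = mtp_decode([1, 2], 8)
print(tokens == greedy([1, 2], 8), passes)
```

Because every emitted token is the one the main model would have chosen, the output matches plain greedy decoding. The gain comes from needing fewer verification passes than generated tokens whenever the drafts are mostly right.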
@victormustar: llama.cpp with MTP support makes local models fast enough to use as daily drivers Qwen3.6-27B dense generation (on A10G…
llama.cpp adds MTP support for Qwen3.6 models, boosting generation speed by 78% on A10G hardware, making local models viable as daily drivers.
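To relate the 78% figure to a speedup factor, the arithmetic can be written out as below. It assumes the 78% refers to tokens generated per second, which the summary does not state explicitly; the helper names are hypothetical.

```python
def speedup_from_percent(gain_percent: float) -> float:
    """Convert a percentage throughput gain into a speedup factor."""
    return 1 + gain_percent / 100

def time_saved_fraction(speedup: float) -> float:
    """Fraction of per-token time saved at a given speedup factor."""
    return 1 - 1 / speedup

s = speedup_from_percent(78)
print(f"{s:.2f}x throughput, {time_saved_fraction(s):.0%} less time per token")
```

A 78% throughput gain is therefore a 1.78x speedup, which shortens per-token time by a little under half rather than by 78%.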