@cryptoresetlife: Models without restrictions are so fun haha. Among local LLM models, my current favorite is this Qwen3.6 35B A3B, distilled with Opus 4.7 and no censorship.

X AI KOLs Timeline 06/05/26, 03:47 AM Models

Summary

The author says their current favorite local LLM is Qwen3.6 35B A3B, a variant distilled with Opus 4.7 and released without censorship restrictions.

Models without restrictions are so fun haha 😁 My current favorite among local LLM models is this one Qwen3.6 35B A3B distilled with Opus 4.7 and no censorship https://t.co/j29SCkIsdD

Original Article


Cached at: 06/05/26, 09:10 AM

Unrestricted models are way more fun haha 😁

Currently my favorite among local LLMs is this one: Qwen3.6 35B A3B distilled with Opus 4.7, no censorship https://t.co/j29SCkIsdD

Similar Articles

@zhixianio: After receiving the new machine, I began an 'ascetic' practice of forcing myself to use local models for common tasks. I thought it would be painful, but both speed and quality greatly exceeded my expectations: Model: Qwen3.6-35B-A3B-oQ6-fp16-mtp, Running: oMLX, with N…

X AI KOLs Timeline

The author runs the Qwen3.6-35B-A3B model with oMLX on a new local machine for daily tasks and finds that both speed and quality far exceed expectations, even outperforming remote LLMs in PA and coding scenarios. The post presents this as evidence of a significant improvement in on-device AI capability.

@KtAIFeed: Straight to the point, no fluff. The recently popular Qwen 3.6 (35B/43B) latest open-source 'uncensored' model on Hugging Face (over a million downloads per month) can run locally with just 6GB VRAM on a single GPU. It completely breaks the original model's moral preaching and safety restrictions—no censorship, it will answer whatever you ask...

X AI KOLs Timeline

Introduces the Qwen 3.6 (35B/43B) open-source uncensored model, which removes the original model's moral and safety restrictions. According to the post, it runs locally on a single GPU with only 6GB of VRAM and has over a million downloads per month on Hugging Face.

@NFTCPS: 4GB VRAM running 70B large model? It actually works! AirLLM did a clever trick — layered inference, not loading the whole model into VRAM at once, but layer by layer, compute and discard, squeezing the giant into a small GPU. The best part: 100% open source, freebie warning https://github.com/0xSo…

X AI KOLs Timeline

AirLLM is a fully open-source tool that uses layered inference (loading one layer into VRAM at a time, computing it, then releasing it) to run 70B large language models on GPUs with only 4GB of VRAM, without quantization, distillation, or pruning. It already supports running Llama3.1 405B on 8GB of VRAM.
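The layer-by-layer idea described above can be illustrated with a toy sketch. This is not AirLLM's API or implementation: the helper names (`save_layers`, `apply_layer`, `layered_forward`), the pickle file layout, and the tiny diagonal "layers" are all hypothetical, chosen only to show that a forward pass needs just one layer resident in memory at a time.

```python
import os
import pickle
import tempfile


def save_layers(directory, num_layers, width):
    """Write each toy layer (weight matrix plus bias) to its own file on disk."""
    paths = []
    for i in range(num_layers):
        # Hypothetical toy weights: a diagonal matrix scaled by (i + 1).
        weights = [[float(i + 1) if r == c else 0.0 for c in range(width)]
                   for r in range(width)]
        bias = [1.0] * width
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump({"weights": weights, "bias": bias}, f)
        paths.append(path)
    return paths


def apply_layer(layer, x):
    """Compute weights @ x + bias for one layer."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(layer["weights"], layer["bias"])]


def layered_forward(paths, x):
    """Run the forward pass while holding only one layer in memory at a time."""
    for path in paths:
        with open(path, "rb") as f:
            layer = pickle.load(f)   # load this layer only
        x = apply_layer(layer, x)    # compute its output
        del layer                    # discard it before loading the next one
    return x


with tempfile.TemporaryDirectory() as tmp:
    layer_paths = save_layers(tmp, num_layers=3, width=2)
    result = layered_forward(layer_paths, [1.0, 1.0])

print(result)
```

The trade-off in this pattern is that every layer is re-read from storage on each forward pass, so peak memory drops at the cost of I/O time per token.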

@hank_aibtc: https://x.com/ClementDelangue/status/2058672394865111544/video/1… Local LLM speed ceiling broken again! llama.cpp natively supports MTP (Multi-Token Prediction): - No extra draft model needed…

X AI KOLs Timeline

llama.cpp natively supports Multi-Token Prediction (MTP) without requiring an extra draft model. By leveraging the model's built-in prediction head, local models like Qwen3.6-27B achieve 1.7x+ speedup, making 27B models run smoothly on consumer GPUs.
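To make the "no extra draft model" idea concrete, here is a minimal draft-and-verify sketch under stated assumptions. It is not llama.cpp's code: `target_next` stands in for the main model's next-token choice, `draft_next` stands in for the built-in prediction head, and both are hypothetical toy functions. The sketch only shows why accepted drafts reduce the number of main-model passes while leaving the output identical to plain greedy decoding.

```python
def target_next(tokens):
    """Toy stand-in for the main model: next token is a fixed function of the context."""
    return (tokens[-1] * 3 + len(tokens)) % 7


def draft_next(tokens):
    """Toy stand-in for the prediction head: wrong whenever the context length is a multiple of 4."""
    guess = target_next(tokens)
    return (guess + 1) % 7 if len(tokens) % 4 == 0 else guess


def greedy_decode(prompt, n_new):
    """Baseline: one main-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def mtp_decode(prompt, n_new, k=3):
    """Draft k tokens, verify them in one counted pass, keep the matching prefix."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # A real model would check all drafted positions in a single batched pass;
        # here that is modeled by counting one pass per verification round.
        target_passes += 1
        accepted = []
        for d in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # the main model's token is always the one kept
            if d != expected:
                break                  # first mismatch ends this round
        else:
            # Every draft matched, so the same pass also yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


baseline = greedy_decode([1, 2], 12)
spec_tokens, passes = mtp_decode([1, 2], 12)
print(spec_tokens == baseline, passes)
```

In this toy setup the speedup depends entirely on how often the drafted tokens match; the 1.7x+ figure quoted above is the post's reported result, not something this sketch reproduces.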

@victormustar: llama.cpp with MTP support makes local models fast enough to use as daily drivers Qwen3.6-27B dense generation (on A10G…

X AI KOLs Following

llama.cpp adds MTP support for Qwen3.6 models, boosting generation speed by 78% on A10G hardware, making local models viable as daily drivers.
