qwen

#qwen

Qwen 3.6 27b Abliterated (apostate)

Reddit r/LocalLLaMA ↗ · 3d ago

A user released Apostate, an abliterated version of Qwen 3.6 27B that cuts the model's safety-alignment refusal rate from 92% to 7.6% with minimal capability loss (KL divergence 0.120).
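
KL divergence measures how far the abliterated model's next-token distribution drifts from the original's, so a lower value means less collateral change to the model. Below is a minimal plain-Python sketch. It assumes both models' logits for the same prompt and position are available as lists. The summary does not say which prompts or positions Apostate's 0.120 was averaged over, and the function names and logit values are illustrative only.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits: List[float], q_logits: List[float]) -> float:
    """KL(P || Q) in nats, where P is the original model and Q the modified one."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 3-token vocabulary for the same prompt and position.
original = [2.0, 0.5, -1.0]
abliterated = [1.8, 0.7, -0.9]
print(f"KL = {kl_divergence(original, abliterated):.4f}")
```

You can average `kl_divergence` over many prompts to get a single drift score. A score of 0 means the two distributions are identical.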

#qwen

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

Reddit r/LocalLLaMA ↗ · 3d ago

Technical report on running the Qwen 3.6 27B model (Q8, with MTP) on a dual AMD Radeon R9700 setup using llama.cpp with ROCm, including performance benchmarks and configuration details.

#qwen

Qwen is never going to open source Qwen 3.7, aren't they?

Reddit r/LocalLLaMA ↗ · 3d ago

The poster argues that, since Junyang Lin was fired, Qwen has locked down its large models and stopped releasing open-source models, while other Chinese AI labs keep open-sourcing their latest models. Rumors suggest the small-model team is gone and that Qwen 3.6/3.7 may be the last open-source Qwen models.

#qwen

Qwen code companion on vscode marketplace - thoughts

Reddit r/LocalLLaMA ↗ · 4d ago

Qwen Code Companion, an AI-powered coding assistant, is now available on the VS Code Marketplace, and the poster asks for others' thoughts on it.

#qwen

Best Settings for 48GB VRAM + Qwen 3.6 27B

Reddit r/LocalLLaMA ↗ · 4d ago

A user shares optimized settings for running Qwen3.6 27B (Q8_0) with llama.cpp on a dual-GPU setup (RTX 4090 + RTX 3090). The setup reaches 75-100 t/s generation and 1500 t/s prompt processing (pp) with a 250K context.

#qwen

@SlimTradeyBaby: Drop your GPU below and I’ll tell you exactly what model and config to run on it. JOKES. No need. Qwen 3.6 27b @Unsloth…

X AI KOLs Timeline ↗ · 5d ago

A tweet promoting the Qwen 3.6 27b model and recommending UnslothAI for running it on any GPU.

#qwen

@LottoLabs: This is awesome work Dflash for qwen 3.5/6 series

X AI KOLs Timeline ↗ · 5d ago

Charles Frye announces the co-release with Z Lab of six new DFlash speculators for Alibaba Qwen 3.x models, achieving over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.

#qwen

@charles_irl: Speculation Is All You Need. In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFla…

X AI KOLs Following ↗ · 5d ago

Modal and Z Lab release six new DFlash speculative decoding draft models for Qwen 3.x, achieving over 1000 tokens per second on a B200 and arguing that speculative decoding is the most impactful inference optimization.
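
Speculative decoding works like this: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with. Each expensive verification step can therefore yield more than one token. Below is a minimal greedy sketch in Python that uses deterministic next-token functions in place of real models. It is a generic draft-and-verify loop, not DFlash's drafting method. For clarity it calls the target token by token, whereas real engines verify all k drafted positions in one batched forward pass. The toy models are hypothetical.

```python
from typing import Callable, List

# A "model" here is a greedy next-token function: context -> next token id.
NextToken = Callable[[List[int]], int]

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify loop; returns prompt + max_new new tokens."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft: the cheap model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify: keep drafted tokens while the target agrees with them.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # the emitted token is always the target's choice
            if tok != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token was accepted; the target adds one bonus token.
            out.append(target(out))
    return out[: len(prompt) + max_new]

def target(ctx: List[int]) -> int:
    # Toy target model: counts up modulo 10.
    return (ctx[-1] + 1) % 10

def draft(ctx: List[int]) -> int:
    # Toy draft model: agrees with the target except when len(ctx) is a multiple of 3.
    return (ctx[-1] + 1) % 10 if len(ctx) % 3 else 0

print(speculative_decode(target, draft, [0], max_new=7))
```

Every emitted token is either a draft token the target agreed with or the target's own choice. The output is therefore identical to greedy decoding with the target alone. The speedup depends only on how often the draft is accepted.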

#qwen

$1800 (in GPU cost) with P2P: running Qwen/Qwen3.6-27b-FP8 with 262K context and BF16 KV cache at 55 tok/s

Reddit r/LocalLLaMA ↗ · 5d ago

A user shares a configuration of 4x RTX 5060 Ti 16GB with P2P to run Qwen3.6-27B-FP8 at 55 tok/s with 262K context, highlighting the low cost of about $1800 for single-user inference.

#qwen

empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF

Hugging Face Models Trending ↗ · 5d ago

Empero AI releases Qwythos-9B-Claude-Mythos-5-1M-GGUF, a 9B parameter reasoning model fine-tuned on 500M+ tokens of Claude Mythos/Fable traces with chain-of-thought, achieving significant gains over Qwen3.5-9B and supporting 1M-token context via YaRN rope-scaling. The GGUF quantizations enable local inference on llama.cpp and compatible runtimes.

#qwen

What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6?

Reddit r/LocalLLaMA ↗ · 5d ago

Compares the improvements from GLM 5.1 to 5.2 and Qwen 3.5 to 3.6, discussing which update is more impressive.

#qwen

@ben_burtenshaw: https://x.com/ben_burtenshaw/status/2067615361428545566

X AI KOLs Timeline ↗ · 6d ago

A detailed tutorial on supervised fine-tuning (SFT) for training AI agents, built from scratch in pure PyTorch using Qwen3-0.6B, explaining the mechanics of next-token prediction and label masking.
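
Label masking in SFT means the loss is computed only on the response tokens, not on the prompt. Below is a minimal pure-Python sketch that assumes the token IDs are already available. `IGNORE_INDEX = -100` mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The function names and the uniform logits are illustrative and are not taken from the tutorial.

```python
import math
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # same value as torch.nn.CrossEntropyLoss's default ignore_index

def build_sft_example(prompt_ids: Sequence[int],
                      response_ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; mask the prompt out of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

def masked_next_token_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over unmasked targets.

    logits[t] scores the token at position t + 1, so the labels are shifted
    left by one before comparison (next-token prediction).
    """
    total, count = 0.0, 0
    for row, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # prompt token: contributes no loss
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0

input_ids, labels = build_sft_example(prompt_ids=[1, 2], response_ids=[3, 4])
uniform = [[0.0] * 5 for _ in input_ids]  # hypothetical logits, vocabulary of 5
print(labels, masked_next_token_loss(uniform, labels))
```

Only the two response tokens contribute to the loss here. The last prompt position still predicts the first response token, because the labels are shifted.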

#qwen

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable

Reddit r/LocalLLaMA ↗ · 6d ago

NVFP4 KV cache quantization on sm120 (consumer Blackwell GPUs such as the RTX 50 series) shrinks the KV cache enough that a 32GB VRAM system can run Qwen3.6-27B at ~60 tok/s with a 196K context.
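
KV cache size grows linearly with context length, so at long context the number of bits per cached element dominates memory use. Below is a back-of-the-envelope sketch in Python. It assumes NVFP4 stores 4-bit values plus one FP8 scale per 16-element block, which works out to about 4.5 bits per element, and it ignores the per-tensor scale. The layer and head counts are placeholders, not Qwen3.6-27B's actual configuration. For hybrid architectures, count only the layers that keep a KV cache.

```python
# Effective bits per cached element for each KV cache format.
BITS_PER_ELEMENT = {
    "bf16": 16.0,
    "fp8": 8.0,               # scale overhead ignored
    "nvfp4": 4.0 + 8.0 / 16,  # 4-bit values + one FP8 scale per 16 elements
}

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, fmt: str) -> float:
    """Approximate KV cache size in bytes for one sequence."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len  # 2 = keys + values
    return elements * BITS_PER_ELEMENT[fmt] / 8

GIB = 1024 ** 3
for fmt in ("bf16", "fp8", "nvfp4"):
    # Placeholder architecture numbers, NOT Qwen3.6-27B's real config.
    size = kv_cache_bytes(n_layers=16, n_kv_heads=4, head_dim=256,
                          context_len=196_000, fmt=fmt)
    print(f"{fmt}: {size / GIB:.2f} GiB")
```

Under these assumptions, NVFP4 cuts the cache to about 28% of its BF16 size (4.5 of 16 bits). That saving is what frees room for longer contexts on a fixed VRAM budget.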

#qwen

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv cs.LG ↗ · 2026-06-18

Proposes a structural pruning framework for MoE models that maximizes channel-score coverage via attribution-based approximation, achieving 50% or 25% pruning with 4-bit quantization and reducing memory footprint by 5.27x on Qwen3-30B-A3B.

#qwen

Local Qwen isn't a worse Opus, it's a different tool

Lobsters Hottest ↗ · 2026-06-18

Alex Ellis compares local Qwen models to cloud-based Claude Opus, sharing his experience using local AI in his software business. He highlights the practical value of local models for specific tasks while acknowledging their limitations, such as hallucination and infinite loops when quantized.

#qwen

@yibie: Using Local Models as Primary Coding Tools: A Practical Report from Mid-2026 There was a post on Hacker News with a straightforward title: "Is anyone using local models as their primary coding tool?" 197 comments, incredibly dense with information. A dozen real users discussed their daily configurations, pitfalls they encountered, and why they still choose local models even though they know they're not as good as...

X AI KOLs Timeline ↗ · 2026-06-18

This article summarizes practical experiences from a Hacker News discussion about using local models (mainly Qwen 3.6 35B-A3B) as primary coding tools. It covers users' configurations, reported effectiveness (roughly 50-75% of what frontier models deliver), key techniques (such as preserve_thinking), and the different positions users took.

#qwen

@LangChain: Fine-tuning open models can exceed or match frontier models. Base @Alibaba_Qwen out of the box w/ good prompting: Stron…

X AI KOLs Following ↗ · 2026-06-17

Fine-tuning open models like Alibaba's Qwen with LoRA can match or exceed frontier model performance on error classification tasks.

#qwen

@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

X AI KOLs Timeline ↗ · 2026-06-17

A user reports over 90 tokens per second of generation with the Qwen 3.6-35b-a3b MoE model on an RTX 3090 using llama.cpp, with prefill above 1000 t/s. The result shows that the model is practical to run locally on consumer hardware.

#qwen

@ItsmeAjayKV: Achievement Unlocked: Running Qwen3.6-27b dense Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3…

X AI KOLs Timeline ↗ · 2026-06-17

User benchmarks Qwen3.6-27B on an RTX 3090 using llama.cpp, achieving 35 tok/s generation and 1247 tok/s prompt processing.

#qwen

@cjzafir: A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > De…

X AI KOLs Timeline ↗ · 2026-06-17

VibeThinker, a 3B-parameter model fine-tuned from Qwen 2.5, reportedly matches Claude Opus 4.5 and much larger models such as DeepSeek v3. The claimed gains come from post-training that includes multi-path thinking and staged training on math, coding, and science.
