A user reports an issue where the Qwen 3.6 model stops mid-task when served via vLLM with specific Docker and speculative decoding configurations.
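For context, a minimal sketch of the kind of setup involved, using vLLM's offline `LLM` API with an n-gram speculative decoding config; the model id, config values, and prompt are placeholders, not the reporter's exact configuration, and the `speculative_config` argument is version-dependent:

```python
# Hypothetical repro sketch: vLLM with n-gram speculative decoding.
# Model id and config values are placeholders, not the reporter's setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-35B-A3B",   # placeholder model id
    speculative_config={
        "method": "ngram",           # draft tokens from prompt n-grams
        "num_speculative_tokens": 3,
        "prompt_lookup_max": 4,
    },
)

# A long agentic-style prompt; output truncating mid-task is the reported symptom.
params = SamplingParams(max_tokens=2048, temperature=0.7)
out = llm.generate(["Refactor this module step by step: ..."], params)
print(out[0].outputs[0].text)
```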
This article announces the release of the Qwen3.6-35B-A3B model weights on Hugging Face, optimized by Unsloth with Multi-Token Prediction (MTP) for faster generation via llama.cpp. It highlights improvements in agentic coding capabilities, tool calling, and reasoning context preservation.
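For readers who want to try the release, a minimal download sketch using huggingface_hub; the repo id and quant filename pattern are assumptions, not taken from the announcement:

```python
# Hypothetical download sketch; repo id and filename pattern are assumptions.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",   # placeholder repo id
    allow_patterns=["*Q4_K_M*.gguf"],          # fetch one quant, not the whole repo
)
print("GGUF files downloaded to:", local_dir)
```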
User demonstrates that running Qwen 3.6 27B/35B locally with llama-server cuts Claude Code API costs from $142 to under $4 for an 8-hour vibe-coding session, putting the $4,500 dual-RTX 3090 rig on a roughly 30-day payback.
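The payback claim follows from simple arithmetic; a quick check, assuming one such session per day (my assumption, not stated in the post):

```python
# Back-of-envelope payback check, assuming one 8-hour session per day.
api_cost_per_session = 142.0    # Claude Code API cost ($)
local_cost_per_session = 4.0    # local serving cost, upper bound from the post ($)
rig_cost = 4500.0               # dual-RTX 3090 build ($)

daily_savings = api_cost_per_session - local_cost_per_session   # $138/day
payback_days = rig_cost / daily_savings
print(f"Payback: {payback_days:.0f} days")   # ~33 days, consistent with the claim
```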
User benchmarks Qwen3.6-27B-Q8_0 at ~13 tokens/sec across 3 mixed GPUs with 128k context via llama.cpp, asking whether that throughput is typical.
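A rough way to reproduce such a number, assuming llama-server is running locally with its OpenAI-compatible API on port 8080 (port and prompt are my assumptions); llama.cpp's own llama-bench gives cleaner numbers, this only sanity-checks the served endpoint:

```python
# Rough generation-throughput probe against a local llama-server.
import time
import requests

start = time.time()
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={"prompt": "Write a long essay on caching strategies.", "max_tokens": 512},
    timeout=600,
)
elapsed = time.time() - start

# llama-server reports token counts in the OpenAI-style "usage" field.
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens / elapsed:.1f} tokens/sec (includes prompt processing time)")
```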
Community member repairs dead neurons in the Qwen3.6-35B-A3B MoE by copying weights from healthy neighbors, releasing fixed GGUF and FP8 safetensors versions.
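A conceptual sketch of that repair, assuming a torch/safetensors workflow where a "dead" neuron shows up as an all-zero row in an expert's weight matrix; the shard name, tensor name, and detection rule are my assumptions, not the author's actual method:

```python
# Conceptual dead-neuron repair sketch (tensor name and zero-row
# detection rule are assumptions, not the author's exact method).
import torch
from safetensors.torch import load_file, save_file

state = load_file("model-shard.safetensors")            # placeholder shard name
name = "model.layers.0.mlp.experts.0.up_proj.weight"    # hypothetical tensor name
w = state[name]

# Treat a row whose weights are all (near) zero as a dead neuron.
dead = (w.abs().sum(dim=1) < 1e-8).nonzero(as_tuple=True)[0]
for i in dead.tolist():
    # Copy from the adjacent row (for brevity, assumes that neighbor is healthy).
    donor = i + 1 if i + 1 < w.shape[0] else i - 1
    w[i] = w[donor].clone()

state[name] = w
save_file(state, "model-fixed.safetensors")
```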
Developer achieves productive local agentic coding with a 4-bit MLX build of Qwen3.6-35B and the pi.dev tool, completing real tickets efficiently on current hardware.
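For context, loading a 4-bit MLX model locally is a few lines with mlx-lm; the model id is a placeholder and pi.dev's own integration is not shown:

```python
# Minimal mlx-lm sketch (model id is a placeholder; pi.dev wiring not shown).
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.6-35B-A3B-4bit")  # placeholder id
reply = generate(
    model,
    tokenizer,
    prompt="Add input validation to the /users endpoint.",
    max_tokens=512,
)
print(reply)
```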
Author shares a working llama-server config to run the 35B-MoE Qwen3.6 model on an 8GB RTX 4060, highlighting a max_tokens trap caused by unconstrained internal reasoning and the fix using per-request thinking_budget_tokens.
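A sketch of the per-request fix, assuming an OpenAI-compatible endpoint that accepts thinking_budget_tokens as an extra body field; the field name comes from the post, while the endpoint URL, model name, and values are my assumptions:

```python
# Per-request reasoning cap via an extra body field; thinking_budget_tokens
# is from the post, everything else here is assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

resp = client.chat.completions.create(
    model="qwen3.6-35b-a3b",   # placeholder served-model name
    messages=[{"role": "user", "content": "Plan the refactor, then show code."}],
    max_tokens=1024,
    # Without a cap, internal reasoning can consume the whole max_tokens budget
    # and the visible answer gets truncated; this bounds the thinking portion.
    extra_body={"thinking_budget_tokens": 256},
)
print(resp.choices[0].message.content)
```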
A 35B-parameter Qwen3.6 model fine-tuned with Claude-Opus-style chain-of-thought distillation data and released in GGUF quantized formats for efficient local inference.
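Running such a GGUF release locally takes a few lines with llama-cpp-python; the file name and settings below are placeholders, not details from the release:

```python
# Minimal GGUF inference sketch (file name and settings are placeholders).
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-distill-Q4_K_M.gguf",  # placeholder file name
    n_ctx=8192,
    n_gpu_layers=-1,   # offload all layers to GPU if available
)
out = llm("Explain the chain of thought behind binary search.", max_tokens=256)
print(out["choices"][0]["text"])
```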