qwen

#qwen

@ItsmeAjayKV: Update on 3090: Now with Qwen 3.6-35b-a3b moe (q6_k_xl). Crossed 90 t/s for the very first time, no MTP yet, prefill sp…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

A user reports over 90 tokens per second of generation with the Qwen 3.6-35b-a3b MoE model (q6_k_xl quantization) on an RTX 3090 using llama.cpp, without MTP, and prefill speeds above 1000 t/s, which suggests that large language models can be run practically on consumer hardware.


@ItsmeAjayKV: Achievement Unlocked: Running Qwen3.6-27b dense Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

User benchmarks Qwen3.6-27B on an RTX 3090 using llama.cpp, achieving 35 tok/s generation and 1247 tok/s prompt processing.
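
To put the two reported rates in perspective, here is a minimal sketch of a latency estimate. It assumes both rates stay constant regardless of context length, which is a simplification (generation usually slows as the context grows), and the prompt and output sizes are hypothetical, not from the post.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough wall-clock estimate: prompt-processing time plus generation time."""
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Rates reported in the post; the 4000/500 token counts are illustrative assumptions.
latency = estimate_latency_seconds(
    prompt_tokens=4000, output_tokens=500, prefill_tps=1247, decode_tps=35
)
print(f"{latency:.1f} s")  # prints "17.5 s"
```

Under these assumptions, generation dominates: about 14 of the 17.5 seconds are spent producing the 500 output tokens.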


@cjzafir: A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > De…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

According to the post, VibeThinker, a 3B parameter model fine-tuned on Qwen 2.5, achieves performance comparable to Claude Opus 4.5 and to much larger models such as DeepSeek v3, through post-training that includes multi-path thinking and staged training on math, coding, and science.


@witcheer: this is the first Qwen3.6-27B coding tune I've measured that improves real bug-fixing (!!!). - quality (MMLU/ARC/HellaS…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

A community fine-tune of Qwen3.6-27B improves real bug-fixing on SWE-bench while maintaining quality, unlike synthetic distillations that regress.


SIQ-1 Qwen3.6 for autoresearch and autonomous agency

Reddit r/LocalLLaMA ↗ · 2026-06-17

SIQ-1 Qwen3.6 is a new AI model designed for automated research and autonomous agency tasks, extending the Qwen family with enhanced agentic capabilities.


Local models went from mostly useless to actually useful really fast. What changed?

Reddit r/LocalLLaMA ↗ · 2026-06-17

The post notes that local AI models have become significantly more useful over the past year, moving from toys to practical tools for coding and workflows, despite still lagging behind closed models for complex tasks.


It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding

Reddit r/LocalLLaMA ↗ · 2026-06-17

An investigation reveals that the Rio 3.5 397B AI model, funded with $100K, was likely a simple merge of Nex N2 Pro without any training, leading to accusations of funding embezzlement.


@MiaAI_lab: MTP is up, test it out https://huggingface.co/Mia-AiLab/Qwable-3.6-27b-MTP…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

Mia-AiLab releases Qwable-3.6-27b-MTP, a full fine-tuned checkpoint of Qwen3.6-27B using a cleaned Fable 5 reasoning and instruction dataset, focused on code, structured reasoning, and local inference with MTP layers.


@Ali_TongyiLab: We are pleased to highlight an excellent community model from developer : Qwen3.6-27B-MTP-pi-reasoning-GGUF. Built on o…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

Alibaba's Tongyi Lab highlights a community model, Qwen3.6-27B-MTP-pi-reasoning-GGUF, built on Qwen3.6-27B, optimized for automated programming and debugging workflows for local coding agents.


@WaleedAhmad1a10: Check out the Qwen 3.5 27B MoQ GGUFs :

X AI KOLs Following ↗ · 2026-06-16 Cached

A Hugging Face repository (kaitchup/Qwen3.6-27B-GGUF-MoQ) provides GGUF quantized weights for the Qwen3.6-27B MoQ model, enabling local inference with tools like llama.cpp and Ollama.


Quoting Georgi Gerganov

Simon Willison's Blog ↗ · 2026-06-16 Cached

Georgi Gerganov attests that Qwen3.6-27B is a very capable local coding model, which he uses daily on his M2 Ultra or RTX 5090 with a lightweight harness.


Qwen-Robot Suite: A Foundation Model Suite for Physical World Intelligence

Hacker News Top ↗ · 2026-06-16

Qwen-Robot Suite is a foundation model suite designed for physical world intelligence, enabling robots to understand and interact with the real world effectively.


Be wary of Qwen/Claude distillations - they're often worse than the base model

Reddit r/LocalLLaMA ↗ · 2026-06-16

A critical analysis warning that many Qwen/Claude distillation models use too few training samples (e.g., 4K) to transfer actual capabilities, often degrading quality instead of improving it, compared to official distills like DeepSeek-R1 which used ~700K samples.


How do you teach an agent your company's knowledge without fine-tuning?

Reddit r/AI_Agents ↗ · 2026-06-16

A developer building a multi-agent operations system for a logistics company discusses the challenge of giving agents institutional knowledge without fine-tuning, opting for a retrieval layer with human-in-the-loop approval.
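
The retrieval-plus-approval idea can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the keyword-overlap scoring stands in for whatever retrieval the developer actually uses, and the example facts are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str
    approved: bool = False  # stays False until a human signs off


@dataclass
class KnowledgeStore:
    entries: list = field(default_factory=list)

    def propose(self, text):
        """Agent side: add a candidate fact and return its id."""
        self.entries.append(Entry(text))
        return len(self.entries) - 1

    def approve(self, entry_id):
        """Human side: mark a candidate as trusted."""
        self.entries[entry_id].approved = True

    def retrieve(self, query, k=3):
        """Return up to k approved entries ranked by shared words with the query."""
        words = set(query.lower().split())
        scored = []
        for entry in self.entries:
            if not entry.approved:
                continue  # unapproved knowledge never reaches the agent
            overlap = len(words & set(entry.text.lower().split()))
            if overlap:
                scored.append((overlap, entry.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = KnowledgeStore()
first = store.propose("Carrier X pickups close at 16:00 on Fridays")   # placeholder fact
second = store.propose("Pallets over 1000 kg need a liftgate truck")   # placeholder fact
store.approve(first)
print(store.retrieve("when do carrier x pickups close"))
```

Only the approved entry is returned here; the second fact stays invisible to retrieval until someone calls `approve` on it.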


Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

arXiv cs.CL ↗ · 2026-06-16 Cached

This paper proposes ASAG, a training-free method that adaptively stops reasoning in large reasoning models based on attention distributions, reducing token usage by ~40% while improving accuracy by 3.2% on benchmarks using DeepSeek-R1-Distill and Qwen3 models.


DFlash and Spec V2 Decoding (14 minute read)

TLDR AI ↗ · 2026-06-16 Cached

Z Lab, SGLang, and Modal release DFlash, a new speculative decoding model for Qwen 3.5 397B-A17B that uses block diffusion and KV injection to achieve over 4x throughput improvement over baseline and 1.5x over native MTP.
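
The two ratios in this summary imply a rough figure for native MTP over the baseline. The sketch below assumes both speedups were measured on the same workload, which the summary does not confirm, so the result is only indicative.

```python
def implied_speedup(target_vs_baseline, target_vs_reference):
    """If A is x times the baseline and y times B, then B is x / y times the baseline."""
    if target_vs_reference <= 0:
        raise ValueError("reference ratio must be positive")
    return target_vs_baseline / target_vs_reference


# 4.0 and 1.5 are the figures quoted in the summary ("over 4x", "1.5x").
mtp_vs_baseline = implied_speedup(4.0, 1.5)
print(f"native MTP is roughly {mtp_vs_baseline:.2f}x the baseline")  # roughly 2.67x
```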


Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

Reddit r/LocalLLaMA ↗ · 2026-06-15

The post discusses the cheapest hardware options for running Qwen 3.6 models, compares RTX 3090 and Tesla V100 GPUs, and gives a detailed cost breakdown for a system costing around $2000.


@modal: We worked with @lmsysorg and http://z-lab.ai to - integrate DFlash spec into @sgl_project - make it faster with overlap…

X AI KOLs Following ↗ · 2026-06-15 Cached

Modal collaborated with LMSys and Z Lab to integrate DFlash speculative decoding into SGLang, achieving up to 4.3x throughput improvement over baseline and 1.5x over native multi-token prediction for large language models.


How to Copy My Own Writing Style

Reddit r/LocalLLaMA ↗ · 2026-06-15

A user asks whether a sample of their writing style is more effective when given to a local LLM in the conversation or in the system prompt.


Mia-AiLab/Qwable-3.6-27b

Hugging Face Models Trending ↗ · 2026-06-15 Cached

Mia-AiLab releases Qwable-3.6-27b, a full fine-tuned checkpoint of Qwen3.6-27B on a cleaned reasoning and instruction dataset, optimized for coding, technical assistance, and structured responses.
