Tag
Fine-tuning open models like Alibaba's Qwen with LoRA can match or exceed frontier model performance on error classification tasks.
A user reports achieving over 90 tokens per second inference speed with Qwen 3.6-35b-a3b MoE model on an RTX 3090 using llama.cpp, with prefill speeds exceeding 1000 t/s, indicating practical local deployment of large language models on consumer hardware.
User benchmarks Qwen3.6-27B on an RTX 3090 using llama.cpp, achieving 35 tok/s generation and 1247 tok/s prompt processing.
VibeThinker, a 3B parameter model fine-tuned on Qwen 2.5, achieves performance comparable to Claude Opus 4.5 and much larger models like DeepSeek v3 through innovative post-training that includes multi-path thinking and staged training on math, coding, and science.
A community fine-tune of Qwen3.6-27B improves real bug-fixing on SWE-bench while maintaining quality, unlike synthetic distillations that regress.
SIQ-1 Qwen3.6 is a new AI model designed for automated research and autonomous agency tasks, extending the Qwen family with enhanced agentic capabilities.
The post notes that local AI models have become significantly more useful over the past year, moving from toys to practical tools for coding and workflows, despite still lagging behind closed models for complex tasks.
An investigation reveals that the Rio 3.5 397B AI model, funded with $100K, was likely a simple merge of Nex N2 Pro without any training, leading to accusations of funding embezzlement.
Mia-AiLab releases Qwable-3.6-27b-MTP, a full fine-tuned checkpoint of Qwen3.6-27B using a cleaned Fable 5 reasoning and instruction dataset, focused on code, structured reasoning, and local inference with MTP layers.
Alibaba's Tongyi Lab highlights a community model, Qwen3.6-27B-MTP-pi-reasoning-GGUF, built on Qwen3.6-27B, optimized for automated programming and debugging workflows for local coding agents.
A Hugging Face repository (kaitchup/Qwen3.6-27B-GGUF-MoQ) provides GGUF quantized weights for the Qwen3.6-27B MoQ model, enabling local inference with tools like llama.cpp and Ollama.
Georgi Gerganov attests that Qwen3.6-27B is a very capable local coding model, which he uses daily on his M2 Ultra or RTX 5090 with a lightweight harness.
Qwen-Robot Suite is a foundation model suite designed for physical world intelligence, enabling robots to understand and interact with the real world effectively.
A critical analysis warning that many Qwen/Claude distillation models use too few training samples (e.g., 4K) to transfer actual capabilities, often degrading quality instead of improving it, compared to official distills like DeepSeek-R1 which used ~700K samples.
A developer building a multi-agent operations system for a logistics company discusses the challenge of giving agents institutional knowledge without fine-tuning, opting for a retrieval layer with human-in-the-loop approval.
This paper proposes ASAG, a training-free method that adaptively stops reasoning in large reasoning models based on attention distributions, reducing token usage by ~40% while improving accuracy by 3.2% on benchmarks using DeepSeek-R1-Distill and Qwen3 models.
Z Lab, SGLang, and Modal release DFlash, a new speculative decoding model for Qwen 3.5 397B-A17B that uses block diffusion and KV injection to achieve over 4x throughput improvement over baseline and 1.5x over native MTP.
Discusses the cheapest hardware options for running Qwen 3.6 models, comparing RTX 3090 and Tesla V100 GPUs, and provides a detailed cost breakdown for a system at around $2000.
Modal collaborated with LMSys and Z Lab to integrate DFlash speculative decoding into SGLang, achieving up to 4.3x throughput improvement over baseline and 1.5x over native multi-token prediction for large language models.
User asks whether providing a sample of their writing style to a local LLM is more effective in the conversation or in the system prompt.