Tag
A user asks why Gemma 4 26B receives less attention than Qwen models, sharing their experience using these models for a personal assistant project on an RTX 3090.
An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.
Using TurboQuant, the user achieved 20 tokens per second on a Qwen 3.6 35B MoE model running on a GTX 1060 3GB, showcasing impressive performance on outdated hardware.
A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
A detailed guide on running the Qwen3.6-35B-A3B APEX model on an RTX 3090, comparing two llama.cpp forks and quantization methods for optimal speed and quality.
The author calculates the token cost and break-even period of running large models on a Mac Studio, concludes that buying a Mac for personal large-model use is not cost-effective for ordinary users, and suggests that using APIs or renting GPUs is more economical.
A researcher suggests it's time to buy more GPUs and build a local AI stack, referencing Qwen 3.5 27B and GLM 5.2 as models that cancel the threat of a permanent underclass.
The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with performance benchmarks on two DGX Spark units achieving 25 tok/sec (single session) and 152 tok/sec (8 sessions).
The Qwen3.6-27B-FP8 model exhibits slow performance when running on an A100 GPU.
Discusses using Qwen 27B for planning tasks and Qwen 35B-A3B for execution tasks, suggesting a specialized model approach.
This post presents the second update of a benchmark for local vision language models, comparing 23 models across 30 images with revised settings, and provides performance recommendations for different VRAM tiers. Key findings include that thinking mode hurts vision performance and that MoE models underperform dense models for perception tasks.
The user released Apostate, an abliterated version of Qwen 3.6 27B that reduces the safety-alignment refusal rate from 92% to 7.6% with minimal capability loss (KL divergence 0.120).
Technical report on running Qwen 3.6 27B Q8 model on a dual AMD Radeon R9700 setup using llama.cpp with ROCm, including performance benchmarks and configuration details.
After firing Junyang Lin, Qwen has locked down its large models and is no longer releasing open-source models, while other Chinese AI labs continue to open-source their latest models. Rumors suggest the small model team is gone and Qwen 3.6/3.7 may be the last open-source models.
Qwen code companion is now available on the VS Code marketplace, offering an AI-powered coding assistant for developers.
A user shares optimized settings for running Qwen3.6 27B (Q8_0) on a dual-GPU setup (RTX 4090 + RTX 3090) with llama.cpp, achieving 75-100 t/s generation and about 1500 t/s prompt processing with a 250k context.
A tweet promoting the Qwen 3.6 27B model and recommending UnslothAI for running it on any GPU.
Charles Frye announces the co-release with Z Lab of six new DFlash speculators for Alibaba Qwen 3.x models, achieving over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.