qwen

Tag

Cards List
#qwen

Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA · 10h ago

A user asks why Gemma 4 26b receives less attention compared to Qwen models, sharing their experience using these models for a personal assistant project on a 3090.

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

Reddit r/LocalLLaMA · 18h ago

An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.

@BlackRainLabs: Using TurboQuant i was able to push 20 tk/s on qwen 3.6 35b MoE on a GTX1060 3GB. Insane for such a small and old card.…

X AI KOLs Following · 22h ago

Using TurboQuant, the user reports 20 tokens per second on a Qwen 3.6 35B MoE model running on a GTX 1060 3GB, a notable result for such an old, low-VRAM card.

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

Reddit r/LocalLLaMA · yesterday

A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.

Qwen3.6-35B-A3B APEX on a Single RTX 3090 - Getting the Most Out of It

Reddit r/LocalLLaMA · yesterday

A detailed guide on running the Qwen3.6-35B-A3B APEX model on an RTX 3090, comparing two llama.cpp forks and quantization methods for optimal speed and quality.

@karminski3: Thinking of buying a Mac to run large models? This is a deterrent post. Actually, the estimation method is simple. Even if you buy a MacStudio to run the Qwen3.6-27B 4bit quantized version, then enable DFlash to use Qwen's built-in speculative decoding, it only reaches 65 token/s. And now most large models can run at 40 token/s…

X AI KOLs Timeline · yesterday

The author calculates the token cost and break-even period of running large models on a Mac Studio, concluding that it is not cost-effective for ordinary users to buy a Mac for personal large model use, and suggests that using APIs or renting GPUs is more economical.

@guohao_li: yes, it is definitely time to seriously consider buying more GPUs and start building our own local ai stack. i’m curiou…

X AI KOLs Following · yesterday

A researcher suggests it's time to buy more GPUs and build a local AI stack, referencing Qwen 3.5 27B and GLM 5.2 as models that cancel the threat of a permanent underclass.

We got local models to triage the OpenClaw repo for FREE!*

Hugging Face Blog · yesterday

The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top · yesterday

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...

X AI KOLs Timeline · yesterday

Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, with benchmarks on two DGX Spark units: 25 tok/sec with a single session and a peak of 152 tok/sec with 8 sessions.

A100 slow Qwen3.6-27B-FP8

Reddit r/LocalLLaMA · yesterday

A user reports that the Qwen3.6-27B-FP8 model runs slowly on an A100 GPU.

Qwen 27B for planning, Qwen 35B-A3B for execution?

Reddit r/LocalLLaMA · yesterday

The post discusses using Qwen 27B for planning and Qwen 35B-A3B for execution, splitting the work between two specialized models.

Best local model for vision - 2nd benchmark update - 21 Jun 2026

Reddit r/LocalLLaMA · yesterday

This post presents the second update of a benchmark for local vision language models, comparing 23 models across 30 images with revised settings, and provides performance recommendations for different VRAM tiers. Key findings include that thinking mode hurts vision performance and that MoE models underperform dense models for perception tasks.

Qwen 3.6 27b Abliterated (apostate)

Reddit r/LocalLLaMA · yesterday

The user released Apostate, an abliterated version of Qwen 3.6 27B that reduces the safety-alignment refusal rate from 92% to 7.6% with minimal capability loss (KL divergence 0.120).

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

Reddit r/LocalLLaMA · 2d ago

A technical report on running the Qwen 3.6 27B Q8 model on a dual AMD Radeon R9700 setup using llama.cpp with ROCm, including performance benchmarks and configuration details.

Qwen is never going to open source Qwen 3.7, aren't they?

Reddit r/LocalLLaMA · 2d ago

After firing Junyang Lin, Qwen has locked down its large models and is no longer releasing open source models, while other Chinese AI labs continue to open source their latest models. Rumors suggest the small model team is gone and Qwen 3.6/3.7 may be the last open source models.

Qwen code companion on vscode marketplace - thoughts

Reddit r/LocalLLaMA · 2d ago

Qwen code companion is now available on the VS Code marketplace, offering an AI-powered coding assistant for developers.

Best Settings for 48GB VRAM + Qwen 3.6 27B

Reddit r/LocalLLaMA · 3d ago

A user shares optimized settings for running Qwen3.6 27B (Q8_0) on a dual-GPU setup (RTX 4090 + RTX 3090) with llama.cpp, achieving 75-100 t/s generation and 1500 t/s prompt processing (pp) with 250k context.

@SlimTradeyBaby: Drop your GPU below and I’ll tell you exactly what model and config to run on it. JOKES. No need. Qwen 3.6 27b @Unsloth…

X AI KOLs Timeline · 3d ago

A tweet promoting the Qwen 3.6 27b model and recommending UnslothAI for running it on any GPU.

@LottoLabs: This is awesome work Dflash for qwen 3.5/6 series

X AI KOLs Timeline · 3d ago

Charles Frye announces the co-release with Z Lab of six new DFlash speculators for Alibaba Qwen 3.x models, achieving over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.
