qwen

#qwen

Worse quality with MTP - Qwen 3.6, Gemma 4

Reddit r/LocalLLaMA ↗ · 7h ago

A user reports that MTP versions of Qwen 3.6 and Gemma 4 models produce lower quality outputs in code review tasks compared to non-MTP counterparts, with only marginal real-world speed improvements despite higher token generation rates.

#qwen

Anthropic Accuses Alibaba's Qwen of Largest Claude Distillation

Reddit r/ArtificialInteligence ↗ · 14h ago

Anthropic alleged that Alibaba's Qwen lab used nearly 25,000 fake accounts to run 29 million Claude model exchanges, surpassing all prior distillation campaigns, prompting U.S. senators to consider legislation sanctioning Chinese firms for unauthorized access to AI model outputs.

#qwen

@ms_aifrontiers: Fara1.5 is here! The tech report just landed on arXiv. New SOTA for computer use agents of its size, and it competes wi…

X AI KOLs Following ↗ · 20h ago Cached

Fara1.5 is a family of native computer use agents trained using the FaraGen1.5 scalable data pipeline. The models achieve new state-of-the-art results on browser-use benchmarks, competing with much larger frontier models.

#qwen

@_akhaliq: paper:

X AI KOLs Following ↗ · 21h ago Cached

A critical thread analyzing the Qwen-AgentWorld paper, which proposes language world models for general agents. The critique raises concerns about simulator fidelity, benchmark design, and cost, rating it 4.5/10 on a bullshitometer.

#qwen

Qwen-AgentWorld-35B-A3B for Coding?

Reddit r/LocalLLaMA ↗ · yesterday

A post asking whether Qwen-AgentWorld-35B-A3B, a new model variant from the Qwen series, is suited to coding tasks.

#qwen

Qwen-AgentWorld-397B-A17B

Reddit r/LocalLLaMA ↗ · yesterday

Qwen released a new large language model, Qwen-AgentWorld-397B-A17B, with details on Hugging Face and the Qwen blog.

#qwen

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

arXiv cs.CL ↗ · yesterday Cached

This paper examines the reliability of exact-match retrieval recall as a proxy for downstream policy classification performance in long-horizon tool-use agents. Experiments with Qwen2.5 classifiers on τ-bench show that low clause recall does not significantly degrade classifier accuracy, suggesting that retrieval metrics alone can mislead when evaluating policy signal.

#qwen

@ModelScope2022: Qwen-AgentWorld just dropped two releases on ModelScope! An open 35B total / 3B active MoE world model with 256K contex…

X AI KOLs Timeline ↗ · yesterday Cached

Qwen-AgentWorld releases an open 35B total / 3B active MoE world model with 256K context, along with a 7-domain benchmark, achieving state-of-the-art performance on AgentWorldBench.

#qwen

UPDATE: Qwen-27B-IQ4_KS and Qwen-27B-IQ_KS_KT for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

Reddit r/LocalLLaMA ↗ · yesterday

New GGUF quantizations of Qwen3.6-27B optimized for 16GB VRAM NVIDIA GPUs, including an experimental Trellis variant, with perplexity benchmarks.

#qwen

I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT

Reddit r/LocalLLaMA ↗ · yesterday

The author maps the Kullback-Leibler divergence of KV cache quantization for the Qwen3.6-35B-A3B and Gemma4-E2B QAT models.

#qwen

Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA ↗ · 2d ago

A user asks why Gemma 4 26b receives less attention compared to Qwen models, sharing their experience using these models for a personal assistant project on a 3090.

#qwen

Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

Reddit r/LocalLLaMA ↗ · 2d ago

An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.

#qwen

@BlackRainLabs: Using TurboQuant i was able to push 20 tk/s on qwen 3.6 35b MoE on a GTX1060 3GB. Insane for such a small and old card.…

X AI KOLs Following ↗ · 2d ago Cached

Using TurboQuant, the user reports 20 tokens per second from the Qwen 3.6 35B MoE model on a GTX 1060 3GB, a notable result for such a small, old card.

#qwen

NEX-N2-mini: "There is no Pareto frontier. I am Pareto". This Qwen3.5-MoE fine tune fixed 3.5 and 3.6 overthinking apparently on my tests.

Reddit r/LocalLLaMA ↗ · 2d ago

A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.

#qwen

Qwen3.6-35B-A3B APEX on a Single RTX 3090 - Getting the Most Out of It

Reddit r/LocalLLaMA ↗ · 3d ago

A detailed guide on running the Qwen3.6-35B-A3B APEX model on an RTX 3090, comparing two llama.cpp forks and quantization methods for optimal speed and quality.

#qwen

@karminski3: Thinking of buying a Mac to run large models? This is a deterrent post. Actually, the estimation method is simple. Even if you buy a MacStudio to run the Qwen3.6-27B 4bit quantized version, then enable DFlash to use Qwen's built-in speculative decoding, it only reaches 65 token/s. And now most large models can run at 40 token/s…

X AI KOLs Timeline ↗ · 3d ago Cached

The author calculates the token cost and break-even period of running large models on a Mac Studio, concluding that it is not cost-effective for ordinary users to buy a Mac for personal large model use, and suggests that using APIs or renting GPUs is more economical.

#qwen

@guohao_li: yes, it is definitely time to seriously consider buying more GPUs and start building our own local ai stack. i’m curiou…

X AI KOLs Following ↗ · 3d ago Cached

A researcher suggests it is time to buy more GPUs and build a local AI stack, citing Qwen 3.5 27B and GLM 5.2 as models that defuse the threat of a permanent underclass.

#qwen

We got local models to triage the OpenClaw repo for FREE!*

Hugging Face Blog ↗ · 3d ago Cached

The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.

#qwen

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top ↗ · 3d ago Cached

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

#qwen

@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...

X AI KOLs Timeline ↗ · 3d ago Cached

Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, with benchmarks on two DGX Spark units reaching 25 tok/sec with a single session and peaking at 152 tok/sec with 8 sessions.
