Tag
A user reports that MTP (multi-token prediction) versions of Qwen 3.6 and Gemma 4 models produce lower-quality outputs in code review tasks than their non-MTP counterparts, with only marginal real-world speed improvements despite higher token generation rates.
Anthropic alleged that Alibaba's Qwen lab used nearly 25,000 fake accounts to run 29 million Claude model exchanges, a scale surpassing all prior distillation campaigns. The allegation prompted U.S. senators to consider legislation sanctioning Chinese firms for unauthorized access to AI model outputs.
Fara1.5 is a family of native computer use agents trained using the FaraGen1.5 scalable data pipeline. The models achieve new state-of-the-art results on browser-use benchmarks, competing with much larger frontier models.
A critical thread analyzing the Qwen-AgentWorld paper, which proposes language world models for general agents. The critique raises concerns about simulator fidelity, benchmark design, and cost, rating it 4.5/10 on a bullshitometer.
Qwen-AgentWorld-35B-A3B is a new model variant from the Qwen series, specialized for coding tasks.
Qwen released a new large language model, Qwen-AgentWorld-397B-A17B, as detailed on Hugging Face and the Qwen blog.
This paper examines the reliability of exact-match retrieval recall as a proxy for downstream policy classification performance in long-horizon tool-use agents. Experiments with Qwen2.5 classifiers on τ-bench show that low clause recall does not significantly degrade classifier accuracy, suggesting that retrieval metrics alone can mislead when evaluating policy signal.
Qwen-AgentWorld releases an open 35B total / 3B active MoE world model with 256K context, along with a 7-domain benchmark, achieving state-of-the-art performance on AgentWorldBench.
New GGUF quantizations of Qwen3.6-27B optimized for 16GB VRAM NVIDIA GPUs, including an experimental Trellis variant, with perplexity benchmarks.
The author maps the Kullback-Leibler divergence of KV cache quantization for the Qwen3.6-35B-A3B and Gemma4-E2B QAT models.
A user asks why Gemma 4 26B receives less attention than Qwen models, sharing their experience using these models for a personal assistant project on an RTX 3090.
An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.
Using TurboQuant, a user reports reaching 20 tokens per second with a Qwen 3.6 35B MoE model on a GTX 1060 3GB, a notable result for such dated hardware.
A fine-tuned version of Qwen3.5-MoE called NEX-N2-mini reportedly fixes overthinking issues seen in Qwen 3.5 and 3.6 models.
A detailed guide on running the Qwen3.6-35B-A3B APEX model on an RTX 3090, comparing two llama.cpp forks and quantization methods for optimal speed and quality.
The author calculates the per-token cost and break-even period of running large models on a Mac Studio, concluding that buying a Mac for personal large-model use is not cost-effective for ordinary users and that using APIs or renting GPUs is more economical.
A researcher suggests it's time to buy more GPUs and build a local AI stack, citing Qwen 3.5 27B and GLM 5.2 as models that neutralize the threat of a permanent underclass.
The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.
A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.
Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with performance benchmarks on 2x DGX Spark hardware: 25 tok/sec (single session) and 152 tok/sec (8 sessions).