The article explains that Qwen, Alibaba's large language model, is not free to use, and covers the model's pricing and access limitations.
The author highlights the impressive capabilities of the open-source Qwen 3.6-27B model running locally on an RTX 5090, noting its strong performance on programming tasks and comparing it favorably to commercial models, despite the complexity of local deployment.
A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
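A setup like the one described can be sketched as a systemd unit wrapping llama.cpp's server; the unit name, binary location, and model path below are placeholders, not details from the post:

```ini
# /etc/systemd/system/llama-qwen.service — hypothetical unit file;
# binary and model paths are illustrative placeholders.
[Unit]
Description=llama.cpp server for Qwen3.6-27B
After=network.target

[Service]
# -m: model file, -c: context length, -ngl: layers offloaded to the GPU
ExecStart=/usr/local/bin/llama-server \
    -m /opt/models/qwen3.6-27b.gguf \
    -c 32768 -ngl 99 --host 127.0.0.1 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After installing the unit, it would be enabled with `systemctl enable --now llama-qwen`.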
Community release of a Qwen3.6 35B A3B uncensored variant with all 19 MTP tensors preserved, available in multiple formats including Safetensors, GGUF, NVFP4, and GPTQ-Int4.
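As a rough illustration of what an Int4 format like GPTQ-Int4 does per weight group, here is a toy symmetric quantize/dequantize round-trip — not the actual GPTQ algorithm, which additionally compensates quantization error across columns:

```python
# Toy symmetric int4 quantization of one weight group.
# Shows only the round-trip and the precision loss involved.

def quantize_int4(weights):
    """Map floats to integers in [-7, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.3, 0.07, 0.9]
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Worst-case error is bounded by half the scale step.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Each group stores 4-bit integers plus one scale, which is where the memory savings over 16-bit weights come from.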
A user benchmarks Qwen 35B-A3B (a 35B MoE model) on a 12GB RTX 3060, finding that 12GB VRAM is a practical sweet spot for running the model with 32k context, achieving ~47 t/s generation.
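The KV cache is usually what decides how much context fits in a given VRAM budget; a back-of-envelope formula, with an illustrative GQA configuration (the layer and head counts below are assumptions, not the model's actual config):

```python
# Back-of-envelope KV-cache size:
# 2 (K and V) x layers x kv_heads x head_dim x context x bytes/element.
def kv_cache_bytes(layers, kv_heads, head_dim, ctx, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

# Hypothetical config: 48 layers, 8 KV heads, head_dim 128, fp16 cache.
gib = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                     ctx=32_768, bytes_per_elem=2) / 2**30
```

Under these assumptions a 32k context alone costs several GiB, which is why 12 GB cards sit near the practical limit for this class of model.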
Developer achieved 80+ t/s inference on Qwen3.6-27B with 262K context on a single RTX 4090 by combining MTP (Multi-Token Prediction) with TurboQuant's lossless KV cache compression, sharing their implementation fork and technical details.
An open-source stack using Qwen2.5-32B-Instruct with longctx and vllm-turboquant on a single AMD MI300X achieves competitive results (0.601-0.688) versus SubQ's closed model (0.659) on the MRCR v2 1M-context benchmark, demonstrating open-weights approaches are within striking distance.
This paper presents a full-pipeline recipe for teaching thinking models to reason with tools, achieving state-of-the-art performance on benchmarks like AIME 2025 when applied to Qwen3 models.
The user reviews a quantized and fine-tuned version of the Qwen3.6-35B model optimized for Apple Silicon via MLX, praising its speed, intelligence, and lack of safety disclaimers.
A benchmark analysis of Qwen 3.6 27B MTP on 4x RTX 3090 GPUs, demonstrating that using NVLink for tensor parallelism yields significant throughput improvements (up to +53%) over PCIe configurations.
Ramp presents a case study on using reinforcement learning post-training to build Fast Ask, a specialized spreadsheet retrieval agent that improves accuracy and reduces latency compared to general-purpose models.
The author describes using Openclaw as a system administrator on Linux servers, leveraging a local Qwen 3.6 27B model for security audits, updates, and kiosk-mode deployment tasks without external internet access.
This research paper investigates how Large Language Models encode social role granularity as a structured latent dimension. It demonstrates that this 'Granularity Axis' is consistent across architectures like Qwen3 and Llama-3, and can be causally manipulated via activation steering.
Jackrong releases Qwopus3.6-35B-A3B-v1, a reasoning-enhanced fine-tune of Alibaba's Qwen3.6 MoE model, optimized for logic and agentic coding with 35B total parameters and 3B active parameters.
This paper introduces Side-by-Side Interleaved Reasoning, a method for controlling disclosure timing in autoregressive models to improve accuracy and efficiency. It demonstrates improved performance on benchmarks using Qwen3 models by interleaving private reasoning with partial disclosures.
A community-finetuned, uncensored version of the Qwen 3.6 27B model featuring high-precision GGUF quantizations.
This repository provides fixed Jinja chat templates for Qwen 3.5 and 3.6, addressing rendering errors, token waste, and missing features in the official templates for engines like LM Studio and llama.cpp.
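Qwen's chat templates render conversations into the ChatML layout; a minimal Python rendering of that layout (a simplified sketch — the real Jinja templates also handle tool calls, thinking blocks, and per-engine quirks):

```python
# Minimal ChatML rendering as produced by Qwen-style chat templates.
# Simplified: real templates also cover tools and thinking blocks.
def render_chatml(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out += "<|im_start|>assistant\n"
    return out

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
])
```

Template bugs of the kind the repository fixes typically show up here: a missing newline or stray token in this rendering wastes tokens or breaks the engine's stop-sequence handling.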
User observes Qwen 3.5 falling into repetitive thinking loops during generation.
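Such loops are commonly mitigated with a repetition penalty at sampling time; a toy version of the standard CTRL-style logit rescaling, not tied to any particular engine:

```python
# CTRL-style repetition penalty: for tokens already generated,
# divide positive logits (multiply negative ones) by the penalty,
# making repeats less likely at the next sampling step.
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    out = dict(logits)
    for tid in set(generated_ids):
        if tid in out:
            v = out[tid]
            out[tid] = v / penalty if v > 0 else v * penalty
    return out

logits = {0: 2.0, 1: -1.0, 2: 0.5}
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1])
```

Thinking-mode loops are harder to break than plain text repetition, since penalizing reasoning tokens too aggressively can also degrade the chain of thought.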
A single RTX 3090 reaches 134 tok/s on the new 27B Qwen 3.5 Dense and 73 tok/s on Qwen 3.6-27B via fused kernels plus speculative decoding, with GGUF releases landing the same evening.
This article introduces Qwen3.6-27B-DFlash, a specialized drafter model for DFlash, a novel speculative decoding method using block diffusion to accelerate inference speed. It provides installation instructions for vLLM and SGLang to enable parallel drafting with the target Qwen3.6-27B model.