qwen3

#qwen3

PEARL: Solver-in-the-Loop Interactive Optimization Modeling from Natural Language

arXiv cs.AI · yesterday

Introduces PEARL, a system for interactive optimization modeling that uses a solver-in-the-loop approach to iteratively improve formulations from natural language, outperforming much larger models like DeepSeek-V3.2-685B.

Committed Before Reasoning: Behavioral Reproduction and Preliminary Activation-Level Evidence of Answer Pre-Commitment in an Open-Weight LLM

arXiv cs.CL · 2d ago

This paper reproduces the phenomenon of answer pre-commitment in an open-weight LLM (Qwen3-8B) using a minimal car-wash question and provides preliminary activation-level evidence that the commitment is encoded in hidden states before the answer text is emitted.

Building a local AI server for Qwen3 30B with Q8 is this hardware a good fit?

Reddit r/AI_Agents · 2d ago

A discussion of whether the poster's chosen hardware is a good fit for a local AI server running the Qwen3 30B model with Q8 quantization.

@ADarmouni: https://arxiv.org/pdf/2607.13988 Good RL work from Microsoft research that manages to improve the Qwen3 small MoE bette…

X AI KOLs Timeline · 3d ago

This paper introduces TRACE, a dense credit-assignment method for reinforcement learning of long-horizon agents that significantly improves Qwen3 small MoE models on agentic benchmarks without additional critic models.

built a memory pipeline on Qwen3 235B A22B Instruct 2507 that scored #1 on LongMemEval-S (470/500) while being ~10x more token efficient than the next best system

Reddit r/LocalLLaMA · 2026-07-14

A memory pipeline built on Qwen3 235B A22B Instruct 2507 achieves the highest score on LongMemEval-S (470/500) while being approximately 10x more token-efficient than the next best system.

Evaluating J-space entropy as an error predictor across 7 datasets on Qwen3-4B [R]

Reddit r/MachineLearning · 2026-07-13

This study evaluates whether J-space entropy (inspired by Anthropic's Jacobian Lens) can serve as an error predictor across seven datasets on Qwen3-4B. Results show it can complement output confidence for factual retrieval but is not a general hallucination detector, with strong task dependence.

First attempts at a CPU setup - MS-02 Intel 285hx, trying Qwen3, Qwen3.6 and Gemma4

Reddit r/LocalLLaMA · 2026-07-12

A first attempt at running Qwen3, Qwen3.6, and Gemma4 on a CPU setup (MS-02 with an Intel 285HX processor).

A local, 100% private language model on your smartphone!!

Reddit r/ArtificialInteligence · 2026-07-05

A local, private language model (quantized Qwen 3 at 1.5B and 4B) can run offline on a smartphone, with fine-tuning and a LoRA distilled from a 32B model.

Travel-Oriented Reasoning Large Language Model via Domain-Specific Knowledge Graphs

arXiv cs.CL · 2026-06-30

This paper proposes a modular pipeline that uses a domain-specific knowledge graph to generate multi-hop QA pairs and fine-tune a reasoning LLM (Qwen3-4B) for the travel domain, achieving 82.4% exact match accuracy, significantly outperforming the baseline.

Fine-Tuning General-Purpose Large Language Models for Agricultural Applications: A Reproducible Framework and Evaluation Protocol Based on Qwen3-8B

arXiv cs.CL · 2026-06-30

This paper proposes AgriTune-R, a reproducible framework for fine-tuning Qwen3-8B for agricultural tasks, integrating data governance, LoRA/QLoRA fine-tuning, RAG, expert evaluation, and safety control.

Qwen3-tts.cpp + Compose Desktop GUI

Reddit r/LocalLLaMA · 2026-06-29

The developer improved qwen3-tts.cpp to run at 5x realtime on an RTX 5080 and built a cross-platform desktop GUI with Kotlin Compose Multiplatform, featuring voice cloning, streaming, and speaker embedding management.

DeepSpec - a deepseek-ai Collection

Reddit r/LocalLLaMA · 2026-06-28

DeepSeek AI released the DeepSpec collection on Hugging Face, featuring speculative decoding models (dspark, dflash, eagle3) based on Qwen3 and Gemma4 in various sizes (1B-3B).

Discovering Millions of Interpretable Features with Sparse Autoencoders

arXiv cs.LG · 2026-06-26

This paper introduces Qwen3-Instruct SAE, a suite of sparse autoencoders trained on Qwen3 instruction-tuned models, enabling the discovery of millions of interpretable features and demonstrating refusal steering capabilities.

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Hugging Face Daily Papers · 2026-06-25

JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates, achieving up to 9.64x speedup on MATH-500 and 4.58x on conversational workloads.

Fearless Concurrency on the GPU: Safe GPU inference in Rust, competitive with vLLM/SGLang [R]

Reddit r/MachineLearning · 2026-06-18

cuTile Rust introduces a tile-based programming model that leverages Rust's ownership system to guarantee memory safety and data-race freedom for GPU kernels; the Grout inference engine built on it achieves throughput competitive with vLLM/SGLang for Qwen3 models.

@SpaceTimeViking: Qwen3.6 27B getting some love on the new AEON ULTIMATE VLLM image @NVIDIAAI DGX SPARK OPTIMIZED! https://github.com/AEO…

X AI KOLs Timeline · 2026-06-18

AEON-7 releases a fully uncensored, capability-enhanced abliteration of Qwen3.6-27B, optimized for NVIDIA DGX Spark with NVFP4 quantization and DFlash speculative decoding for improved performance.

@lmsysorg: SGLang-Omni now serves MOSS-TTS-Local Transformer v1.5 from @Open_MOSS on day 0! This is an open 48 kHz stereo TTS mode…

X AI KOLs Timeline · 2026-06-18

MOSS-TTS-Local Transformer v1.5 is an open-source 48 kHz stereo TTS model with zero-shot voice cloning, native streaming, and support for 31 languages, built on a Qwen3-4B backbone and served via SGLang-Omni.

@KaichaoYou: Scaling concurrent rollouts is one of the hardest parts of RL training infra. We had fun helping SemiAnalysis stress-te…

X AI KOLs Timeline · 2026-06-17

@KaichaoYou discusses the challenges of scaling concurrent rollouts for RL training infrastructure, highlighting a stress test of sandbox scaling on Qwen3 235B with SemiAnalysis, including a writeup of errors and fixes.

@sheriyuo: This paper proposes ASAG, Attention-State Adaptive Generation, a training-free, plug-and-play stopping framework for re…

X AI KOLs Timeline · 2026-06-16

ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.

vLLM has a new streaming parser for Qwen3+ available in nightly

Reddit r/LocalLLaMA · 2026-06-15

vLLM now has a streaming parser for Qwen3+ models, available in the nightly build. vLLM is a fast and easy-to-use library for LLM inference and serving.
