@xenovacom: Opus 4.7 just wrote a custom WebGPU kernel that runs Qwen3.5 up to 13x faster using a fused LinearAttention op! Agentic…

X AI KOLs Following 04/23/26, 01:15 PM Tools

Summary

Opus 4.7 auto-generated a custom WebGPU kernel that accelerates Qwen3.5 inference up to 13× via fused LinearAttention, now shipping in Transformers.js v4.2.0.

Opus 4.7 just wrote a custom WebGPU kernel that runs Qwen3.5 up to 13x faster using a fused LinearAttention op! Agentic kernel optimization is the future. Now live in Transformers.js v4.2.0! P.S. I've updated all our previous demos to use this new version. Enjoy!


Similar Articles

@ngxson: Qwen3.6-27B running 100% on WebGPU. Not the best speed but still

X AI KOLs Following

A developer demonstrates running the Qwen3.6-27B AI model entirely on WebGPU in a browser, though speed is not optimal.

@cniongolo: I’m not sure people realize yet that you can actually run Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF on a dua…

X AI KOLs Following

Demonstrates running a custom Qwen model (Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-MTP-GGUF) on dual Nvidia RTX PRO 6000 Blackwell GPUs at 195 tokens per second using Hugging Face Inference.

@TeksEdge: Solved! Qwen3.6-27B-FP8 is now running on Intel Arc Pro B70! LocalMaxxing shows a working 4× Arc Pro B70 32GB run at ~5…

X AI KOLs Following

Qwen3.6-27B-FP8 model is now running on Intel Arc Pro B70 GPUs at ~50 tok/s with a vLLM bug fix, marking a significant milestone for Intel GPU local AI inference.

@SpaceTimeViking: Qwen3.6 27B getting some love on the new AEON ULTIMATE VLLM image @NVIDIAAI DGX SPARK OPTIMIZED! https://github.com/AEO…

X AI KOLs Timeline

AEON-7 releases a fully uncensored, capability-enhanced abliteration of Qwen3.6-27B, optimized for NVIDIA DGX Spark with NVFP4 quantization and DFlash speculative decoding for improved performance.

@sudoingX: update: qwen 3.6 27b dense q4 just one shotted octopus invaders game on a single 3090. hermes agent drove the whole thi…

X AI KOLs Timeline

A user benchmark demonstrates that the Qwen 3.6 27B dense model (Q4 quantized) can autonomously generate a fully playable multi-file game in a single prompt on a single RTX 3090, significantly outperforming its predecessor with zero manual interventions. The results highlight major improvements in local code generation and agentic capabilities for consumer-grade hardware.
