@alex_whedon: Hey, folks! We have been blown away by the response to SubQ and the SSA breakthrough over the last 48 hours. It is awes…
Summary
The SubQ team reports an overwhelming response to the SSA breakthrough and plans to release a model card with additional data and third-party validation next week.
Similar Articles
@seclink: Just hit 134 tok/s with Qwen 3.5-27B Dense and 73 tok/s with the new Qwen 3.6-27B on a single RTX 3090. The 2026 open-source scene is moving at lightspeed…
A single RTX 3090 reaches 134 tok/s on Qwen 3.5-27B Dense and 73 tok/s on the newer Qwen 3.6-27B via fused kernels plus speculative decoding, with GGUF releases landing the same evening.
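Speculative decoding does most of the heavy lifting in numbers like these: a small draft model proposes a burst of tokens cheaply, and the large target model verifies them all in one batched forward pass. Below is a minimal greedy-variant sketch in Python, not the poster's actual pipeline; `draft_next` and `target_logits` are hypothetical stand-ins for the two models.

```python
# Minimal sketch of greedy speculative decoding (illustration only, not
# seclink's setup). A cheap draft model proposes k tokens; the target
# model scores the whole candidate sequence in one forward pass, and we
# keep the longest prefix that matches the target's own greedy choices.
from typing import Callable, List

def speculative_step(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],                  # hypothetical: small model, next token id
    target_logits: Callable[[List[int]], List[List[float]]], # hypothetical: big model, logits per position
    k: int = 8,
) -> List[int]:
    # 1) Draft k candidate tokens autoregressively with the small model.
    ctx = list(prompt)
    drafted: List[int] = []
    for _ in range(k):
        token = draft_next(ctx)
        drafted.append(token)
        ctx.append(token)

    # 2) A single target forward pass over prompt + drafts verifies them all.
    logits = target_logits(prompt + drafted)

    # 3) Accept drafts while they match the target's greedy pick; on the
    #    first mismatch, substitute the target's own token and stop.
    accepted: List[int] = []
    for i, token in enumerate(drafted):
        pos = len(prompt) + i - 1  # logits at pos predict the token at pos + 1
        greedy = logits[pos].index(max(logits[pos]))
        if greedy == token:
            accepted.append(token)
        else:
            accepted.append(greedy)
            break
    else:
        # All k drafts accepted: the same pass also yields one bonus token.
        last = logits[len(prompt) + k - 1]
        accepted.append(last.index(max(last)))
    return prompt + accepted
```

The output is identical to plain greedy decoding with the target model; the speedup comes from verifying many tokens per expensive forward pass instead of one.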
Qwen 3.6 Max Preview just went live on the Qwen Chat website. It currently has the highest AA-Intelligence Index score among Chinese models (52) (Will it be open source?)
Qwen 3.6 Max Preview launched on the Qwen Chat website, achieving the highest AA-Intelligence Index score (52) among Chinese models; it remains unclear whether it will be open-sourced.
@outsource_: NEW GLM+ QWEN 18B RUNS ON CONSUMER GPU IT BEATS 35B MoE AT HALF THE VRAM @KyleHessling1 just dropped the healed Qwopus-…
A new merged, quantized 18B model, Qwopus-GLM-18B-GGUF, reportedly outperforms 35B MoE models while using half the VRAM and running on consumer GPUs.
@bastani_behnam: We just published how we unlocked +50% inference capacity on a 27B model — no new GPUs, no new nodes, at a fraction of …
OpenInfer demonstrates a "vertical disaggregation" technique that boosts Qwen 3.5 27B throughput by ~50%, co-executing quantized layers across a single node's AMD EPYC CPU and Nvidia L40S GPU under a custom SLA-aware scheduler.
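OpenInfer has not published code for this, so the sketch below is only a toy illustration of the stated idea: an SLA-aware placement pass that offloads quantized layers from the GPU to the CPU while a per-token latency budget still holds, freeing GPU memory and compute for more concurrent requests. Layer names, latency figures, and the SLA budget are all invented placeholders.

```python
# Toy illustration of SLA-aware CPU/GPU layer placement (not OpenInfer's
# implementation; "vertical disaggregation" is their term, and the numbers
# here are made up). Start all-GPU, then greedily move the layers with the
# smallest CPU penalty to the CPU while per-token latency stays inside the
# SLA budget.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    gpu_ms: float  # estimated per-token latency on the GPU (e.g. L40S)
    cpu_ms: float  # estimated per-token latency on the CPU (quantized, e.g. EPYC)

def place_layers(layers: list[Layer], sla_ms: float) -> dict[str, str]:
    placement = {layer.name: "gpu" for layer in layers}
    total_ms = sum(layer.gpu_ms for layer in layers)
    # Offload the cheapest-to-move layers first.
    for layer in sorted(layers, key=lambda l: l.cpu_ms - l.gpu_ms):
        penalty = layer.cpu_ms - layer.gpu_ms
        if total_ms + penalty <= sla_ms:
            placement[layer.name] = "cpu"
            total_ms += penalty
    return placement

if __name__ == "__main__":
    # Placeholder figures: 27 transformer blocks, CPU ~3x slower per block.
    blocks = [Layer(f"block{i}", gpu_ms=0.4, cpu_ms=1.1) for i in range(27)]
    print(place_layers(blocks, sla_ms=15.5))  # offloads 6 blocks under this budget
```

The capacity gain in the post presumably comes from the freed GPU resources serving additional requests; this sketch only shows the placement decision, not the scheduler itself.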
Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post
A Reddit user demonstrates llama.cpp speculative decoding lifting Qwen-3.6-27B generation speed from 13.6 to 136.75 t/s, roughly a 10x gain, and shares the exact commands and hardware setup.
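For anyone wanting to reproduce the idea, the shape of such a run is roughly as below. This is not the poster's exact command: flag names vary between llama.cpp releases (check `llama-server --help`), and the model paths and draft sizes are placeholders.

```python
# Hedged sketch of a llama.cpp speculative-decoding launch via Python.
# Paths are placeholders; --draft-max / --draft-min exist in recent
# llama.cpp builds, but verify against your version's --help output.
import subprocess

subprocess.run([
    "llama-server",
    "-m",  "qwen3.6-27b-q4_k_m.gguf",   # target model (placeholder path)
    "-md", "qwen3.6-draft-q8_0.gguf",   # small draft model (placeholder path)
    "-ngl", "99",                        # offload all target layers to the GPU
    "--draft-max", "16",                 # most tokens drafted per verification step
    "--draft-min", "4",                  # fewest tokens drafted per step
], check=True)
```

The gain depends heavily on how often the draft model's guesses match the target; a small draft model from the same family usually achieves a much higher acceptance rate.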