@no_stp_on_snek: In progress

X AI KOLs Following 05/23/26, 06:35 PM Tools

Summary

The post promotes Atlas Inference, an open-source inference serving tool that reportedly achieved 200+ tok/s on a Qwen3.6-35B-A3B benchmark.

In progress https://t.co/DFkWLU43lH

Original Article

Cached at: 05/24/26, 08:18 AM

In progress https://t.co/DFkWLU43lH

Azeez (@AtlasInference): Try Atlas Inference. You’ll be ready to serve in <2 mins. https://t.co/vxZLwBJMub ⚡️

Works with sparkrun out of the box. Happy to share Docker commands as well, but they are all on the website.

Open source too, most recently achieved 200+ tok/s on a Qwen3.6-35B-A3B benchmark!

Similar Articles

@no_stp_on_snek: https://x.com/no_stp_on_snek/status/2052833502475833384

X AI KOLs Following

An open-source stack using Qwen2.5-32B-Instruct with longctx and vllm-turboquant on a single AMD MI300X achieves competitive results (0.601-0.688) versus SubQ's closed model (0.659) on the MRCR v2 1M-context benchmark, demonstrating open-weights approaches are within striking distance.

@bastani_behnam: We just published how we unlocked +50% inference capacity on a 27B model — no new GPUs, no new nodes, at a fraction of …

X AI KOLs Following

OpenInfer demonstrates "vertical disaggregation" that boosts Qwen 3.5 27B throughput by ~50% by co-executing quantized layers across a single node’s AMD EPYC CPU and Nvidia L40S GPU with a custom SLA-aware scheduler.

@no_stp_on_snek: btw this was my loop. as you can see i didn't put much thought into it (typos and all), just a side thing to assess the…

X AI KOLs Following

Release of Qwopus3.6-27B-v2-MTP, a fine-tuned multi-token prediction reasoning model based on Qwen3.6-27B, optimized for coding, DevOps, and math tasks with improved generation speed.

@no_stp_on_snek: http://LocalMaxxing.com First of many submissions.

X AI KOLs Following

LocalMaxxing is a website providing community benchmarks for local LLM inference, allowing users to track speed and compare hardware.

@tenstorrent: Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan: Faster AI Inference • Kimi K2.6 900 t/s/u, 3x faste…

X AI KOLs Timeline

Tenstorrent announced at TT-Deploy Japan faster AI inference for Kimi K2.6, LTX 2.3, and DeepSeek-R1 on their hardware, plus the licensable TT-Ascalon S RISC-V CPU for agentic AI.
