@no_stp_on_snek: In progress
Summary
A quoted post from @AtlasInference promotes Atlas Inference, an open-source inference serving tool that its author says achieved 200+ tok/s on a Qwen3.6-35B-A3B benchmark.
Cached at: 05/24/26, 08:18 AM
In progress https://t.co/DFkWLU43lH
Azeez (@AtlasInference): Try Atlas Inference. You’ll be ready to serve in <2 mins. https://t.co/vxZLwBJMub ⚡️
Works with sparkrun out of the box. Happy to share Docker commands as well, but all are on the website.
Open source too; most recently it achieved 200+ tok/s on a Qwen3.6-35B-A3B benchmark!
Similar Articles
@no_stp_on_snek: https://x.com/no_stp_on_snek/status/2052833502475833384
An open-source stack using Qwen2.5-32B-Instruct with longctx and vllm-turboquant on a single AMD MI300X achieves competitive results (0.601-0.688) versus SubQ's closed model (0.659) on the MRCR v2 1M-context benchmark, demonstrating open-weights approaches are within striking distance.
@bastani_behnam: We just published how we unlocked +50% inference capacity on a 27B model — no new GPUs, no new nodes, at a fraction of …
OpenInfer demonstrates "vertical disaggregation" that boosts Qwen 3.5 27B throughput by ~50% by co-executing quantized layers across a single node’s AMD EPYC CPU and Nvidia L40S GPU with a custom SLA-aware scheduler.
@no_stp_on_snek: btw this was my loop. as you can see i didn't put much thought into it (typos and all), just a side thing to assess the…
Release of Qwopus3.6-27B-v2-MTP, a fine-tuned multi-token prediction reasoning model based on Qwen3.6-27B, optimized for coding, DevOps, and math tasks with improved generation speed.
@no_stp_on_snek: http://LocalMaxxing.com First of many submissions.
LocalMaxxing is a website providing community benchmarks for local LLM inference, allowing users to track speed and compare hardware.
@tenstorrent: Thank you Tokyo! Here’s everything we announced at TT-Deploy Japan: Faster AI Inference • Kimi K2.6 900 t/s/u, 3x faste…
Tenstorrent announced at TT-Deploy Japan faster AI inference for Kimi K2.6, LTX 2.3, and DeepSeek-R1 on their hardware, plus the licensable TT-Ascalon S RISC-V CPU for agentic AI.