@losterror501: with 2 DGX Sparks, getting 25 tok/sec with 1 session, and it peaks at 152 tok/sec with 8 sessions. Actually insane...
Summary
Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with throughput figures reported on two DGX Spark units: 25 tok/sec with a single session and a peak of 152 tok/sec with 8 sessions.
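For a rough sense of what those two numbers imply, here is a minimal sketch of the scaling arithmetic. It assumes the 152 tok/sec figure is the aggregate across all 8 sessions (the post does not say so explicitly), and the function and key names are made up for illustration.

```python
def scaling_summary(single_tok_s: float, aggregate_tok_s: float, sessions: int) -> dict:
    """Derive per-session throughput and scaling efficiency.

    Assumption: aggregate_tok_s is the total across all concurrent sessions.
    """
    per_session = aggregate_tok_s / sessions      # what each session sees
    speedup = aggregate_tok_s / single_tok_s      # total gain over one session
    efficiency = speedup / sessions               # 1.0 would be perfect linear scaling
    return {
        "per_session_tok_s": per_session,
        "speedup": speedup,
        "efficiency": efficiency,
    }


# Figures as reported in the post: 25 tok/sec at 1 session, 152 tok/sec at 8.
summary = scaling_summary(single_tok_s=25.0, aggregate_tok_s=152.0, sessions=8)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, each of the 8 sessions gets about 19 tok/sec, total throughput is roughly 6.08x the single-session rate, and scaling efficiency is about 76%.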
Cached at: 06/22/26, 01:41 AM
with 2 DGX Sparks, getting 25 tok/sec with 1 session, and it peaks at 152 tok/sec with 8 sessions. Actually insane…
Taha ז (@lordx64): Releasing Qwable-v1 - an open-weights Qwen3.6-35B-A3B distilled from Claude Fable-5, Anthropic’s Mythos-class preview model that was briefly public for ~4 days (2026-06-09 → 2026-06-12) before being suspended globally under U.S. export-control directives.
Fable-5 was Anthropic’s
Similar Articles
@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 1 DGX Spark with 128 GB unified memo…
@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.
DGX Spark agentic usage numbers
A user shares benchmark results and configuration for running Qwen3.6 models on NVIDIA DGX Spark using vLLM, focusing on agentic workloads with concurrent requests and tool calling.
@TeksEdge: Wow! New open source Computer Use model shows strong local performance on LLM Leaderboard using a single DGX Spark! Thi…
H Company released Holo-3.1-35B-A3B-NVFP4, an open-source computer-use model that achieves up to 195 tokens per second on a single DGX Spark node, outperforming larger models like Qwen3.5-397B and Kimi-K2.5.
@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …
Antirez reports benchmarking DS4 inference on the DGX Spark (GB10), noting 12 tokens/sec generation speed and high prefill performance, with plans to merge the codebase once mature.
Dual DGX Spark (Asus GX10) MiniMax M2.7 results
User benchmarks dual Asus GX10 (DGX Spark) running MiniMax-M2.7-AWQ-4bit, achieving 30–40 tokens/s while drawing only ~100 W each, replacing noisy multi-GPU rigs.