@losterror501: with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane...

X AI KOLs Timeline 06/21/26, 08:47 PM Models

model-release open-weights distillation performance qwen claude

Summary

Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with performance figures on two DGX Spark units: 25 tok/sec with a single session, peaking at 152 tok/sec with 8 sessions.
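To put the two throughput figures side by side, here is a minimal sketch of the scaling arithmetic. It assumes the 152 tok/sec figure is the aggregate across all 8 sessions, which the post does not state explicitly; the function name `scaling_summary` is hypothetical and used only for illustration.

```python
def scaling_summary(single_tps: float, aggregate_tps: float, sessions: int):
    """Return (per-session tok/s, speedup vs. one session, scaling efficiency).

    Assumes aggregate_tps is the combined throughput of all sessions.
    """
    per_session = aggregate_tps / sessions   # average tok/s each session sees
    speedup = aggregate_tps / single_tps     # total throughput gain over 1 session
    efficiency = speedup / sessions          # 1.0 would be perfect linear scaling
    return per_session, speedup, efficiency


# Figures quoted in the post: 25 tok/sec (1 session), 152 tok/sec (8 sessions).
per_session, speedup, efficiency = scaling_summary(25.0, 152.0, 8)
print(f"per-session: {per_session:.1f} tok/s")
print(f"speedup:     {speedup:.2f}x")
print(f"efficiency:  {efficiency:.0%}")
```

Under that assumption, each of the 8 sessions would average about 19 tok/sec, so individual sessions slow down only modestly while total throughput rises roughly sixfold.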

Original Article

Cached at: 06/22/26, 01:41 AM

with 2dgx sparks getting 25tok/sec with 1 session and it peaks to 152tok/sec with 8 sessions. Actually insane…

Taha ז (@lordx64): Releasing Qwable-v1 - an open-weights Qwen3.6-35B-A3B distilled from Claude Fable-5, Anthropic’s Mythos-class preview model that was briefly public for ~4 days (2026-06-09 → 2026-06-12) before being suspended globally under U.S. export-control directives.

Fable-5 was Anthropic’s

Similar Articles

@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 🫪 1 DGX Spark with 128 GB unified memo…

X AI KOLs Timeline

@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.

DGX Spark agentic usage numbers

Reddit r/LocalLLaMA

A user shares benchmark results and configuration for running Qwen3.6 models on NVIDIA DGX Spark using vLLM, focusing on agentic workloads with concurrent requests and tool calling.

@TeksEdge: Wow! New open source Computer Use model shows strong local performance on LLM Leaderboard using a single DGX Spark! Thi…

X AI KOLs Timeline

H Company released Holo-3.1-35B-A3B-NVFP4, an open-source computer-use model that achieves up to 195 tokens per second on a single DGX Spark node, outperforming larger models like Qwen3.5-397B and Kimi-K2.5.

@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …

X AI KOLs Timeline

Antirez reports benchmarking DS4 inference on the DGX Spark (GB10), noting 12 tokens/sec generation speed and high prefill performance, with plans to merge the codebase once mature.

Dual dgx spark (Asus GX10) MiniMax M2.7 results

Reddit r/LocalLLaMA

User benchmarks dual Asus GX10 (DGX Spark) running MiniMax-M2.7-AWQ-4bit, achieving 30–40 tokens/s while drawing only ~100 W each, replacing noisy multi-GPU rigs.
