pflash

#pflash

@pupposandro: PFlash now runs @poolsideai's Laguna-XS.2 (33B-A3B MoE) on a single RTX 3090. - 111 tok/s decode @ short ctx - 128K TTFT…

X AI KOLs · 2026-05-14

PFlash now runs @poolsideai's Laguna-XS.2 (33B-A3B MoE) on a single RTX 3090. It reaches 111 tok/s decode at short context and 5.4x faster prefill than llama.cpp, and it passes needle-in-a-haystack (NIAH) tests at up to 131K context.
