Tag
PFlash now supports running @poolsideai's Laguna-XS.2 (33B-A3B MoE) on a single RTX 3090, achieving 111 tok/s decode and 5.4x faster prefill than llama.cpp, with NIAH passes up to 131K context.