A new paper from Alibaba and Nanjing University introduces RTPurbo, a method that speeds up million-token prefill by up to 9.36x over FlashAttention-2. It applies full attention only where needed and requires no retraining of the model.