rtpurbo

#rtpurbo

@rohanpaul_ai: New Alibaba + Nanjing Univ paper claims million-token prefill can be sped up 9.36X (compared against FlashAttention-2) …

X AI KOLs Timeline ↗ · 2026-05-24 Cached

A new paper from Alibaba and Nanjing University introduces RTPurbo, a method that speeds up million-token prefill by up to 9.36x compared to FlashAttention-2 by applying full attention only where it is needed. It requires no full retraining of the model, only a few hundred training steps.

#rtpurbo

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

arXiv cs.CL ↗ · 2026-05-19 Cached

RTPurbo converts full-attention LLMs into sparse models with only a few hundred training steps, achieving near-lossless accuracy and up to 9.36x prefill and 2.01x decode speedups.
