Tag
The paper introduces the context-ready transformer, a recurrent architecture that pre-contextualizes tokens before the transformer block, achieving significant inference speedups (e.g., 1.7x on A100) while matching or exceeding standard transformer performance with fewer layers.