This paper introduces the Structured Recurrent Mixer (SRM), an architecture that converts algebraically between a parallel form for training and a recurrent form for inference, without requiring specialized kernels. Experiments show that SRMs achieve significantly higher throughput and concurrency than Transformers and perform effectively on reinforcement learning tasks.
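The summary does not give the SRM equations. The sketch below therefore illustrates only the general idea of a parallel/recurrent duality, using an assumed generic linear recurrence with a scalar decay (in the style of linear attention), not the SRM's actual formulation. The function names `parallel_form` and `recurrent_form` and the `decay` parameter are hypothetical. The parallel form computes each output as a causally masked, decay-weighted sum over earlier positions. The recurrent form updates a fixed-size state once per token. The two forms produce the same outputs.

```python
# Hypothetical illustration of a parallel/recurrent duality for a generic
# linear recurrence with scalar decay. This is an assumption for exposition,
# NOT the SRM's published formulation.
import random
from typing import List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def parallel_form(q: List[Vector], k: List[Vector], v: List[Vector], decay: float) -> List[Vector]:
    """Training-style form: y_t = sum_{s<=t} decay^(t-s) * (q_t . k_s) * v_s.
    Each position depends only on the inputs, so all positions can be computed together."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        y = [0.0] * d_v
        for s in range(t + 1):  # causal mask: only positions s <= t contribute
            w = (decay ** (t - s)) * dot(q[t], k[s])
            for j in range(d_v):
                y[j] += w * v[s][j]
        out.append(y)
    return out


def recurrent_form(q: List[Vector], k: List[Vector], v: List[Vector], decay: float) -> List[Vector]:
    """Inference-style form: a fixed-size state S (d_k x d_v) is updated once per
    token, so per-step memory does not grow with sequence length."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # S_t = decay * S_{t-1} + k_t v_t^T
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay * state[i][j] + k_t[i] * v_t[j]
        # y_t = q_t^T S_t
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Usage: both forms agree up to floating-point rounding on random inputs.
random.seed(0)
T, D_K, D_V, DECAY = 6, 3, 2, 0.9
q = [[random.uniform(-1, 1) for _ in range(D_K)] for _ in range(T)]
k = [[random.uniform(-1, 1) for _ in range(D_K)] for _ in range(T)]
v = [[random.uniform(-1, 1) for _ in range(D_V)] for _ in range(T)]
p_out = parallel_form(q, k, v, DECAY)
r_out = recurrent_form(q, k, v, DECAY)
max_diff = max(abs(a - b) for ya, yb in zip(p_out, r_out) for a, b in zip(ya, yb))
print(f"max |parallel - recurrent| = {max_diff:.2e}")
```

Unrolling the state update gives S_t = sum_{s<=t} decay^(t-s) k_s v_s^T, and multiplying by q_t recovers the parallel expression term by term. This algebraic identity is what allows one set of weights to be trained in parallel and served recurrently in this toy setting.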