Tag
AdaCodec reduces video encoding redundancy in multimodal LLMs by transmitting full visual tokens only when scene prediction fails, otherwise using compact inter-frame change descriptions. It outperforms per-frame RGB baselines at matched token budgets and achieves better or comparable results with significantly fewer tokens, reducing time-to-first-token from 9.26s to 1.62s.
Proposes R2R2, a regularization method for self-predictive learning in reinforcement learning to mitigate overfitting under high update-to-data ratios, achieving significant improvements on continuous control tasks.
R²-dLLM introduces spatio-temporal redundancy reduction techniques that cut diffusion LLM decoding steps by up to 75% while preserving generation quality, addressing a key deployment bottleneck.