Tag
EarlyTom is a training-free framework that compresses visual tokens early in the vision encoder to reduce time-to-first-token and computational costs while maintaining accuracy, achieving up to 2.65x TTFT reduction.