This paper introduces adaptive correction scheduling for enforcing hard constraints in generative sampling, demonstrating that it improves the cost-accuracy frontier over terminal-only or per-step projection baselines.
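A minimal sketch of the scheduling idea, assuming a toy denoising-style sampler, a box constraint as the feasible set, and a hand-picked correction schedule; every specific below is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

def project_to_box(x, lo=-1.0, hi=1.0):
    """Projection onto a hard box constraint (stand-in for any feasible set)."""
    return np.clip(x, lo, hi)

def sample_with_schedule(steps=50, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    for t in range(steps):
        # Placeholder denoising/update step.
        x = 0.95 * x + 0.05 * rng.normal(size=dim)
        # Adaptive schedule: correct rarely early on, densely near the end,
        # instead of projecting at every step or only at the terminal step.
        frac = t / (steps - 1)
        period = max(1, int(10 * (1.0 - frac)))  # period shrinks 10 -> 1
        if t % period == 0:
            x = project_to_box(x)
    return project_to_box(x)  # final projection guarantees feasibility

print(sample_with_schedule())
```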
This paper introduces Toeplitz MLP Mixers (TMM), a novel architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
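A rough sketch of Toeplitz token mixing under assumed names and sizes; the layer learns 2n-1 coefficients instead of an n x n attention map, and the matrix-vector product can presumably be computed in O(n log n) via FFT (not shown here):

```python
import torch
import torch.nn as nn

class ToeplitzMixer(nn.Module):
    """Hypothetical token-mixing layer: a learned Toeplitz matrix stands in
    for the attention map, giving O(n) parameters along the sequence axis."""
    def __init__(self, seq_len: int):
        super().__init__()
        # 2n-1 free parameters define an n x n Toeplitz matrix.
        self.coeffs = nn.Parameter(torch.randn(2 * seq_len - 1) / seq_len)
        idx = torch.arange(seq_len)
        # T[i, j] = coeffs[i - j + n - 1]: constant along each diagonal.
        self.register_buffer("index", idx[:, None] - idx[None, :] + seq_len - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, channels); mix along the sequence axis.
        T = self.coeffs[self.index]            # (seq_len, seq_len)
        return torch.einsum("st,btc->bsc", T, x)

x = torch.randn(2, 16, 32)
print(ToeplitzMixer(16)(x).shape)  # torch.Size([2, 16, 32])
```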
This paper proposes Saliency-Aware Regularized Quantization Calibration (SARQC), a unified framework that improves Post-Training Quantization (PTQ) for LLMs by adding a regularization term that keeps calibrated weights close to the pretrained ones, improving generalization and quantized-model performance.
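A toy rendering of regularized calibration, assuming a uniform fake-quantizer with a straight-through estimator and a hypothetical proximity weight lam; SARQC's saliency weighting is not reproduced here:

```python
import torch

def fake_quant(w, n_bits=4):
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    scale = w.abs().max() / (2 ** (n_bits - 1) - 1)
    w_q = torch.round(w / scale).clamp(-(2 ** (n_bits - 1)),
                                       2 ** (n_bits - 1) - 1) * scale
    return w + (w_q - w).detach()  # forward uses w_q, gradient flows to w

w_fp = torch.randn(64, 64)             # frozen pretrained weights
w = torch.nn.Parameter(w_fp.clone())   # weights tuned during calibration
opt = torch.optim.Adam([w], lr=1e-3)
x = torch.randn(256, 64)               # calibration batch
lam = 0.1                              # assumed regularization strength
for _ in range(100):
    opt.zero_grad()
    # Reconstruct the full-precision layer output under quantization...
    recon = ((x @ fake_quant(w).T - x @ w_fp.T) ** 2).mean()
    # ...while the proximity term keeps w near the pretrained weights.
    prox = ((w - w_fp) ** 2).mean()
    (recon + lam * prox).backward()
    opt.step()
```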
This paper introduces CaRE, a novel continual learning framework using a bi-level routing mixture-of-experts mechanism to effectively handle class-incremental learning over sequences of 300+ tasks.
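A minimal sketch of bi-level routing with hypothetical group/expert counts and hard argmax routing; CaRE's actual routers and expert architecture are not specified in the summary:

```python
import torch
import torch.nn as nn

class BiLevelMoE(nn.Module):
    """Hypothetical bi-level routing: a coarse router picks an expert group,
    a fine router picks one expert inside it (hard, non-differentiable
    routing kept for simplicity)."""
    def __init__(self, dim=32, n_groups=4, experts_per_group=4):
        super().__init__()
        self.group_router = nn.Linear(dim, n_groups)
        self.expert_router = nn.Linear(dim, experts_per_group)
        self.experts = nn.ModuleList(
            nn.ModuleList(nn.Linear(dim, dim) for _ in range(experts_per_group))
            for _ in range(n_groups)
        )

    def forward(self, x):
        g = self.group_router(x).argmax(dim=-1)   # level 1: choose a group
        e = self.expert_router(x).argmax(dim=-1)  # level 2: choose an expert
        out = torch.empty_like(x)
        for i in range(x.shape[0]):               # per-sample hard routing
            out[i] = self.experts[int(g[i])][int(e[i])](x[i])
        return out

x = torch.randn(8, 32)
print(BiLevelMoE()(x).shape)  # torch.Size([8, 32])
```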
This paper proposes AnisoAlign, a framework that addresses the modality gap in multimodal models by applying anisotropic geometric correction to enable effective unpaired modality alignment.
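One plausible reading of anisotropic geometric correction, sketched as per-dimension moment matching between unpaired embedding sets; the function and statistics below are illustrative assumptions, not AnisoAlign's method:

```python
import numpy as np

def anisotropic_align(src, ref, eps=1e-6):
    """Align src's embedding distribution to ref's by matching mean and
    scale per dimension (axis-wise, hence "anisotropic"); distribution-level
    statistics mean no paired samples are required."""
    mu_s, sd_s = src.mean(0), src.std(0) + eps
    mu_r, sd_r = ref.mean(0), ref.std(0) + eps
    return (src - mu_s) / sd_s * sd_r + mu_r

rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, (1000, 64))     # image-embedding stand-in
txt = rng.normal(0.5, 2.0, (1000, 64))     # text-embedding stand-in
img_aligned = anisotropic_align(img, txt)  # unpaired alignment
print(img_aligned.mean().round(2), img_aligned.std().round(2))
```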
Researchers from MIT CSAIL and other institutions introduced CompreSSM, a technique that compresses state-space models (SSMs) during training by pruning redundant components early, yielding faster training and smaller models without sacrificing performance.
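A guess at what early component removal could look like for a diagonal SSM, using a hypothetical per-state importance score and keep ratio; CompreSSM's actual pruning criterion is not described here:

```python
import torch

def prune_ssm_states(A_log, B, C, keep_ratio=0.5):
    """Drop the weakest state dimensions of a diagonal SSM partway through
    training. The importance proxy (input/output coupling strength) and the
    keep ratio are assumptions for illustration."""
    score = B.abs().sum(dim=1) * C.abs().sum(dim=0)
    k = max(1, int(keep_ratio * score.numel()))
    keep = score.topk(k).indices.sort().values
    return A_log[keep], B[keep], C[:, keep]   # smaller model from here on

d_state, d_in, d_out = 64, 16, 16
A_log = torch.randn(d_state)            # log of the diagonal state matrix
B = torch.randn(d_state, d_in)          # input projection
C = torch.randn(d_out, d_state)         # output projection
A_log, B, C = prune_ssm_states(A_log, B, C)
print(A_log.shape, B.shape, C.shape)    # 32 states remain
```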