Tag
A blog post reflecting on the process of becoming a machine learning researcher, drawing parallels to Zen meditation and emphasizing the importance of reading, building, focusing on fundamentals, and not chasing benchmarks.
This research paper introduces adaptive correction scheduling for enforcing hard constraints in generative sampling, demonstrating that it improves the cost-accuracy frontier compared to terminal or stepwise projection methods.
This paper introduces Toeplitz MLP Mixers (TMM), a novel architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
This paper proposes Saliency-Aware Regularized Quantization Calibration (SARQC), a unified framework that improves Post-Training Quantization (PTQ) for LLMs by adding a regularization term to preserve weight proximity, enhancing generalization and performance.
This paper introduces CaRE, a novel continual learning framework using a bi-level routing mixture-of-experts mechanism to effectively handle class-incremental learning over sequences of 300+ tasks.
This paper proposes AnisoAlign, a framework that addresses the modality gap in multimodal models by applying anisotropic geometric correction to enable effective unpaired modality alignment.
Researchers from MIT CSAIL and other institutions introduced CompreSSM, a technique that compresses state-space AI models during training by removing unnecessary components early, resulting in faster training and smaller models without sacrificing performance.