Articles from arXiv
This paper introduces adaptive correction scheduling for enforcing hard constraints in generative sampling, demonstrating that it improves the cost-accuracy frontier compared to terminal or stepwise projection methods.
This paper proposes a sample-efficient framework using the cross-entropy method to estimate extreme reliability ('five-nines') in LLMs, addressing the limitations of standard benchmarks in detecting rare failures.
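As a rough illustration of how the cross-entropy method handles rare events, the sketch below estimates a Gaussian tail probability of about 3e-5, a regime where naive sampling would need millions of draws. The toy target, the function name `ce_rare_event_prob`, and every parameter value are assumptions made for illustration; the summary does not describe the paper's actual failure model or proposal family.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def ce_rare_event_prob(gamma, n=5000, rho=0.1, max_iters=50, seed=0):
    """Estimate P(X > gamma) for X ~ N(0, 1) with the cross-entropy method.

    Toy setting (an assumption): the proposal family is N(mu, 1), and only
    the mean mu is adapted.
    """
    rng = random.Random(seed)
    mu = 0.0  # mean of the current proposal N(mu, 1)
    for _ in range(max_iters):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, capped at gamma.
        level = min(gamma, xs[int((1.0 - rho) * n)])
        elite = [x for x in xs if x >= level]
        # Likelihood ratios correct for sampling from N(mu, 1), not N(0, 1).
        ws = [normal_pdf(x, 0.0) / normal_pdf(x, mu) for x in elite]
        # Cross-entropy update for a Gaussian mean: weighted elite average.
        mu = sum(w * x for w, x in zip(ws, elite)) / sum(ws)
        if level >= gamma:
            break
    # Final importance-sampling estimate under the adapted proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    total = sum(normal_pdf(x, 0.0) / normal_pdf(x, mu) for x in xs if x > gamma)
    return total / n


estimate = ce_rare_event_prob(4.0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form Gaussian tail
print(f"CE estimate: {estimate:.3e}, exact: {exact:.3e}")
```

Each iteration shifts the sampling distribution toward the failure region, so the final estimate uses a few thousand samples that mostly land in the tail.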
This paper argues that simple averaging in AI benchmarks fails under data sparsity and difficulty heterogeneity, proposing Item Response Theory (IRT) as a robust alternative to recover ground truth rankings.
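A small sketch can show how averaging and IRT can disagree when models are scored on different item mixes. It uses a Rasch (one-parameter) model with item difficulties assumed known, which is a simplification; the response counts, difficulty values, and the helper `rasch_ability` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rasch_ability(responses, difficulties, lo=-10.0, hi=10.0, iters=60):
    """MLE of ability theta under a Rasch model with known item difficulties.

    The likelihood equation is sum_j sigmoid(theta - b_j) = number correct.
    The left side increases with theta, so bisection finds the root.
    """
    target = sum(responses)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        expected = sum(sigmoid(mid - b) for b in difficulties)
        if expected < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sparse benchmark: each model sees a different mix of items.
HARD, EASY = 2.0, -2.0
# Model A: 80 hard items (22 correct) and 20 easy items (19 correct).
a_difficulties = [HARD] * 80 + [EASY] * 20
a_responses = [1] * 22 + [0] * 58 + [1] * 19 + [0] * 1
# Model B: 20 hard items (2 correct) and 80 easy items (62 correct).
b_difficulties = [HARD] * 20 + [EASY] * 80
b_responses = [1] * 2 + [0] * 18 + [1] * 62 + [0] * 18

mean_a = sum(a_responses) / len(a_responses)
mean_b = sum(b_responses) / len(b_responses)
theta_a = rasch_ability(a_responses, a_difficulties)
theta_b = rasch_ability(b_responses, b_difficulties)

print(f"mean accuracy: A={mean_a:.2f}, B={mean_b:.2f}")
print(f"Rasch ability: A={theta_a:.2f}, B={theta_b:.2f}")
```

In this constructed data, model B has the higher mean accuracy only because it mostly saw easy items, while the ability estimate, which accounts for item difficulty, ranks model A higher.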
This paper investigates the geometric structure of intermediate feature representations in deep neural networks by analyzing how various image manipulations map in feature space. It suggests that feature spaces are organized in linear structures to a first approximation, using generative image editing models to probe these representations.
This paper introduces Variational Linear Attention (VLA), a method that stabilizes memory states in linear attention mechanisms for long-context transformers. VLA reframes memory updates as an online regularized least-squares problem, proving bounded state norms and demonstrating significant speedups and improved retrieval accuracy over standard linear attention and DeltaNet.
This PhD thesis introduces deep learning methods for protein complex prediction and design, including GLINTER for contact prediction, ESMPair for homolog pairing, and RedNet for binder design.
This paper introduces CATS, a cascaded adaptive tree speculation framework designed to accelerate LLM inference on memory-constrained edge devices by optimizing memory usage while maintaining high token acceptance rates.
This paper challenges the geometric justification for the Muon optimizer, arguing that the precise spectral structure of the update matters less than step-size optimality. It introduces the Freon and Kaon optimizers to demonstrate that random or inverted spectra can perform as well as Muon.
This paper analyzes oversmoothing in Neural Sheaf Diffusion (NSD) as a representation degeneracy phenomenon using quiver theory and Geometric Invariant Theory. It proposes moment-map-inspired regularizers and explores non-uniform stalk dimensions to mitigate this issue in heterophilic graph benchmarks.
This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
This paper introduces Asymmetric Langevin Unlearning (ALU), a framework that leverages public data to improve the privacy-utility trade-off in machine unlearning. It demonstrates that ALU reduces unlearning costs and enables mass unlearning while maintaining high model utility.
This paper introduces COSMOS, a model-agnostic personalized federated learning framework that uses clustered server models and pseudo-label-only communication. It provides theoretical analysis showing exponential personalization risk contraction and demonstrates superior performance over existing baselines in heterogeneous environments.
This position paper argues that interpretability research should be evaluated based on actionability—the extent to which insights enable concrete decisions and interventions. The authors propose a framework with evaluation criteria aligned with practical outcomes to address the lack of real-world impact in current interpretability work.
This paper introduces CORE, a new knowledge graph completion model that uses cyclic orthotope relation embeddings on a torus manifold to address boundary constraints in region-based models. Experiments show competitive performance in link prediction tasks.
This paper proposes Spectra, a method using spectral occupancy to analyze and control the realized capacity of latent graph models, arguing that rank is not equivalent to model capacity.
This paper analyzes spurious correlation learning in preference optimization methods like DPO, identifying mechanisms such as mean spurious bias and causal-spurious leakage. It proposes 'tie training' using equal-utility preference pairs as a mitigation strategy to reduce reliance on spurious features without degrading causal learning.
This paper introduces steerable neural ordinary differential equations on homogeneous spaces, providing a geometric framework for learning continuous-time equivariant dynamics.
This paper introduces HEPA, a self-supervised architecture for predicting rare critical events in time series using a Joint-Embedding Predictive Architecture (JEPA) pretraining strategy. It demonstrates superior performance across multiple domains while using significantly less labeled data and fewer tuned parameters than leading models.
This paper introduces S-FLM, a novel flow-based language model that operates in a hyperspherical latent space to address the computational costs and semantic limitations of existing discrete diffusion and continuous flow models.
This paper introduces GRAFT-ATHENA, a self-improving agentic framework that autonomously discovers and evolves numerical algorithms for scientific problems. It demonstrates near-machine-precision accuracy on physics-informed machine learning benchmarks and successfully tackles complex engineering challenges.