Tag
This paper proposes Native Factorized Weights for transformers, where every linear layer is trained as a product of two low-rank matrices from initialization. Experiments reveal a corpus-determined optimal rank that minimizes validation loss, along with a generalization band, and the factorized models outperform dense baselines with fewer parameters.
Introduces Duplicated Latent Residual (DLR), a training-only, parameter-free plug-in for low-rank pre-training that improves perplexity across LLaMA models from 60M to 7B parameters and can be folded into the model after training at zero inference cost.
This paper introduces a distributional generalization of matrix completion where each entry is a probability distribution rather than a scalar, using kernel mean embeddings and Tucker rank to capture low-rank structure. The authors propose a novel estimator with non-asymptotic error bounds and demonstrate effectiveness on synthetic and real-world data.
Introduces Eggroll, a low-rank evolution strategy for gradient-free training of spiking neural networks, reducing memory and time overhead while achieving competitive accuracy on N-MNIST.
VideoMLA replaces per-head KV caches in video diffusion models with a shared low-rank latent and decoupled 3D-RoPE positional keys, reducing per-token KV memory by 92.7% and improving throughput by 1.23x on a B200 while maintaining quality on VBench benchmarks.
LoRDBA replaces LoRA's floating-point low-rank factors with binary sign carriers and channel-wise scales, enabling efficient on-device fine-tuning with a significantly smaller footprint and minimal latency overhead while matching fp16 quality.
Proposes M-ORE, a modality-decoupled online recursive editor for lifelong adaptation of multimodal large language models, addressing cross-modal conflict and inter-edit interference with constant per-edit overhead.
This paper studies piecewise-stationary low-rank linear contextual bandits, proposes the SPSC algorithm, whose dynamic regret scales with the intrinsic rank rather than the ambient dimension, and characterizes the identification boundary for subspace recovery under scalar feedback.
This paper identifies a geometric mismatch in the Dion low-rank spectral optimizer and proposes Orth-Dion, which replaces column normalization with QR orthogonalization to close the convergence gap to full-rank methods like Muon at the same communication cost. The method is validated on large-scale language model pre-training.
Proposes delta-Mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.
Asymmetric Flow Modeling (AsymFlow) restricts noise prediction to low-rank subspaces for efficient high-dimensional flow-based generation, achieving state-of-the-art results on ImageNet and text-to-image tasks by fine-tuning from latent flow models.