low-rank

#low-rank

Training transformers where every layer W = V·Uᵀ from initialization reveals a corpus-determined optimal rank - looking for arXiv endorser (cs.LG) [D]

Reddit r/MachineLearning · yesterday

This paper proposes Native Factorized Weights for transformers, where every linear layer is trained from initialization as a product of two low-rank matrices, W = V·Uᵀ. Experiments show a corpus-determined optimal rank that minimizes validation loss, together with a generalization band around it; the factorized models outperform dense baselines with fewer parameters.
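
To make the W = V·Uᵀ parameterization concrete, here is a minimal sketch in plain Python. It is not the paper's code: the shapes (V is d_out × r, U is d_in × r), the Gaussian initialization, and the helper names `init_factors`, `factorized_forward`, and `param_counts` are all assumptions made for illustration.

```python
import random


def init_factors(d_out, d_in, rank, seed=0):
    """Create V (d_out x rank) and U (d_in x rank) with small Gaussian entries.

    The 1/sqrt(d_in) scale is an illustrative assumption, not the paper's scheme.
    """
    rng = random.Random(seed)
    scale = 1.0 / (d_in ** 0.5)
    V = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(d_out)]
    U = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(d_in)]
    return V, U


def factorized_forward(V, U, x):
    """Compute y = V (U^T x) without ever materializing W = V U^T."""
    rank = len(U[0])
    # Project the input into the rank-dimensional latent space: z = U^T x.
    z = [sum(U[i][k] * x[i] for i in range(len(x))) for k in range(rank)]
    # Expand back to the output dimension: y = V z.
    return [sum(V[j][k] * z[k] for k in range(rank)) for j in range(len(V))]


def param_counts(d_out, d_in, rank):
    """Return (dense parameter count, factorized parameter count)."""
    return d_out * d_in, rank * (d_out + d_in)
```

Under these assumptions, a 512 × 512 layer at rank 64 stores 64 · (512 + 512) = 65,536 values instead of 262,144, which is where the parameter saving comes from whenever r is well below d_in · d_out / (d_in + d_out).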

DLR: Zero-Inference-Cost Latent Residuals for Low-Rank Pre-Training

arXiv cs.LG · 5d ago

Introduces Duplicated Latent Residual (DLR), a training-only, parameter-free plug-in for low-rank pre-training that improves perplexity across LLaMA models from 60M to 7B parameters, and can be folded into the model after training with zero inference cost.

Low-rank Distributional Matrix Completion

arXiv cs.LG · 2026-06-04

This paper introduces a distributional generalization of matrix completion where each entry is a probability distribution rather than a scalar, using kernel mean embeddings and Tucker rank to capture low-rank structure. The authors propose a novel estimator with non-asymptotic error bounds and demonstrate effectiveness on synthetic and real-world data.

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

arXiv cs.AI · 2026-06-01

Introduces Eggroll, a low-rank evolution strategy for gradient-free training of spiking neural networks, reducing memory and time overhead while achieving competitive accuracy on N-MNIST.

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Hugging Face Daily Papers · 2026-05-28

VideoMLA replaces per-head KV caches in video diffusion models with a shared low-rank latent and decoupled 3D-RoPE positional keys, reducing per-token KV memory by 92.7% and improving throughput by 1.23x on a B200 while maintaining quality on VBench benchmarks.

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

arXiv cs.LG · 2026-05-26

LoRDBA replaces LoRA's floating-point low-rank factors with binary sign carriers and channel-wise scales, enabling efficient on-device fine-tuning with significant footprint reduction and minimal latency overhead, matching fp16 quality.

Modality-Decoupled Online Recursive Editing

arXiv cs.LG · 2026-05-21

Proposes M-ORE, a modality-decoupled online recursive editor for lifelong adaptation of multimodal large language models, addressing cross-modal conflict and inter-edit interference with constant per-edit overhead.

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

arXiv cs.LG · 2026-05-21

This paper studies piecewise-stationary low-rank linear contextual bandits, proposes the SPSC algorithm that achieves dynamic regret scaling with the intrinsic rank instead of the ambient dimension, and characterizes the identification boundary for subspace recovery under scalar feedback.

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

arXiv cs.LG · 2026-05-19

This paper identifies a geometric mismatch in the Dion low-rank spectral optimizer and proposes Orth-Dion, which replaces column normalization with QR orthogonalization to close the convergence gap to full-rank methods like Muon at the same communication cost, validated on large-scale language model pre-training.
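
The difference between the two operations named in this summary can be sketched in plain Python. This is only an illustration of column normalization versus QR orthogonalization on a small matrix, not the Dion or Orth-Dion update itself, which the summary does not spell out; the helper names and the use of modified Gram-Schmidt for the QR step are assumptions.

```python
import math


def column_normalize(M):
    """Scale each column of M (a list of rows) to unit Euclidean norm."""
    n_rows, n_cols = len(M), len(M[0])
    norms = [math.sqrt(sum(M[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    return [[M[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]


def qr_orthogonalize(M):
    """Return the Q factor of a thin QR via modified Gram-Schmidt.

    Assumes M has full column rank; no pivoting or re-orthogonalization.
    """
    n_rows, n_cols = len(M), len(M[0])
    q_cols = []
    for j in range(n_cols):
        v = [M[i][j] for i in range(n_rows)]
        # Remove the components along the columns already orthonormalized.
        for q in q_cols:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        q_cols.append([a / norm for a in v])
    return [[q_cols[j][i] for j in range(n_cols)] for i in range(n_rows)]


def gram(M):
    """Return M^T M, which is the identity exactly when columns are orthonormal."""
    n_rows, n_cols = len(M), len(M[0])
    return [[sum(M[i][a] * M[i][b] for i in range(n_rows))
             for b in range(n_cols)] for a in range(n_cols)]
```

For a matrix with correlated columns, such as `[[1, 1], [0, 1], [0, 0]]`, column normalization yields unit-norm columns that still overlap (the off-diagonal Gram entry stays at 1/√2), while the QR factor has a Gram matrix equal to the identity. That residual overlap is one plausible reading of the "geometric mismatch" the summary mentions.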

Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top · 2026-05-16

Proposes delta-Mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.

Asymmetric Flow Models

Hugging Face Daily Papers · 2026-05-13

Asymmetric Flow Modeling (AsymFlow) restricts noise prediction to low-rank subspaces for efficient high-dimensional flow-based generation, achieving state-of-the-art results on ImageNet and text-to-image tasks by fine-tuning from latent flow models.
