Tag
This paper frames model merging as probabilistic inference under a product-of-experts formulation, showing that existing methods are special cases of it. It proposes a heavy-tailed Cauchy expert design that better captures real residual behavior and reports significant improvements over state-of-the-art baselines.
This paper investigates the 'sparsity curse' in merging RLVR (Reinforcement Learning with Verifiable Rewards) models, finding that sparse updates produce near-orthogonal parameter directions that hinder aggregation. It proposes SAR-Merging, which uses Fisher information and sparsification to resolve conflicts and improve merging performance on math and coding tasks.
This paper characterizes the parameter-space dynamics of on-policy distillation (OPD) for large language models, showing that OPD exhibits relaxed off-principal updates and subspace locking, which distinguish it from supervised fine-tuning and reinforcement learning with verifiable rewards.
This paper introduces CapVector, a method that decouples auxiliary training objectives from standard supervised fine-tuning in Vision-Language-Action models. By extracting transferable capability vectors and applying orthogonal regularization, it improves model performance and generalization while significantly reducing computational overhead.