Tag
This paper investigates the 'sparsity curse' in merging RLVR (Reinforcement Learning with Verifiable Rewards) models, finding that sparse updates produce nearly orthogonal parameter-update directions that hinder aggregation. It proposes SAR-Merging, which uses Fisher information and sparsification to resolve conflicts between models and improves merging performance on math and coding tasks.
This paper characterizes the distinctive parameter-space dynamics of on-policy distillation (OPD) for large language models, showing that OPD exhibits relaxed off-principal updates and subspace locking, which distinguish it from supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR).
This paper introduces CapVector, a method that decouples auxiliary training objectives from standard supervised fine-tuning in Vision-Language-Action (VLA) models. By extracting transferable capability vectors and applying orthogonal regularization, it improves model performance and generalization while substantially reducing computational overhead.