Tag
This paper proposes a local perturbation theory of cross-domain interference in multi-domain RL for LLMs. It shows that interference is driven by a second-order damage term in a low-dimensional conflict subspace, and demonstrates that a brief domain refresh or a training-free rollback can selectively recover lost capabilities.
MERIT introduces conflict-aware splitting and weight merging for decentralized instruction tuning, achieving improved performance without gradient synchronization across partitions.
The paper identifies off-manifold drift in guided flow models under compositional rewards and proposes Conflict-Aware Additive Guidance (CAR), a lightweight method that dynamically resolves gradient conflicts to improve generation fidelity without retraining.
Introduces DualOptim+, an optimization framework for LLM unlearning that uses shared base states and decoupled delta states to balance the forget and retain objectives, and offers a quantized variant that reduces memory usage.
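The idea of resolving gradient conflicts between composed reward signals can be illustrated with a well-known stand-in. The sketch below is NOT the CAR method summarized above. It swaps in a PCGrad-style projection (Yu et al., 2020, "Gradient Surgery for Multi-Task Learning"): when two guidance gradients point in conflicting directions (negative dot product), each is projected onto the normal plane of the other before the two are summed. The function name `resolve_conflict` and the plain-list vector representation are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def resolve_conflict(g1: Vector, g2: Vector) -> Vector:
    """Combine two guidance gradients, removing their conflicting component.

    Hypothetical helper: a PCGrad-style projection used as a stand-in,
    not the paper's CAR algorithm.
    """
    d = dot(g1, g2)
    if d >= 0.0:
        # No conflict: fall back to plain additive guidance.
        return [a + b for a, b in zip(g1, g2)]
    # Conflict: project each gradient onto the normal plane of the other,
    # using the original (unprojected) gradients for both projections.
    p1 = [a - d / dot(g2, g2) * b for a, b in zip(g1, g2)]
    p2 = [b - d / dot(g1, g1) * a for a, b in zip(g1, g2)]
    return [x + y for x, y in zip(p1, p2)]


if __name__ == "__main__":
    # Conflicting pair: the dot product of [1, 0] and [-1, 1] is -1.
    print(resolve_conflict([1.0, 0.0], [-1.0, 1.0]))  # [0.5, 1.5]
```

In the example, the combined direction has a positive inner product with both original gradients, so neither reward is actively pushed against. This is the property a conflict-aware combination aims for, in contrast to naive summation.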