weight-space

#weight-space

Weight-Space Geometry of Offline Reasoning Training

arXiv cs.LG · yesterday

This paper investigates whether different offline reinforcement learning losses for reasoning distillation (RFT, RIFT, DFT, Offline GRPO, DPO) produce mechanistically distinct weight updates in a small language model. Using identical math rollouts in a controlled setup with Qwen3-4B and attention-only LoRA, the authors find that SFT, RFT, and RIFT yield nearly collinear weight deltas, while DPO occupies a near-orthogonal subspace and achieves the highest accuracy.
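To make "nearly collinear" and "near-orthogonal" concrete, here is a minimal sketch of one common way to compare weight deltas: flatten each update into a vector and take the cosine similarity. The summary does not say which metric the paper uses, so this is an assumption, and the helper names (`lora_delta`, `flatten`, `cosine`) and the toy matrices are hypothetical illustrations, not the authors' code or data.

```python
import math

def lora_delta(B, A):
    """Dense update B @ A for one LoRA-adapted matrix (plain lists of lists)."""
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def flatten(matrix):
    """Flatten a matrix into a single vector."""
    return [x for row in matrix for x in row]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy deltas (made up): delta_b is a rescaled copy of delta_a,
# delta_c is constructed to be orthogonal to delta_a.
delta_a = [[1.0, 2.0], [3.0, 4.0]]
delta_b = [[2.0, 4.0], [6.0, 8.0]]
delta_c = [[2.0, -1.0], [4.0, -3.0]]

sim_collinear = cosine(flatten(delta_a), flatten(delta_b))
sim_orthogonal = cosine(flatten(delta_a), flatten(delta_c))
```

Under this reading, a similarity near 1 means two losses move the weights in the same direction, differing mainly in step size, while a similarity near 0 means the updates lie in largely unrelated directions.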



Robotic Policy Adaptation via Weight-Space Meta-Learning

Hugging Face Daily Papers · 2026-06-05

This paper introduces WIZARD, a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies from language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.



Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Hugging Face Daily Papers · 2026-05-28

This paper introduces the concept of Access Sets to budget expert reads, enabling scalable weight-space model merging.

