weight-space

Weight-Space Geometry of Offline Reasoning Training

arXiv cs.LG · 22h ago

This paper investigates whether different offline reinforcement learning losses for reasoning distillation (RFT, RIFT, DFT, Offline GRPO, DPO) produce mechanistically distinct weight updates in a small language model. Using identical math rollouts in a controlled setup with Qwen3-4B and attention-only LoRA, the authors find that SFT, RFT, and RIFT yield nearly collinear weight deltas, while DPO occupies a near-orthogonal subspace and achieves the highest accuracy.

Robotic Policy Adaptation via Weight-Space Meta-Learning

Hugging Face Daily Papers · 2026-06-05

Introduces WIZARD, a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies from language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Hugging Face Daily Papers · 2026-05-28

This paper introduces the concept of Access Sets to budget expert reads, enabling scalable weight-space model merging.
