representational-drift

#representational-drift

A Gravitational Interpretation of Fine-Tuning Reversion

arXiv cs.LG · 2d ago

The paper proposes a gravitational interpretation of fine-tuning reversion: early training creates dominant behavioral manifolds that later alignment displaces only shallowly, leaving a persistent direction along which the model reverts. Experiments show that blocking this direction reduces harmfulness at minimal cost to task performance.

#representational-drift

Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning

arXiv cs.LG · 2026-05-19

This paper introduces a diagnostic framework that uses Sparse Autoencoders to analyze concept-level forgetting in continual learning, and finds that much of the forgetting stems from concepts becoming representationally inaccessible rather than being erased.
