behavioral-transfer

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation

arXiv cs.LG · 2026-06-11

This paper quantifies the magnitude of subliminal behavioral transfer in language model distillation, showing that undesirable traits can transfer robustly from teacher to student models even with benign training data, and that transfer scales differently across model families.
