This paper investigates whether direct activation transfer between language models can improve reasoning, using a linear translation layer from Pythia-160M to Pythia-410M. Despite achieving high representational alignment, the transferred activations do not improve multi-hop question answering, yielding a negative result.
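To make the idea of a linear translation layer concrete, the sketch below fits an affine map from a 768-dimensional space to a 1024-dimensional space (the hidden sizes of Pythia-160M and Pythia-410M) and measures alignment as mean cosine similarity on held-out rows. It is only an illustration under stated assumptions: the activations are synthetic random data rather than real model activations, the least-squares fit is a stand-in for whatever training procedure the paper used, and the names `fit_linear_translator` and `mean_cosine` are hypothetical.

```python
import numpy as np

D_SRC, D_TGT = 768, 1024  # hidden sizes of Pythia-160M and Pythia-410M


def fit_linear_translator(src, tgt):
    """Least-squares fit of an affine map src -> tgt (assumed fitting method)."""
    ones = np.ones((src.shape[0], 1))
    src_aug = np.hstack([src, ones])  # extra column of ones learns the bias
    solution, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return solution[:-1], solution[-1]  # weight (D_SRC, D_TGT), bias (D_TGT,)


def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two activation matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))


# Synthetic stand-ins for paired activations (NOT real Pythia activations).
rng = np.random.default_rng(0)
n_rows, n_train = 2000, 1500
src_acts = rng.standard_normal((n_rows, D_SRC))
true_map = rng.standard_normal((D_SRC, D_TGT)) / np.sqrt(D_SRC)
tgt_acts = src_acts @ true_map + 0.1 * rng.standard_normal((n_rows, D_TGT))

# Fit on the first rows, evaluate alignment on the held-out remainder.
W, b = fit_linear_translator(src_acts[:n_train], tgt_acts[:n_train])
translated = src_acts[n_train:] @ W + b
alignment = mean_cosine(translated, tgt_acts[n_train:])
print(f"held-out mean cosine similarity: {alignment:.3f}")
```

A high alignment score here says only that the map reproduces the target representation geometrically. As the summary notes, that did not translate into better multi-hop question answering, so alignment and downstream usefulness need to be evaluated separately.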