encoder-models

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

arXiv cs.CL · 2026-05-20

This paper uses mechanistic interpretability to explain why authorship attribution models fine-tuned with the same encoder, data, and loss can differ in performance by a factor of four, depending on the scoring mechanism. It finds that the scorer determines where in the encoder authorship signal is consolidated: mean pooling forces early consolidation, whereas late interaction allows consolidation to happen later.
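To make the two scoring mechanisms concrete, here is a minimal sketch, assuming each text has already been encoded into a list of per-token embedding vectors. Mean pooling averages the token vectors into a single vector and compares documents by cosine similarity. Late interaction is assumed here to follow the ColBERT-style MaxSim rule (each query token is matched to its most similar candidate token, and those best matches are summed); the paper's exact scorer may differ. All function and variable names are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    # Scale to unit length; leave a zero vector unchanged.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_pool(tokens: List[Vector]) -> Vector:
    # Average token embeddings dimension-wise into one document vector.
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def mean_pooling_score(query: List[Vector], candidate: List[Vector]) -> float:
    # Cosine similarity between the two mean-pooled vectors.
    return dot(normalize(mean_pool(query)), normalize(mean_pool(candidate)))


def late_interaction_score(query: List[Vector], candidate: List[Vector]) -> float:
    # MaxSim: for each query token, take its best cosine match among the
    # candidate tokens, then sum these maxima over all query tokens.
    cand = [normalize(c) for c in candidate]
    return sum(max(dot(normalize(q), c) for c in cand) for q in query)


# Two hypothetical candidates whose token embeddings share the same mean.
query = [[1.0, 0.0]]
cand_a = [[1.0, 0.0], [0.0, 1.0]]
cand_b = [[0.5, 0.5], [0.5, 0.5]]

print(round(mean_pooling_score(query, cand_a), 3))      # → 0.707
print(round(mean_pooling_score(query, cand_b), 3))      # → 0.707
print(round(late_interaction_score(query, cand_a), 3))  # → 1.0
print(round(late_interaction_score(query, cand_b), 3))  # → 0.707
```

Because mean pooling sees only the averaged vector, the two candidates receive identical scores, while late interaction still distinguishes them through token-level matches. This toy example only illustrates what each scorer can observe; it does not reproduce the paper's consolidation analysis.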
