Where Does Authorship Signal Emerge in Encoder-Based Language Models?

arXiv cs.CL · 2026-05-20

This paper uses mechanistic interpretability to explain why authorship attribution models fine-tuned with the same encoder, data, and loss can differ four-fold in performance depending only on the scoring mechanism. It finds that the scorer determines where the encoder consolidates authorship signal: mean pooling forces early consolidation, while late interaction allows late consolidation.
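To make the two scoring mechanisms concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify the exact scorers, so this assumes mean pooling means cosine similarity between averaged token vectors, and late interaction means a ColBERT-style MaxSim over token vectors. The function names and the toy two-dimensional "token embeddings" are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool_score(tokens_a, tokens_b):
    """Assumed mean-pooling scorer: cosine similarity of averaged token vectors."""
    def mean(tokens):
        n = len(tokens)
        return [sum(col) / n for col in zip(*tokens)]
    return dot(l2_normalize(mean(tokens_a)), l2_normalize(mean(tokens_b)))


def late_interaction_score(tokens_a, tokens_b):
    """Assumed late-interaction scorer (MaxSim): for each token in A, take its
    best cosine match among the tokens of B, then average over A's tokens."""
    a = [l2_normalize(t) for t in tokens_a]
    b = [l2_normalize(t) for t in tokens_b]
    return sum(max(dot(ta, tb) for tb in b) for ta in a) / len(a)


# Hypothetical toy token embeddings, not real encoder outputs.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.0, 1.0], [1.0, 0.0]]  # same tokens as doc_a, different order
doc_c = [[1.0, 1.0], [1.0, 1.0]]  # different tokens, same mean direction

pooled_ab = mean_pool_score(doc_a, doc_b)
pooled_ac = mean_pool_score(doc_a, doc_c)
late_ab = late_interaction_score(doc_a, doc_b)
late_ac = late_interaction_score(doc_a, doc_c)
```

In this toy case the pooled scorer rates `doc_b` and `doc_c` as equally similar to `doc_a`, because averaging collapses each text to one vector before comparison. The MaxSim scorer still separates them, since it compares individual token vectors. That difference in what the scorer can see is one plausible reading of why the choice of scorer could change where an encoder needs to consolidate signal, though the paper itself should be consulted for the actual mechanism.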
