Tag
This paper extends optimal transport-based hallucination detection to all decoder layers in NMT and abstractive summarization, finding that detection is concentrated in early layers and that the geometric signal transfers poorly to summarization, where faithfulness failures are not detectable via attention concentration.
This paper presents MASF, a multi-model adaptive selection framework that integrates multiple fine-tuned transformer summarization models and selects the highest-quality summary, achieving a BERTScore of 88.63% on CNN/DailyMail and outperforming several LLMs.