This paper presents a speaker verification framework that combines frozen self-supervised learning (SSL) features with an ECAPA-TDNN speaker encoder and a Mixture of Experts (MoE) module. It is trained with conditional distillation and a contrastive loss to improve identity verification on both speech and non-verbal vocalizations while preventing catastrophic forgetting.
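The summary does not specify how the Mixture of Experts is wired. As a rough illustration only, the sketch below implements a minimal soft-gated MoE in plain Python. It assumes that each expert is a linear map applied to a pooled, frozen SSL feature vector and that a linear gate produces softmax weights over the experts. The names `moe_forward`, `matvec`, and `softmax` are hypothetical. This is not the paper's architecture: ECAPA-TDNN and the distillation objective are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, experts: List[Matrix], gate: Matrix) -> Vector:
    """Soft-gated mixture of experts (hypothetical sketch): y = sum_k g_k(x) * (E_k x).

    x       -- assumed: a pooled feature vector from a frozen SSL model
    experts -- one linear map per expert, each of shape (out_dim x in_dim)
    gate    -- linear gate of shape (num_experts x in_dim), one logit per expert
    """
    weights = softmax(matvec(gate, x))          # gating distribution over experts
    outputs = [matvec(e, x) for e in experts]   # each expert's output vector
    dim = len(outputs[0])
    # Weighted sum of expert outputs, dimension by dimension.
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]
```

With an all-zero gate every expert receives the same weight, so an identity expert and a doubling expert average to 1.5 times the input. A learned gate would instead let different inputs (for example, speech versus non-verbal vocalizations) route to different experts.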
This paper proposes a post-training refinement approach that uses interventional contrastive learning to disentangle speech foundation model representations into separate content and speaker subspaces. The method improves out-of-domain speaker verification, and the authors report evidence that the two subspaces are successfully separated.
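As an assumed illustration of contrastive disentanglement, the sketch below splits an embedding into a content part and a speaker part at a hypothetical fixed index. It then applies a standard InfoNCE loss to the speaker part. Here an "intervention" is modeled as changing an utterance's content while keeping the speaker fixed, so the intervened example serves as the positive. This is one reading for illustration, not the paper's exact objective. `split_subspaces`, `info_nce`, and the temperature value are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def split_subspaces(z: Vector, content_dim: int) -> Tuple[Vector, Vector]:
    """Hypothetical split: the first content_dim dims are content, the rest are speaker."""
    return z[:content_dim], z[content_dim:]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against division by zero for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE: -log(exp(s_pos / t) / sum over {pos} and negatives of exp(s / t))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    # Stable log-sum-exp over the positive and all negatives.
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - logits[0]


# Toy embeddings: [content (2 dims) | speaker (2 dims)].
anchor = [1.0, 0.0, 1.0, 0.0]
intervened = [0.0, 1.0, 1.0, 0.0]      # content changed, same speaker
other_speaker = [1.0, 0.0, 0.0, 1.0]   # same content, different speaker

_, a_spk = split_subspaces(anchor, 2)
_, p_spk = split_subspaces(intervened, 2)
_, n_spk = split_subspaces(other_speaker, 2)

good_loss = info_nce(a_spk, p_spk, [n_spk])  # speaker subspace ignores content
bad_loss = info_nce(a_spk, n_spk, [p_spk])   # wrong pairing, for comparison
```

The positive differs from the anchor only in its content dimensions. Minimizing this loss therefore rewards a speaker subspace that stays invariant to content changes, which is the intuition behind separating the two subspaces.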