Tag
This paper investigates how lexical overlap, rather than semantic content, shapes LLM representations across layers and architectures. It shows that this lexical effect persists even in models trained for semantic similarity and degrades performance on downstream tasks.
This paper introduces bounded behavioral indistinguishability, a formal framework for evaluating black-box LLM distillation beyond semantic similarity. Experiments on Qwen and Llama models show that distillation reduces but does not eliminate adversarial distinguishability, highlighting the need for category-aware evaluation.
OmniOPD is a logit-free on-policy distillation (OPD) method that uses chunk-level semantic similarity and speculative verification to train student models with black-box teachers, improving on standard OPD by up to 28.64% on math benchmarks.
Researchers from PNNL and Washington University introduce a systematic framework to test how five LLMs detect subtle semantic changes in documents, revealing positional bias, context coherence effects, and model-specific scoring fingerprints.