This paper introduces a protocol for fair comparison of diffusion-based OOD detectors and proposes Canonical Feature Snapshots (CFS), which leverage sparse internal activations for efficient detection.
Neural networks appear to speak English on the surface, but internally they organize information geometrically (curves, loops, surfaces, manifolds). Understanding this "neural geometry" may be key to understanding, debugging, and controlling models.
This paper proposes a conformal prediction framework for LLMs that leverages internal representations rather than output-level statistics, introducing Layer-Wise Information (LI) scores as nonconformity measures to improve validity-efficiency trade-offs under distribution shift. The method demonstrates stronger robustness to calibration-deployment mismatch compared to text-level baselines across QA benchmarks.
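The conformal machinery underneath is generic: calibrate a threshold on nonconformity scores, then keep every candidate answer whose score falls below it. A minimal sketch of split conformal prediction, with toy scores standing in for the paper's LI-based nonconformity measure (the labels and values below are hypothetical):

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    # rank that gives the split-conformal coverage guarantee
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(candidate_scores, qhat):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return {label for label, s in candidate_scores.items() if s <= qhat}

# hypothetical nonconformity scores from a held-out calibration set
cal = [0.1, 0.3, 0.2, 0.4, 0.25, 0.15, 0.35, 0.05, 0.45, 0.3]
qhat = conformal_quantile(cal, alpha=0.2)  # -> 0.4 on this toy data

# hypothetical candidate answers with their nonconformity scores
cands = {"Paris": 0.1, "Lyon": 0.5, "Marseille": 0.3}
print(prediction_set(cands, qhat))  # {"Paris", "Marseille"}
```

The validity-efficiency trade-off the summary mentions shows up here directly: a better nonconformity score (e.g. one built from internal representations rather than output text) yields smaller prediction sets at the same coverage level.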
This paper introduces SIVR (Sequential Internal Variance Representation), a supervised framework for detecting hallucinations in LLMs by analyzing token-wise and layer-wise variance patterns in hidden states, without relying on strict architectural assumptions. The method aggregates full-sequence variance features to learn temporal patterns associated with factual errors and demonstrates improved generalization from smaller training sets.
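The feature-extraction step can be sketched generically: given hidden states from a forward pass, compute the activation variance at each (layer, token) position, producing a small matrix a supervised classifier can be trained on. A minimal sketch under assumed shapes (the array sizes and the `variance_features` name are illustrative, not the paper's implementation):

```python
import numpy as np

def variance_features(hidden_states):
    """
    hidden_states: array of shape (n_layers, n_tokens, d_model) — activations
    collected from a single forward pass. Returns an (n_layers, n_tokens)
    matrix of per-position activation variances: a token-wise, layer-wise
    signal of the kind a hallucination classifier could consume.
    """
    return hidden_states.var(axis=-1)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 6, 8))  # 4 layers, 6 tokens, hidden size 8
feats = variance_features(h)
print(feats.shape)  # (4, 6)
```

Flattening `feats` (or feeding it to a sequence model) gives a fixed recipe that does not depend on any one architecture's internals, which matches the summary's claim of avoiding strict architectural assumptions.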