Tag
This paper presents a systematic review and benchmark of 24 black-box uncertainty estimation methods for large language models across 4 models and 4 dataset settings, finding that no single method dominates but hybrid methods that combine multiple uncertainty signals perform well.
This paper proposes ORCA, a method for black-box online adaptation of time series foundation models by learning the context of predictive errors. It demonstrates effectiveness across five TSFMs and eight datasets, addressing the challenge of adapting closed-source API-based models.
This paper introduces Vector Linking, a method for recovering correspondences between embeddings from different black-box encoders by leveraging local geometric consistency, proposing an iterative reference-based geometric embedding hashing approach using a small seed set of paired anchors.
This paper introduces bounded behavioral indistinguishability, a formal framework for evaluating black-box LLM distillation beyond semantic similarity. Experiments on Qwen and Llama models show that distillation reduces but does not eliminate adversarial distinguishability, highlighting the need for category-aware evaluation.
The article highlights three common failure modes in production AI memory systems: outdated preferences persisting, sarcasm stored as literal, and summaries outliving their source facts. It argues that the AI memory industry lacks provenance, confidence scores, and versioning, creating a black-box problem that hinders debugging.
This paper proposed Distribution-Aligned Adversarial Distillation (DisAAD), a method that uses a lightweight proxy model to estimate uncertainty in black-box LLMs with only 1% of the original model size, achieving reliable quantification without requiring internal parameters or multiple sampling.
Researchers propose a surrogate modeling framework to quantify and interpret latent medical knowledge encoded in black-box LLMs, revealing both valid associations and persistent racial biases.
Researchers introduce SHADE, a hybrid estimator that combines Good-Turing coverage with graph-spectral cues to quantify semantic uncertainty and detect LLM hallucinations when only a few black-box samples are available.
MedFocusLeak introduces the first transferable black-box adversarial attack on medical vision-language models, using imperceptible background perturbations to mislead clinical diagnoses across six imaging modalities.