Tag
Introduces a conformal prediction method for ordinal classification using the ranked probability score as a nonconformity function, producing median-centered contiguous prediction sets and achieving a favorable balance between set width and ordinal miscoverage.
This paper proposes a method to certify the trust horizon of latent world models with known group symmetries by calibrating a raw error-propagation curve using split-conformal prediction and leveraging equivariance to transport certificates over the entire group orbit. The approach provides finite-sample guarantees and demonstrates non-vacuous certificates on symmetric 2D and 3D substrates.
Foresight is a failure detection framework for long-horizon robotic manipulation that uses action-conditioned world model latents and functional conformal prediction to monitor trajectories, trained only with final task labels. It demonstrates state-of-the-art performance across simulation and real robot tasks.
This paper audits the reliability of distribution-free risk control methods for selective classification in signal-domain detectors, finding that naive thresholding often exceeds its declared budget and that exchangeability violations cause certificate failures.
Proposes the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees and adaptive-width intervals using MC Dropout uncertainty.
This paper presents LiverRisk, a machine learning framework for NAFLD risk prediction that combines gradient-boosted decision trees with conformal prediction to provide calibrated, distribution-free coverage guarantees on individual risk estimates, achieving high AUROC on internal and external cohorts.
This paper argues for a sequential inference framework to enhance LLM trustworthiness by modeling interactions as dependent stochastic processes, ensuring validity under repeated use, and enabling online monitoring for behavioral shifts.
UNIQ introduces a conformal calibration method for offline reinforcement learning that adapts conservatism per-state based on uncertainty, improving over IQL on some D4RL benchmarks while maintaining memory efficiency.
RRISE introduces a learned surrogate estimator that reduces the Monte Carlo sampling cost of randomized smoothing for certified robustness to a single forward pass, maintaining accuracy within 0.84 percentage points while replacing up to 10^4 evaluations per query.
The Ghost Annotator framework combines conformal prediction with collaborative filtering to model LLM behavior and human label variation in content moderation, revealing structural demographic biases in larger models.
EnergyMamba proposes a novel spatiotemporal framework combining a graph-enhanced selective state space model and adaptive conformalized quantile regression for accurate and reliable energy consumption prediction with uncertainty estimates, achieving improvements on real-world datasets from Florida, New York, and California.
COFT is a training-free decoding method that applies token-level fairness control and conformal calibration to reduce bias in chain-of-thought reasoning of large language models, achieving 30-55% bias reduction with minimal computational overhead.
Introduces Conf-Gen, a framework adapting conformal risk control to generative models, providing formal uncertainty guarantees for LLMs, image generators, and AI agents.
This paper introduces an empirical Bayes conformal prediction framework that uses r-values to incorporate score variability into nonconformity scores, improving ranking stability and reducing set size while preserving coverage for vision and language models.
This study evaluates five machine learning classifiers for chronic kidney disease risk prediction, finding that near-perfect internal performance fails under distribution shift. It emphasizes the need for calibration stability and conformal coverage transfer before clinical deployment.
Introduces Conformal Selective Acting (CSA), a deployment-time wrapper for RLVR-trained LLMs that provides anytime-valid selective risk control on individual streams, enabling safe deployment in regulated settings without pooling or long-run averages.
PASC proposes a conformal prediction method for multi-stage NLP and LLM pipelines that provides finite-sample, distribution-free joint coverage guarantees across all stages, achieving higher empirical coverage and efficiency than baselines like Bonferroni and independent CP.
SAGA introduces a decoder-only transformer for multi-horizon probabilistic forecasting of lifetime earnings, paired with adaptive conformal prediction to provide reliable prediction intervals. Trained on a large Swedish register dataset, it achieves significant improvements over traditional parametric and baseline models.
This paper proposes reusable certified runtime monitors for past-time signal temporal logic (ptSTL) that use semantic latent representations to evaluate varying specifications without retraining, validated on pedestrian-crossroad and Waymo driving data.
This paper presents a framework for error attribution in multi-agent systems using conformal prediction, providing statistical guarantees for identifying decisive errors in agent trajectories. The approach enables automated recovery and debugging by isolating errors within contiguous prediction sets.
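Many of the entries above build on split conformal prediction, so a minimal sketch of that shared building block may help when reading them. It is not taken from any of the listed papers. The function names (`conformal_quantile`, `split_conformal_interval`), the absolute-residual nonconformity score, and the toy predictor in the usage note are all assumptions chosen for illustration. The sketch assumes the calibration points and the new point are exchangeable.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for this alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the interval is unbounded
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new from a held-out calibration set.

    `predict` is any already-fitted point predictor (hypothetical here).
    The nonconformity score is the absolute residual, an illustrative choice.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q
```

As a usage example, with a toy predictor `predict = lambda x: 2 * x` and calibration residuals 1 through 9, calling `split_conformal_interval(predict, calib_x, calib_y, 10, alpha=0.5)` uses the fifth-smallest residual as the half-width of the interval. The methods summarized above differ mainly in the nonconformity score they choose and in how they relax the exchangeability assumption.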