Tag
This paper introduces CAGE, a counterfactual graph-based method for calibrating multi-agent LLM systems, and evaluates it on benchmarks such as TriviaQA and MMLU-Pro across several communication topologies. The method outperforms existing post-hoc and LLM-elicited calibration approaches.
This paper reveals a fundamental vulnerability in LLM watermarking: when users have access to multiple models, averaging the models' output distributions cancels the watermark perturbations, enabling detection evasion. The authors propose WASH and demonstrate empirically that averaging 3-5 models pushes detection z-scores below the detection threshold while improving text quality.
This paper presents a comparative evaluation of classical, ensemble, and neural machine learning approaches for predicting financial distress under severe class imbalance, using SMOTE for oversampling and SHAP for interpretability.
This paper investigates disagreement-based drift detection in ensembles of incremental decision trees, finding that the method, while effective for neural networks, underperforms loss-based detectors for tree ensembles because of the trees' limited model plasticity.
OpenAI presents a novel exploration strategy for deep reinforcement learning using ensembles of Q-functions with upper-confidence bounds (UCB), demonstrating significant performance improvements on the Atari benchmark.
OpenAI presents PATE (Private Aggregation of Teacher Ensembles), a privacy-preserving approach that trains a student model on noisy outputs from multiple teacher models trained on disjoint datasets, providing strong differential privacy guarantees without exposing sensitive training data.