Tag
This paper presents a reinforcement learning approach for dynamically adjusting trigger thresholds at the Large Hadron Collider, improving signal efficiency and maintaining background rates, with the first demonstration on real collision data.
This paper presents a comparative study of Bayesian Contextual Bandits (BCB), XGBoost, and Linear Regression for real-time sorter diversion optimization in e-commerce warehouses, showing that BCB achieves a 2.03% reward uplift along with superior online learning and inference latency.
gwern proposes the 'Guardian Angel' approach, advocating training an LLM digital twin that imitates the user in order to solve the principal-agent problem and security risks of general AI assistants, and provides a complete roadmap from alignment theory to technical implementation.
This paper proposes a constrained stochastic bandit algorithm for online selection of large language models under time-varying task demand and heterogeneous accuracy, latency, and cost profiles, with theoretical guarantees on regret and constraint violations.
This paper presents an online adaptive clinical decision support AI system that integrates treatment effect estimation, digital twin simulation, and reinforcement learning to recommend treatments in a safe, clinician-supervised manner, validated on a synthetic simulator and the TCGA ovarian cancer dataset.
This paper formalizes embedding model routing as an adversarial contextual linear bandit with low-rank experts, proposing the Hypentropy Policy Gradient (HPG) algorithm that achieves Õ(s√(MT)) policy regret, avoiding the curse of dimensionality.
This post explains the author's Master's thesis on using Kolmogorov-Arnold Networks (KANs) for ultrafast machine learning on FPGAs, achieving sub-microsecond inference and online learning via custom hardware architectures. It references two accepted papers: KANELÉ for LUT-based evaluation (FPGA 2026 Best Paper) and a method for on-FPGA online learning (ICML 2026).
This paper introduces an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs, proposing a learning approach that combines GMM estimation with UCB-style confidence bounds and proving dimension-dependent regret bounds.
CLaaS is a system for continual learning of LLM agents in deployment, using experience replay for sample-efficient online adaptation.
This paper proposes SGDR (State-Grounded Dynamic Retrieval), an online skill learning method for web agents that enables stepwise, state-aware skill reuse rather than static task-level retrieval. Experiments on WebArena show that SGDR achieves a 37.5% success rate with GPT-4.1, a ~10.6% relative gain over strong baselines.
This paper introduces Repeated Policy Regret (RP-Regret), a game-theoretic metric for regret minimization in repeated games with adaptive opponents, and proposes three algorithms to minimize it, showing that doing so can lead to cooperative equilibria in games such as Stag Hunt.
SHARP introduces a bio-inspired framework that separates memory accumulation from pattern recognition, using accelerated replay during offline sleep phases to learn long-range non-stationary temporal patterns in streaming settings. It improves context retention on text8 and PG-19 while maintaining computational efficiency.
This paper presents an online, distribution-free framework for controlling Conditional Value-at-Risk (CVaR) in adversarial and non-stationary environments, with asymptotic guarantees and applications in portfolio risk management and LLM toxicity mitigation.
Proposes UniScale, an online framework that unifies model routing and test-time scaling via contextual bandit optimization for better quality-cost trade-offs in LLM inference.
This paper introduces the Level-Constrained-Littlestone-Littlestone (LCLL) tree to characterize learnability in universal transductive online classification with possibly unbounded label spaces, proving that optimal mistake rates are either bounded or logarithmic.
This paper solves a COLT open problem by providing an optimal gap-dependent regret algorithm for private stochastic decision-theoretic online learning, matching the lower bound of order (log K)/Δ_min + (log K)/ε.
This paper proves that online gradient descent achieves optimal √T regret for hidden-convex losses under a Hessian compatibility condition, resolving open questions in adversarial online learning. It also extends results to one-point bandit feedback with a T^{3/4} expected regret bound.
This paper presents a decision support framework for digital therapeutics that models patient adherence as endogenous and uses online learning to optimize treatment recommendations, achieving sublinear regret.
Proposes a truthful online preference aggregation mechanism for LLM fine-tuning in mobile crowdsourcing, addressing strategic worker misreporting and achieving sublinear regret.
This paper proposes MODIAD, a framework for multimodal online distributed industrial anomaly detection, addressing resource constraints with a Multi-class Intelligent Scheduling problem and a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy. Experiments on MVTec 3D-AD and Eyecandies datasets demonstrate superior performance and efficiency.