online-learning

#online-learning

Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

arXiv cs.LG · 4d ago

This paper presents a reinforcement learning approach for dynamically adjusting trigger thresholds at the Large Hadron Collider, improving signal efficiency and maintaining background rates, with the first demonstration on real collision data.

#online-learning

A Comparative Study of Bayesian Contextual Bandits for Real-Time Warehouse Sorter Optimization

arXiv cs.LG · 4d ago

This paper compares Bayesian Contextual Bandits (BCB), XGBoost, and Linear Regression for real-time sorter diversion optimization in e-commerce warehouses, showing that BCB achieves a 2.03% reward uplift along with better online learning and inference latency.

#online-learning

@GoSailGlobal: https://x.com/GoSailGlobal/status/2068879365711032708

X AI KOLs Timeline · 6d ago

gwern proposed the 'Guardian Angel' approach: training an LLM digital twin that imitates the user, in order to address the principal-agent problem and security risks of general AI assistants. The proposal includes a complete roadmap from alignment theory to technical implementation.

#online-learning

Online LLM Selection via Constrained Bandits with Time-Varying Demand

arXiv cs.LG · 2026-06-17

This paper proposes a constrained stochastic bandit algorithm for online selection of large language models under time-varying task demand and heterogeneous accuracy, latency, and cost profiles, with theoretical guarantees on regret and constraint violations.

#online-learning

Treatment Response Optimized Clinical Decision Support AI System via Digital Twin Simulation

arXiv cs.AI · 2026-06-17

This paper presents an online adaptive clinical decision support AI system that integrates treatment effect estimation, digital twin simulation, and reinforcement learning to recommend treatments in a safe, clinician-supervised manner, validated on a synthetic simulator and the TCGA ovarian cancer dataset.

#online-learning

Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts

arXiv cs.LG · 2026-06-16

This paper formalizes embedding model routing as an adversarial contextual linear bandit with low-rank experts, proposing the Hypentropy Policy Gradient (HPG) algorithm, which achieves Õ(s√(MT)) policy regret and avoids the curse of dimensionality.

#online-learning

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

Hacker News Top · 2026-06-09

This post explains the author's Master's thesis on using Kolmogorov-Arnold Networks (KANs) for ultrafast machine learning on FPGAs, achieving sub-microsecond inference and online learning via custom hardware architectures. It references two accepted papers: KANELÉ for LUT-based evaluation (FPGA 2026 Best Paper) and a method for on-FPGA online learning (ICML 2026).

#online-learning

Online Pandora's Box for Contextual LLM Cascading

arXiv cs.AI · 2026-06-08

This paper introduces an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs, proposing a learning approach that combines GMM estimation with UCB-style confidence bounds and proving dimension-dependent regret bounds.

#online-learning

CLaaS: Continual learning as a service for sample efficient online learning

arXiv cs.LG · 2026-06-05

CLaaS is a system for continual learning of LLM agents in deployment, using experience replay for sample-efficient online adaptation.

#online-learning

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

arXiv cs.AI · 2026-06-04

This paper proposes SGDR (State-Grounded Dynamic Retrieval), an online skill learning method for web agents that enables stepwise, state-aware skill reuse rather than static task-level retrieval. Experiments on WebArena show SGDR achieves 37.5% success rate with GPT-4.1, a ~10.6% relative gain over strong baselines.

#online-learning

Regret Minimization with Adaptive Opponents in Repeated Games

Hugging Face Daily Papers · 2026-06-04

This paper introduces Repeated Policy Regret (RP-Regret), a game-theoretic metric for regret minimization in repeated games with adaptive opponents, and proposes three algorithms to minimize it, showing that minimizing it can lead to cooperative equilibria, as in Stag-Hunt.

#online-learning

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

arXiv cs.AI · 2026-06-02

SHARP introduces a bio-inspired framework that separates memory accumulation from pattern recognition, using accelerated replay during offline sleep phases to learn long-range non-stationary temporal patterns in streaming settings. It improves context retention on text8 and PG-19 while maintaining computational efficiency.

#online-learning

Adversarially Robust Control of Conditional Value-at-Risk via Rockafellar-Uryasev Conformal Inference

arXiv cs.LG · 2026-06-02

This paper presents an online, distribution-free framework for controlling Conditional Value-at-Risk (CVaR) in adversarial and non-stationary environments, with asymptotic guarantees and applications in portfolio risk management and LLM toxicity mitigation.

#online-learning

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

arXiv cs.AI · 2026-06-01

This paper proposes UniScale, an online framework that unifies model routing and test-time scaling via contextual bandit optimization for better quality-cost trade-offs in LLM inference.

#online-learning

Universal Multiclass Transductive Online Learning

arXiv cs.LG · 2026-06-01

This paper introduces the Level-Constrained-Littlestone-Littlestone (LCLL) tree to characterize learnability in universal transductive online classification with possibly unbounded label spaces, proving that optimal mistake rates are either bounded or logarithmic.

#online-learning

Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

arXiv cs.LG · 2026-05-29

This paper solves a COLT open problem by providing an optimal gap-dependent regret algorithm for private stochastic decision-theoretic online learning, achieving the lower bound of order (log K)/Δ_min + (log K)/ε.

#online-learning

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

arXiv cs.LG · 2026-05-27

This paper proves that online gradient descent achieves optimal √T regret for hidden-convex losses under a Hessian compatibility condition, resolving open questions in adversarial online learning. It also extends results to one-point bandit feedback with a T^{3/4} expected regret bound.

#online-learning

Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence

arXiv cs.LG · 2026-05-26

This paper presents a decision support framework for digital therapeutics that models patient adherence as endogenous and uses online learning to optimize treatment recommendations, achieving sublinear regret.

#online-learning

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

arXiv cs.LG · 2026-05-26

This paper proposes a truthful online preference aggregation mechanism for LLM fine-tuning in mobile crowdsourcing, addressing strategic worker misreporting and achieving sublinear regret.

#online-learning

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

arXiv cs.LG · 2026-05-26

This paper proposes MODIAD, a framework for multimodal online distributed industrial anomaly detection, addressing resource constraints with a Multi-class Intelligent Scheduling problem and a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy. Experiments on MVTec 3D-AD and Eyecandies datasets demonstrate superior performance and efficiency.
