Tag
This paper formalizes Streaming Knowledge Compilation for LLM wikis, introducing a materiality signal to proactively pin important documents from a streaming corpus under a token budget. It proves an O(√(T log K)) regret bound and validates the approach in finance and Wikipedia domains, showing that regret analysis is a reliable evaluation metric.
This paper introduces an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs, proposing a learning approach that combines GMM estimation with UCB-style confidence bounds and proving dimension-dependent regret bounds.
This paper proposes a truthful online preference aggregation mechanism for LLM fine-tuning in mobile crowdsourcing that addresses strategic misreporting by workers and achieves sublinear regret.
This note presents a research moment in which Codex helped find a new rare-switching rule for private linear bandits, using the generalized Rayleigh quotient to overcome the failure of determinant-based monotonicity caused by Gaussian noise.