Tag
This paper proposes a novel active learning framework that leverages foundation model priors to jointly address class imbalance and label noise, achieving over 50% annotation savings compared to baselines across image and text domains.
This paper identifies the geometric mechanism behind boundary-induced acquisition bias in Gaussian processes on bounded domains, showing how kernel truncation inflates posterior variance and distorts acquisition functions independently of the objective function. The authors introduce a function-free diagnostic to quantify this bias across different acquisition classes.
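The boundary effect described above can be illustrated with a small NumPy sketch. This is not the authors' diagnostic: the RBF kernel, the lengthscale, the noise level, and the evenly spaced design are all assumptions chosen for illustration. Posterior variance is computed from the input locations alone, so no objective function appears anywhere, and it comes out larger at the edges of [0, 1] than at the center even though each query point is equally far from its nearest training point.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior_variance(x_train, x_query, noise=1e-2, lengthscale=0.2):
    """GP posterior variance at x_query; depends only on inputs, not on targets."""
    K = rbf(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_query, lengthscale)
    # var(x*) = k(x*, x*) - k_*^T K^{-1} k_*, with k(x*, x*) = 1 for this kernel
    return 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)

# Ten evenly spaced training inputs at cell midpoints of the bounded domain [0, 1].
x_train = (np.arange(10) + 0.5) / 10
# Boundary, center, boundary: each is 0.05 away from its nearest training input.
x_query = np.array([0.0, 0.5, 1.0])
var = posterior_variance(x_train, x_query)
print(dict(zip(["left edge", "center", "right edge"], var)))
```

Any acquisition function that rewards variance would favor the edges in this setup, because edge points have neighbors on one side only.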
Researchers from MIT, University of Warwick, and NVIDIA introduce Stein Kernelized Molecular Dynamics (SKMD), an enhanced sampling method that uses interacting particle dynamics to acquire informative training configurations for active learning and fine-tuning of machine learning interatomic potentials (MLIPs). SKMD is a stochastic variant of Stein variational gradient descent adapted for molecular dynamics; it preserves the Boltzmann distribution while reaching higher model accuracy in fewer training iterations than baselines.
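For readers unfamiliar with the underlying update, the following is a minimal sketch of standard deterministic Stein variational gradient descent on a toy 1-D Gaussian target. It is not SKMD: it has no stochastic term, no molecular dynamics, and no interatomic potential. The fixed bandwidth, step size, particle count, and target are assumptions for illustration only.

```python
import numpy as np

def svgd_step(x, score, bandwidth=1.0, step_size=0.1):
    """One deterministic SVGD update for 1-D particles x of shape (n,)."""
    diff = x[:, None] - x[None, :]               # diff[j, i] = x_j - x_i
    k = np.exp(-0.5 * (diff / bandwidth) ** 2)   # k[j, i] = k(x_j, x_i)
    grad_k = -diff / bandwidth**2 * k            # derivative of k(x_j, x_i) w.r.t. x_j
    # Kernel-weighted score pulls particles toward high density;
    # the kernel-gradient term pushes nearby particles apart.
    phi = (k * score(x)[:, None] + grad_k).mean(axis=0)
    return x + step_size * phi

def standard_normal_score(x):
    """Gradient of the log-density of N(0, 1), the assumed toy target."""
    return -x

rng = np.random.default_rng(0)
particles = rng.normal(loc=3.0, scale=0.5, size=50)  # start far from the target
for _ in range(1000):
    particles = svgd_step(particles, standard_normal_score)
print(particles.mean(), particles.std())
```

The repulsion term is what makes the particles interact: without it, every particle would collapse onto the mode instead of spreading over the target distribution.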
GlossAssist is a tool for creating interlinear glossed text (IGT) corpora in low-resource language documentation settings, built around the CWoMP retrieval-based architecture with an active learning feedback loop that improves predictions as annotators make corrections without retraining the model.
This paper introduces a framework for active timepoint selection to infer probability paths from sparse snapshots, using linearized optimal transport to map distributions into a tangent space for Gaussian Process modeling, thereby enabling uncertainty-aware acquisition policies.
Presents AvAtar, a principled active alignment framework that uses optimal transport to acquire high-quality supervision for improved alignment and adjoint-state methods for efficient gradient computation.
LLM-AutoSciLab is a closed-loop framework that uses LLMs to iteratively generate hypotheses, select informative experiments, and refine mechanisms, achieving higher accuracy and sample efficiency than prior static methods on physics and biology benchmarks.
The LEAP framework integrates a domain-specialized large language model with active learning to efficiently prioritize precursor additives for perovskite solar cells, achieving improved power conversion efficiencies in experimental validation.
Proposes reframing Pairwise Ranking Prompting (PRP) reranking as active learning from noisy pairwise comparisons, improving NDCG@10 per call under budget constraints, and introduces a randomized-direction oracle that reduces LLM calls per pair.
This paper reframes pairwise ranking prompting as active learning from noisy comparisons, introducing a noise-robust framework with a randomized-direction oracle to improve ranking quality under call constraints and address position bias.
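One plausible reading of the randomized-direction oracle in the two summaries above is sketched here. It is an assumption, not the paper's definition: instead of querying both orderings of a pair to cancel position bias (two calls), the oracle picks one ordering by coin flip (one call), which turns the bias into zero-mean noise. The `biased_compare` function is a hypothetical stand-in for an LLM call, with a made-up logistic preference model and bias value.

```python
import math
import random

def biased_compare(first, second, scores, rng, position_bias=1.0):
    """Hypothetical stand-in for one LLM call: True if the first-listed item wins.

    The position_bias term models a comparator that favors whichever item is shown first.
    """
    logit = scores[first] - scores[second] + position_bias
    p_first = 1.0 / (1.0 + math.exp(-logit))
    return rng.random() < p_first

def randomized_direction_compare(a, b, scores, rng, position_bias=1.0):
    """One call per pair with a coin-flipped presentation order; True if a wins."""
    if rng.random() < 0.5:
        return biased_compare(a, b, scores, rng, position_bias)
    # Items were shown as (b, a), so a wins exactly when the first-listed item loses.
    return not biased_compare(b, a, scores, rng, position_bias)

def win_rate(compare, a, b, scores, trials=20000, seed=0):
    """Fraction of simulated comparisons in which a beats b."""
    rng = random.Random(seed)
    wins = sum(compare(a, b, scores, rng) for _ in range(trials))
    return wins / trials

# Two equally relevant documents: an unbiased judge should prefer each about half the time.
scores = {"doc_a": 0.0, "doc_b": 0.0}
fixed_rate = win_rate(biased_compare, "doc_a", "doc_b", scores)
random_rate = win_rate(randomized_direction_compare, "doc_a", "doc_b", scores)
print(f"fixed order: {fixed_rate:.3f}, randomized order: {random_rate:.3f}")
```

In this toy model the fixed ordering systematically favors the first-listed document, while the randomized ordering leaves a noisy but unbiased signal that an active ranking procedure can average over repeated comparisons.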
This paper proposes an active learning framework to couple high-fidelity Modelica simulations with simpler surrogate models (SINDyC, FNN, GRU) for creating efficient digital twins of thermal energy distribution systems. The approach significantly reduces the number of simulation trajectories needed while maintaining predictive accuracy and enabling uncertainty quantification.
This paper proposes a framework for conditional generative compressed sensing, proving stable recovery bounds for prompt-conditioned models and demonstrating how prompt matching influences sampling distributions in experiments with Stable Diffusion.
This paper introduces SPADE, a novel algorithm for drug discovery that efficiently identifies high-quality ligands from sparse data using only ~40 tests. It demonstrates superior sample efficiency and speed compared to deep learning and Bayesian optimization methods.
OpenAI presents a comprehensive framework for building robust content moderation systems through careful taxonomy design, data quality control, active learning pipelines, and techniques to prevent overfitting. The approach detects multiple categories of undesired content including sexual content, hate speech, violence, and self-harm, achieving performance superior to existing off-the-shelf models.
OpenAI describes the pre-training data filtering and active learning techniques used to reduce harmful content in DALL·E 2, while also addressing unintended bias amplification caused by data filtering, particularly demographic biases in generated images.