This paper introduces NoiseRater, a meta-learning framework that assigns importance scores to individual noise samples during diffusion model training to improve efficiency and generation quality.
This paper introduces RubricEM, a reinforcement learning framework that uses rubric-guided policy decomposition and reflection-based meta-policy evolution to train deep research agents for long-form tasks. The resulting RubricEM-8B model demonstrates strong performance on long-form research benchmarks by leveraging stage-aware planning and denser semantic feedback.
University of Memphis researchers propose HAMR, a model-agnostic meta-learning framework that uses bi-level optimization and neighborhood-aware resampling to adaptively reweight hard examples and minority classes across six imbalanced NLP datasets.
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.
This paper proposes ACSESS, a method for automatically combining multiple sample selection strategies to improve few-shot learning across both in-context learning and gradient-based approaches. The work demonstrates that combining strategies consistently outperforms individual selection methods across 14 datasets with both text and image modalities.
This paper proposes WORC, a weak-link optimization framework for multi-agent LLM systems that identifies and reinforces underperforming agents through meta-learning-based weight prediction and uncertainty-driven resource allocation, achieving 82.2% accuracy on reasoning benchmarks while improving system stability.
This paper introduces a meta-optimized approach for semantic visual decoding from fMRI signals that generalizes to novel subjects without fine-tuning, using in-context learning to infer unique neural encoding patterns from a small set of image-brain activation examples. The method achieves strong cross-subject and cross-scanner generalization without requiring anatomical alignment or stimulus overlap.
OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that evolves loss functions rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience, much as humans transfer skills between related problems.
This paper analyzes first-order meta-learning algorithms for few-shot learning, introducing Reptile and providing theoretical insights into why these computationally efficient methods work well on established benchmarks.
OpenAI introduces Reptile, a scalable meta-learning algorithm for few-shot classification that achieves comparable performance to MAML while converging faster with lower variance. The paper provides theoretical analysis showing Reptile maximizes inner product between task gradients for improved generalization.
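The Reptile update itself is remarkably simple: sample a task, run a few steps of SGD from the current initialization, then nudge the initialization toward the adapted weights. A minimal sketch on a toy sinusoid-regression family follows (the model, features, and hyperparameters here are illustrative, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # Each task: regress y = a*sin(x + b) with task-specific a, b
    a, b = rng.uniform(0.5, 2.0), rng.uniform(0, np.pi)
    return lambda x: a * np.sin(x + b)

def features(x):
    # Fixed sine/cosine basis so the per-task model stays linear
    return np.stack([np.sin(x), np.cos(x), np.ones_like(x)], axis=1)

def sgd_on_task(w, task, k=32, lr=0.02):
    # Inner loop: k steps of plain SGD on freshly sampled task data
    for _ in range(k):
        x = rng.uniform(-np.pi, np.pi, size=16)
        y = task(x)
        f = features(x)
        grad = 2 * f.T @ (f @ w - y) / len(x)
        w = w - lr * grad
    return w

# Reptile outer loop: move the initialization toward task-adapted weights
w = np.zeros(3)
eps = 0.1  # outer step size
for step in range(200):
    task = make_task()
    w_adapted = sgd_on_task(w.copy(), task)
    w = w + eps * (w_adapted - w)  # Reptile update: w += eps * (phi - w)
```

Note that no second-order gradients are needed, which is what makes Reptile cheaper than MAML: the outer step is just a weighted average between the initialization and the adapted weights.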
OpenAI researchers develop meta-learning agents that continuously adapt their policies during multi-round competitive games, demonstrating superior performance compared to fixed-policy agents and robustness to environmental and bodily changes.
OpenAI proposes a meta-learning framework for one-shot imitation learning that enables robots to learn new tasks from a single demonstration and generalize to new instances without task-specific engineering. The approach uses soft attention mechanisms to allow neural networks trained on diverse task pairs to perform well on unseen tasks at test time.
RL² proposes encoding a fast reinforcement learning algorithm in the weights of a recurrent neural network, which is itself trained by slow, general-purpose RL, so that agents adapt to new tasks within a few trials, much as animals do. The method demonstrates strong performance on both small-scale bandit problems and large-scale vision-based navigation tasks.
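The key interface detail in RL² is that the policy conditions on the previous action, reward, and termination flag, and its recurrent hidden state is carried across episodes within a trial; fast adaptation happens entirely inside that hidden state. A minimal PyTorch sketch of this interface on a Bernoulli bandit follows (architecture sizes, the bandit task, and names like RL2Policy are illustrative; the slow outer-loop RL training is omitted):

```python
import torch
import torch.nn as nn

class RL2Policy(nn.Module):
    """GRU policy whose recurrent state acts as the 'fast' learner."""
    def __init__(self, n_arms, hidden=32):
        super().__init__()
        # Input = one-hot previous action + previous reward + terminal flag
        self.gru = nn.GRUCell(n_arms + 2, hidden)
        self.head = nn.Linear(hidden, n_arms)

    def forward(self, prev_action, prev_reward, done, h):
        x = torch.cat([prev_action, prev_reward, done], dim=-1)
        h = self.gru(x, h)
        return torch.distributions.Categorical(logits=self.head(h)), h

n_arms = 5
policy = RL2Policy(n_arms)
probs = torch.rand(n_arms); probs /= probs.sum()  # hidden bandit task

h = torch.zeros(1, 32)          # hidden state persists across the whole trial
prev_a = torch.zeros(1, n_arms)
prev_r = torch.zeros(1, 1)
done = torch.zeros(1, 1)
rewards = []
for t in range(20):             # one trial = several pulls of the same bandit
    dist, h = policy(prev_a, prev_r, done, h)
    a = dist.sample()
    r = torch.bernoulli(probs[a])
    prev_a = torch.nn.functional.one_hot(a, n_arms).float()
    prev_r = r.view(1, 1)
    rewards.append(r.item())
```

Training would backpropagate a policy-gradient objective through the whole trial, so the learned weights encode an exploration-exploitation strategy while the hidden state tracks the current task's statistics.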