On first-order meta-learning algorithms
Summary
This paper analyzes first-order meta-learning algorithms for few-shot learning, introducing Reptile and providing theoretical insights into why these computationally efficient methods work well on established benchmarks.
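The core update is simple enough to sketch in a few lines. Below is a minimal sketch of the Reptile update on a hypothetical toy regression family; the task distribution, learning rates, and step count are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Minimal Reptile sketch on a toy linear-regression task family (illustrative
# assumptions, not the paper's benchmarks).
rng = np.random.default_rng(0)

def sample_task():
    """A task is fitting y = a*x + b for randomly drawn (a, b)."""
    a, b = rng.uniform(-1.0, 1.0, size=2)
    return a, b

def loss_grad(w, task, n=10):
    """Gradient of mean-squared error for the linear model y = w[0]*x + w[1]."""
    a, b = task
    x = rng.uniform(-1.0, 1.0, size=n)
    y = a * x + b
    err = (w[0] * x + w[1]) - y
    return np.array([np.mean(2.0 * err * x), np.mean(2.0 * err)])

phi = np.zeros(2)                  # meta-initialization
inner_lr, outer_lr, k = 0.1, 0.5, 5

for _ in range(1000):
    task = sample_task()
    w = phi.copy()
    for _ in range(k):             # k steps of plain SGD on the sampled task
        w -= inner_lr * loss_grad(w, task)
    phi += outer_lr * (w - phi)    # Reptile: move initialization toward the adapted weights
```

Note that the outer step needs no second-order derivatives: it only interpolates the initialization toward the task-adapted weights, which is what makes the method first-order and computationally cheap.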
Similar Articles
Reptile: A scalable meta-learning algorithm
OpenAI introduces Reptile, a scalable meta-learning algorithm for few-shot classification that matches MAML's performance while converging faster and with lower variance. The paper's theoretical analysis shows that Reptile's update implicitly maximizes the inner product between gradients computed on different minibatches of the same task, which improves within-task generalization.
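The inner-product claim admits a compact statement. Writing $\bar g_1, \bar g_2$ for gradients on two minibatches of the same task, $\alpha$ for the inner-loop step size, and $\phi$ for the initialization, the paper's Taylor-expansion analysis gives the expected two-step Reptile meta-gradient the schematic form below (the exact positive coefficients $c_1, c_2$ are elided here as a simplification):

$$\mathbb{E}[g_{\text{Reptile}}] \;\approx\; c_1\,\mathbb{E}[\bar g] \;-\; c_2\,\alpha\,\frac{\partial}{\partial \phi}\,\mathbb{E}\big[\bar g_1 \cdot \bar g_2\big]$$

Descending this meta-gradient therefore both minimizes the expected task loss (first term) and ascends $\mathbb{E}[\bar g_1 \cdot \bar g_2]$ (second term), pushing $\phi$ toward regions where gradients from different minibatches of a task align, i.e. where a few SGD steps generalize within the task.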
One-shot imitation learning
OpenAI proposes a meta-learning framework for one-shot imitation learning that enables robots to learn new tasks from a single demonstration and generalize to new instances without task-specific engineering. The approach uses soft attention mechanisms to allow neural networks trained on diverse task pairs to perform well on unseen tasks at test time.
Gotta Learn Fast: A new benchmark for generalization in RL
OpenAI presents a new reinforcement learning benchmark based on Sonic the Hedgehog to measure transfer learning and few-shot learning performance in RL agents, along with baseline algorithm evaluations.
Some considerations on learning to explore via meta-reinforcement learning
OpenAI researchers introduce E-MAML and E-RL², two meta-reinforcement learning algorithms designed to improve exploration in tasks where discovering optimal policies requires significant exploration. The work demonstrates these algorithms' effectiveness on novel environments including Krazy World and maze tasks.
FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.