Tag
Introduces Autodata, a method where AI agents act as data scientists to create high-quality synthetic training data, showing gains on computer science, legal, and math reasoning tasks over classical methods.
Proposes a hierarchical Bayesian framework for meta-learning in dynamical systems from multiple sparse, noisy datasets, using gradient-based MCMC with an embedded ODE solver for efficient posterior inference of shared and dataset-specific parameters.
Proposes MEDIC, a novel meta-learning strategy for open set domain generalization that uses implicit gradient matching across domain and class splits to learn better decision boundaries. Experiments show state-of-the-art performance.
This paper presents Connect the Dots (CoD), a framework for training LLMs via reinforcement learning to develop meta-capabilities for long-lifecycle agents, enabling continuous learning and cross-domain generalization.
Proposes ReGrad, a paradigm that treats gradients as retrievable units of knowledge for continual post-training, avoiding cumulative weight drift by storing document-specific gradients in a Gradient Bank and retrieving query-relevant gradients for temporary weight adaptation.
This paper argues that recent claims that neural networks have solved Fodor and Pylyshyn's systematicity challenge are premature. The authors show that the meta-learning for compositionality model fails to generalize out-of-distribution and behaves unsystematically even on in-distribution problems, concluding the challenge remains unmet.
Introduces WIZARD, a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies from language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.
This paper proposes a three-stage diagnostic framework to identify why offline model selectors fail to beat the best single model, applying it to dropout prediction on edX clickstream data. The study finds that the bottleneck is local representational ambiguity rather than learner choice or distribution shift, recommending state redesign or new data collection over further algorithm tuning.
SePO (Self-Evolving Prompt Optimization) proposes a self-referential prompt agent that optimizes both task agents' system prompts and its own system prompt through an evolutionary search, outperforming Manual-CoT, TextGrad, and MetaSPO across five benchmarks including AIME'25, ARC-AGI-1, and GPQA.
R-APS (Reflective Adversarial Pareto Search) is a novel method for constrained design tasks that addresses three structural failures in LLM-based agentic systems (error propagation, robustness evaluation, and knowledge invalidation) through reasoning-mode decomposition across three timescales, requiring no fine-tuning. Evaluated on planar mechanism synthesis, it achieves 3.5x tighter robustness certificates, 46% fewer iterations to first admission, and a 2.1x reduction in Chamfer distance over baselines.
CHAM-net introduces a contrastive hierarchical adaptive meta-network that captures site-specific and cross-year dynamics for robust global methane flux prediction, outperforming baseline methods on simulation and observational datasets.
The paper argues that unlearning in LLMs should be goal-dependent, proposing a cosine-based meta-learned variant of RMU for dangerous knowledge and a multi-layer objective with probe directions for toxicity, achieving strong results across four 7-8B models.
This paper theoretically characterizes the representational capacity of Neural Process (NP) architectures, proving a strict hierarchy among Conditional, Attentive, Convolutional, and Transformer NPs, and showing that finite-dimensional latent variables do not expand representational capacity beyond the encoder.
This paper decomposes the predictive KL divergence between Gaussian process and latent neural process posteriors into three terms, providing upper bounds that characterize approximation errors and connecting representation dimension to kernel smoothness.
SOLAR proposes a self-optimizing autonomous agent that leverages parameter-level meta-learning and multi-level reinforcement learning to enable lifelong adaptation of LLMs to non-stationary data streams, outperforming baselines on reasoning tasks.
Jerry Tworek and François Chollet discuss the path to AGI, covering the definition of intelligence, the role of games, and why they consider meta-learning the closest existing approach to it.
This paper introduces NoiseRater, a meta-learning framework that assigns importance scores to individual noise samples during diffusion model training to improve efficiency and generation quality.
This paper introduces RubricEM, a reinforcement learning framework that uses rubric-guided policy decomposition and reflection-based meta-policy evolution to train deep research agents for long-form tasks. The resulting RubricEM-8B model demonstrates strong performance on long-form research benchmarks by leveraging stage-aware planning and denser semantic feedback.
University of Memphis researchers propose HAMR, a model-agnostic meta-learning framework that uses bi-level optimization and neighborhood-aware resampling to adaptively reweight hard examples and minority classes, evaluated on six imbalanced NLP datasets.
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.