This paper evaluates whether embeddings from geospatial foundation models such as Prithvi-EO improve cross-country crop yield prediction in Sub-Saharan Africa compared to traditional Sentinel-2 features. The study finds that frozen embeddings do not significantly outperform spectral medians under rigorous Leave-One-Country-Out validation, suggesting country-level distribution shift is the primary bottleneck rather than feature representation quality.
A weekly roundup of top AI research papers covering topics such as Conductor, HeavySkill, Horizon Generalization, synthetic computers, self-improving pretraining, and AlphaZero for Connect Four.
The paper introduces an information-theoretic framework for communication-efficient expert routing in sparse mixture-of-experts models, treating the gate as a stochastic channel and deriving practical mutual information estimators to analyze accuracy-rate tradeoffs over finite expert banks.
This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
This paper challenges the common belief that flat minima cause better generalization in neural networks, arguing that 'weakness'—a reparameterization-invariant measure of function simplicity—is the true driver. Empirical results on MNIST and Fashion-MNIST show that weakness predicts generalization while sharpness anticorrelates, and the large-batch generalization advantage vanishes as training data increases.
Anthropic researchers introduce Model Spec Midtraining (MSM), a new training stage between pretraining and fine-tuning designed to improve how models generalize from alignment training and reduce agentic misalignment.
OSCBench is a new benchmark designed to evaluate text-to-video generation models' ability to accurately represent object state changes (transformations caused by actions like peeling or slicing). The paper reveals that current T2V models struggle with temporally consistent state changes, especially in novel and compositional scenarios, identifying this as a key bottleneck in video generation.
MARCO introduces a compact, fast model for semantic correspondence that achieves state-of-the-art accuracy and generalization to unseen keypoints using a coarse-to-fine objective and self-distillation framework with DINOv2.
RoboLab is a high-fidelity simulation benchmarking framework for evaluating task-generalist robotic policies, introducing the RoboLab-120 benchmark with 120 tasks across visual, procedural, and relational competency axes. It enables scalable, realistic task generation and systematic analysis of policy behavior under controlled perturbations to assess true generalization capabilities.
OpenAI research reveals the 'double descent' phenomenon where test error exhibits a non-monotonic pattern as both model size and training steps increase, challenging traditional understanding of the bias-variance tradeoff in deep learning.
OpenAI introduces Procgen Benchmark, a suite of procedurally generated environments designed to evaluate generalization in reinforcement learning agents across diverse tasks, addressing overfitting issues in traditional benchmarks like Atari.
OpenAI trained 9 agents on the CoinRun environment with varying numbers of training levels to quantify generalization in reinforcement learning, finding substantial overfitting even with 16,000 training levels and that IMPALA-CNN architectures generalize significantly better than Nature-CNN baselines.
OpenAI's Retro Contest concluded with 923 teams competing to develop generalizable algorithms using the Sonic benchmark. Top performers primarily used tuned versions of existing algorithms like PPO and Rainbow DQN, with Dharmaraja winning first place with a score of 4,692 out of a theoretical maximum of 10,000.
OpenAI releases Gym Retro, a reinforcement learning research environment featuring games from classic gaming consoles (Sega Genesis, NES, SNES, Game Boy, etc.) to study agent generalization across different games and levels.
OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that learns loss functions through evolution rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience similar to how humans transfer skills.
OpenAI presents a new reinforcement learning benchmark based on Sonic the Hedgehog to measure transfer learning and few-shot learning performance in RL agents, along with baseline algorithm evaluations.
OpenAI launched the Retro Contest, a transfer learning competition that evaluates RL algorithms on unseen video game levels from classic SEGA Genesis games, running from April to June 2018. The contest uses Gym Retro platform and includes baseline implementations and a technical benchmark paper demonstrating that current RL algorithms significantly underperform humans on generalization tasks.
This paper explores extensions and limitations of the Neural GPU model, demonstrating improvements through curriculum design and scaling, enabling it to learn arithmetic operations on decimal numbers and long expressions while identifying failure modes on symmetric inputs analogous to adversarial examples.