Tag
This paper identifies and analyzes 'physical misgeneralization' in generative sequence models, where individual trajectories appear plausible but the aggregate distribution over physical quantities is incorrect, and proposes a kernel-informed mitigation.
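To make the failure mode concrete, the sketch below compares two sets of scalar values standing in for a per-trajectory physical quantity such as total energy. Every "generated" value lies inside the reference range, yet the two samples are clearly distributed differently. The data are synthetic, the names `mmd2`, `reference`, and `generated` are hypothetical, and the squared maximum mean discrepancy with an RBF kernel is a generic two-sample statistic chosen for illustration. It is not the paper's kernel-informed mitigation, whose details are not given here.

```python
import math
import random


def rbf(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=0.5):
    """Biased estimate of the squared maximum mean discrepancy of two 1-D samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


rng = random.Random(0)

# Synthetic stand-ins for a per-trajectory quantity (assumed, not from the paper).
reference = [rng.uniform(0.0, 2.0) for _ in range(200)]
# Each generated value is individually within the reference range,
# but the sample is concentrated in a narrow band of that range.
generated = [rng.uniform(1.5, 2.0) for _ in range(200)]

print("MMD^2 reference vs reference:", mmd2(reference, reference))
print("MMD^2 reference vs generated:", mmd2(reference, generated))
```

A per-sample plausibility check (is each value inside the allowed range?) passes for every generated value, while the distribution-level statistic separates the two sets.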
An experiment that feeds GPT-4o, Claude 3.5 Sonnet, and other models the same double-pendulum prompt shows that they choose opposite angle conventions, which produces an immediately visible mismatch in a shared renderer. The split is not random across model families, which suggests a bias in the training-data distribution for classical mechanics problems.
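The effect of a convention mismatch can be sketched in a few lines. The example assumes two hypothetical conventions, since the summary does not say which ones the models chose: angles measured from the downward vertical with counterclockwise positive, and the same with clockwise positive. The function name `bob_positions` and the convention labels are illustrative only.

```python
import math


def bob_positions(theta1, theta2, l1=1.0, l2=1.0, convention="ccw_from_down"):
    """Return ((x1, y1), (x2, y2)) for a double pendulum pivoted at the origin.

    The y axis points up. Both conventions measure angles from the downward
    vertical; they differ only in which rotation direction counts as positive.
    """
    if convention == "ccw_from_down":
        sign = 1.0
    elif convention == "cw_from_down":
        sign = -1.0
    else:
        raise ValueError(f"unknown convention: {convention}")

    x1 = sign * l1 * math.sin(theta1)
    y1 = -l1 * math.cos(theta1)
    x2 = x1 + sign * l2 * math.sin(theta2)
    y2 = y1 - l2 * math.cos(theta2)
    return (x1, y1), (x2, y2)


# The same pair of angles, interpreted under each convention.
angles = (math.pi / 6, math.pi / 3)
ccw = bob_positions(*angles, convention="ccw_from_down")
cw = bob_positions(*angles, convention="cw_from_down")

print("counterclockwise-positive:", ccw)
print("clockwise-positive:       ", cw)
```

Under these assumptions the two interpretations are mirror images about the vertical axis, so a renderer that expects one convention draws the other model's pendulum swinging to the wrong side from the first frame.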