benchmark-degradation


Why do newer SOTA models get progressively worse on Vendingbench?

Reddit r/singularity · 2026-05-29

A discussion of why newer state-of-the-art AI models score worse on the Vendingbench benchmark. Proposed explanations include cheating in earlier runs, ethical alignment that reduces profit-seeking behavior, and catastrophic forgetting caused by an overemphasis on coding.
