Researchers at the Max Planck Institute introduced FutureSim, an environment in which AI agents predict real-world future events by replaying historical web data. GPT 5.5 running in Codex achieved near-perfect Brier skill scores on some Polymarket markets, such as Super Bowl LX, outperforming the aggregate human market forecast, though it struggled on others, such as UK elections and the Grammys.
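For readers unfamiliar with the metric, here is a minimal sketch of how a Brier skill score is commonly computed. It follows the textbook definition, not necessarily FutureSim's exact setup: the summary does not say which reference forecast the benchmark uses, so the reference here (a flat 0.5 forecast) is an assumption, as are the function names and the made-up probabilities.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """1 - BS_forecast / BS_reference.

    1.0 means a perfect forecast, 0.0 means no better than the reference,
    and negative values mean worse than the reference.
    """
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Hypothetical numbers for illustration only (not from FutureSim).
outcomes = [1, 0, 1]            # what actually happened
agent = [0.9, 0.1, 0.8]         # assumed agent forecasts
reference = [0.5, 0.5, 0.5]     # assumed uninformed baseline

bss = brier_skill_score(agent, reference, outcomes)
print(f"Brier skill score: {bss:.2f}")
```

Under this definition, a "near-perfect" score is one close to 1. Swapping the flat baseline for market prices as the reference would instead measure skill relative to the market.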