Models can predict future events and make money on Polymarket now?

Reddit r/singularity 05/16/26, 03:25 PM Papers

Summary

Researchers at the Max Planck Institute introduced FutureSim, an environment where AI agents predict real-world future events by replaying historical web data. GPT 5.5 running in Codex achieved near-perfect Brier skill scores on some Polymarket markets like Super Bowl LX, outperforming human aggregate markets, though it struggled on others like UK elections and the Grammys.

Researchers from the Max Planck Institute recently released FutureSim, an environment in which agents are replayed a temporal slice of the web and tasked with predicting real-world future events. On some questions in their environment that overlap with Polymarket, like the Super Bowl LX market ($704M in trading volume), GPT 5.5 (running in Codex) actually ran ahead of the human-aggregate market and finished with a near-perfect Brier skill score of 0.90. Same story on the Portugal presidential runoff. An agent with no live web access, just replaying old news, leading a market with hundreds of millions in real money on the line. But it’s not all perfect: the same model gets smoked on the UK elections and Grammys markets. Progress on the AI forecasting front seems rapid. Will we have reliable future predictors by 2027?
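For anyone unfamiliar with the metric, here is a minimal sketch of how a Brier skill score is usually computed. The Brier score is the mean squared error between forecast probabilities and binary outcomes, and the skill score compares it against a reference forecast: 1.0 is perfect, 0.0 is no better than the reference, and negative is worse. The post does not say which reference FutureSim uses, so the constant 0.5 baseline below is an assumption, and the function names and numbers are made up for illustration.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float],
    outcomes: Sequence[int],
    ref_probs: Sequence[float],
) -> float:
    """1 - BS / BS_ref, where BS_ref is the Brier score of a reference forecast."""
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - bs / bs_ref


# Hypothetical numbers, not taken from FutureSim or Polymarket.
agent_probs = [0.9, 0.8, 0.1]      # agent's probability that each event happens
outcomes = [1, 1, 0]               # what actually happened
reference = [0.5, 0.5, 0.5]        # assumed baseline: always say 50%

print(round(brier_skill_score(agent_probs, outcomes, reference), 2))
```

Swapping `reference` for the market's own prices would measure skill relative to the market instead of relative to a coin flip, which is a different (and harder) comparison.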

Similar Articles

FutureSim: Replaying World Events to Evaluate Adaptive Agents

Hugging Face Daily Papers

FutureSim replays chronological world events to benchmark AI agents' long-term predictive abilities, finding that even the best agent achieves only 25% accuracy.

Prediction markets are breaking the news and becoming their own beat

Hacker News Top

Prediction markets are increasingly influencing news coverage and becoming a subject of journalism in their own right, as platforms like Polymarket gain mainstream attention for forecasting real-world events.

Looking at the data behind prediction markets

Hacker News Top

An analysis of prediction markets like Polymarket and Kalshi, examining whether their massive trading volume actually produces valuable forecasting information or merely serves as gambling, referencing historical academic support and current data.

kept facing with coding agents was hallucinations, context loss, outdated framework knowledge, and models confidently guessing wrong implementations

Reddit r/openclaw

Proxima is a local tool that orchestrates multiple AI models (ChatGPT, Claude, Gemini, Perplexity) to collaborate via MCP, API, CLI, and webhooks, addressing coding agent issues like hallucinations and context loss by enabling multi-model workflows on the user's own machine.

Suraj vs The Future | With ChatGPT

YouTube AI Channels

A promotional video from OpenAI showcasing how to use ChatGPT to prepare smarter for the future, produced by Early Man Film.