Tag
Explores whether ensembles of AI models could outperform human crowds in prediction markets, questioning if AI consensus will eventually surpass human forecasting accuracy.
This paper introduces a fail-closed certification protocol to determine when a forecasting leaderboard winner can be reliably used as deployment-ready top-1 advice, given a fixed decision interface and deployed utility. It presents a locked native audit that prevents overclaiming by blocking apparent forecast/deployment winner inversions.
The paper proposes RAVEN, a Mixture-of-Experts framework that adaptively determines temporal context windows for each input sample to handle non-stationary financial time series. It achieves state-of-the-art performance on financial and traffic benchmarks.
Amazon open-sourced Chronos, a time-series forecasting model that predicts out of the box without training or feature engineering, treating forecasting the way language models treat text.
An analysis of AI model size scaling trends from 2023 to 2031, published on LessWrong.
Introduces DeXposure-Claw, a forecast-grounded agentic system for DeFi risk supervision that uses a graph time-series foundation model to forecast exposure networks, with deterministic monitors and confidence gates to constrain LLM-generated supervisory tickets. Also presents DeXposure-Bench, a six-axis evaluation harness for regulator-aligned assessment.
This article analyzes and projects forward METR's time horizon data, likely related to AI development timelines and forecasting.
Introduces ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, designed to provide controlled, immediately resolvable tasks for evaluating probabilistic reasoning in AI systems.
Google has released TimesFM, a time series forecasting model trained on 100 billion real-world time-series data points, supporting zero-shot prediction. It is free, open-source, and can run locally on ordinary computers.
Google has released TimesFM, an AI model for zero-shot time series forecasting, trained on 100 billion real data points, free and open-source.
This paper examines whether ML models can beat the random walk benchmark in forecasting USD/CAD exchange rates, finding that only linear regression statistically outperforms the naive model, with SHAP analysis showing short-term lags dominate predictions.
A reflection on how AI recommendations at scale might shape collective behavior and the future, suggesting that asking what AI tells people could be a forecasting method.
This paper proposes ORCA, a method for black-box online adaptation of time series foundation models by learning the context of predictive errors. It demonstrates effectiveness across five TSFMs and eight datasets, addressing the challenge of adapting closed-source API-based models.
Introduces Behavior Forecasters (BFs) that take reasoning trajectories as input and achieve more accurate forecasts than frontier models at a fraction of the cost.
APEX is a network-native, decoder-only transformer for forecasting and anomaly detection in wireless edge telemetry, pre-trained on data from ~4,500 production networks. It achieves 18% lower MAE than the best general-purpose time-series foundation model on a DHCP degradation benchmark and enables sub-second inference on edge hardware.
This paper proposes Behavior Forecasters, a learned approach that predicts an LRM's future behavior (e.g., answer consistency and input sensitivity) from its reasoning trajectory, outperforming GPT-5.4 and Claude Opus 4.6 at lower cost.
This paper introduces MF-Net, a recurrent dynamical model that represents multivariate systems through a shared field state and learns a mechanical transition for joint evolution. It achieves competitive forecasting while enabling interpretable structural readout of learned relations.
This paper systematically evaluates 11 synthetic time-series generators for foundation model pretraining and finds that generator rankings are not stable across architectures, but an equal-weight mixture of all generators matches or beats the best individual. Blending this mixture with real data yields the strongest pretraining corpora, reframing synthetic pretraining as a corpus composition problem rather than a generator selection problem.
Introduces UniTok, a universal tokenizer that transforms continuous time series into discrete tokens, and UniTok-FM, a foundation model pretrained via next-token prediction that enables zero-shot and prompt-boosted forecasting as well as few-shot generation and classification through training-free in-context inference.
ReGeN is a reference-guided generative pipeline for multivariate time series data that decomposes observed sequences into a periodic backbone, stochastic residuals, and cross-variable dependencies to generate controllable synthetic data. It demonstrates that generated data can substitute for real data in forecasting tasks, outperforming prior synthetic data generators.