Tag
This paper introduces Explanation Quality Markers (EQMs), a set of 60 reasoning patterns scored by LLMs to measure the quality of natural-language explanations in forecasting tournaments. In an analysis of over 55,000 forecast-rationale pairs, EQMs predict accuracy at both the forecast and forecaster levels, outperforming previous methods.
This paper introduces a unified decision-theoretic pretraining framework for neural network-based time series estimators, trained on stratified simulations to approximate near-optimal decision rules. Experiments show that the resulting estimators outperform traditional methods like maximum likelihood estimation on both synthetic and real-world benchmarks.
This paper proposes KARMA, a method for explaining multivariate time series forecasting models by constructing a K-order Markov surrogate model that captures temporal dependencies, offering a five-level global explanation hierarchy.
Darts, a popular open-source Python library for time series analysis, introduces a unified FoundationModel class collection that integrates multiple time series foundation models (Chronos-2, TimesFM 2.5, TiRex, PatchTST-FM) for zero-shot and fine-tuned forecasting with standardized interfaces and minimal dependencies.
Explores whether ensembles of AI models could outperform human crowds in prediction markets, questioning if AI consensus will eventually surpass human forecasting accuracy.
This paper introduces a fail-closed certification protocol to determine when a forecasting leaderboard winner can be reliably used as deployment-ready top-1 advice, given a fixed decision interface and deployed utility. It presents a locked native audit that prevents overclaiming by blocking apparent forecast/deployment winner inversions.
This paper demonstrates that careful preprocessing—especially context length selection, normalization, and regularization—can make simple linear models like Ridge regression competitive with or superior to large Transformer, MLP, and CNN models on time-series forecasting benchmarks.
EO-WM proposes a video diffusion transformer for probabilistic Earth observation forecasting that incorporates physically informed conditioning to capture weather-driven uncertainties, achieving improved prediction of vegetation indices under extreme weather.
The paper proposes RAVEN, a Mixture-of-Experts framework that adaptively determines temporal context windows for each input sample to handle non-stationary financial time series. It achieves state-of-the-art performance on financial and traffic benchmarks.
Amazon open-sourced Chronos, a time-series forecasting model that predicts out of the box without training or feature engineering, treating forecasting the way language models treat text.
An analysis of AI model size scaling trends from 2023 to 2031, published on LessWrong.
Foresight by Lightning Rod is an AI-powered tool that claims to predict anything, launched on Product Hunt.
Introduces DeXposure-Claw, a forecast-grounded agentic system for DeFi risk supervision that uses a graph time-series foundation model to forecast exposure networks, with deterministic monitors and confidence gates to constrain LLM-generated supervisory tickets. Also presents DeXposure-Bench, a six-axis evaluation harness for regulator-aligned assessment.
This article analyzes METR's time horizon data and projects it forward, likely in relation to AI development timelines and forecasting.
Introduces ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, designed to provide controlled, immediately resolvable tasks for evaluating probabilistic reasoning in AI systems.
Google has released TimesFM, a time series forecasting model trained on 100 billion real-world time points that supports zero-shot prediction. It is free and open source, and it can run locally on ordinary computers.
Google has released TimesFM, an AI model for zero-shot time series forecasting trained on 100 billion real data points. It is free and open source.
This paper examines whether ML models can beat the random walk benchmark in forecasting USD/CAD exchange rates, finding that only linear regression statistically outperforms the naive model, with SHAP analysis showing short-term lags dominate predictions.
A reflection on how AI recommendations at scale might shape collective behavior and the future, suggesting that asking what AI tells people could be a forecasting method.
This paper proposes ORCA, a method for black-box online adaptation of time series foundation models by learning the context of predictive errors. It demonstrates effectiveness across five TSFMs and eight datasets, addressing the challenge of adapting closed-source API-based models.