Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints
Summary
An academic study compares SARIMAX and Poisson regression for forecasting sparse, bursty time series of vulnerability sightings and finds count-based models more stable.
Source: https://huggingface.co/papers/2604.16038
Published on Apr 17 · Submitted by Cédric (https://huggingface.co/cedricbonhomme) on Apr 21
Abstract
Forecasting vulnerability-related activities using time-series models reveals challenges with sparse, bursty data, favoring count-based methods like Poisson regression for more stable predictions.
Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
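To make the count-based idea concrete, here is a minimal sketch in plain Python; it is not the authors' implementation. It assumes daily sighting counts for one vulnerability, aggregates them weekly, and fits a Poisson regression with a log link and a single linear time trend, log(mu_t) = b0 + b1 * t. The function names (`weekly_totals`, `fit_poisson_trend`, `poisson_forecast`, `decay_forecast`), the trend-only design, and the half-life parameterization of the exponential decay are all illustrative assumptions. The abstract does not specify the covariates or the decay form used in the paper.

```python
import math


def weekly_totals(daily_counts):
    """Sum daily counts into consecutive 7-day buckets; a trailing partial week is dropped."""
    n_weeks = len(daily_counts) // 7
    return [sum(daily_counts[7 * w:7 * (w + 1)]) for w in range(n_weeks)]


def fit_poisson_trend(counts, max_iter=50, tol=1e-10):
    """Fit log(mu_t) = b0 + b1 * t by Newton-Raphson with step halving (assumed model)."""
    if len(counts) < 2 or sum(counts) == 0:
        raise ValueError("need at least two observations and one non-zero count")
    ts = range(len(counts))

    def loglik(b0, b1):
        # Poisson log-likelihood, up to a constant that does not depend on b0 or b1.
        return sum(y * (b0 + b1 * t) - math.exp(b0 + b1 * t) for t, y in zip(ts, counts))

    # Start from the intercept-only fit: constant mean, no trend.
    b0, b1 = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * t) for t in ts]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(t * (y - m) for t, y, m in zip(ts, counts, mu))
        # Fisher information matrix entries (2x2, symmetric).
        h00 = sum(mu)
        h01 = sum(t * m for t, m in zip(ts, mu))
        h11 = sum(t * t * m for t, m in zip(ts, mu))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton direction: inverse information times score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        if abs(d0) + abs(d1) < tol:
            break
        # Halve the step while it would lower the log-likelihood.
        step, current = 1.0, loglik(b0, b1)
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1) < current - 1e-12:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1


def poisson_forecast(b0, b1, n_obs, horizon):
    """Expected counts for the next `horizon` periods; exp() keeps them positive."""
    return [math.exp(b0 + b1 * (n_obs + h)) for h in range(horizon)]


def decay_forecast(last_count, half_life, horizon):
    """Exponential decay from the last observed count (half-life form is an assumption)."""
    return [last_count * 0.5 ** (h / half_life) for h in range(1, horizon + 1)]


# Made-up daily counts for one vulnerability: a burst followed by a long quiet tail.
daily = [5, 4, 3, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 0,
         0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
weekly = weekly_totals(daily)
b0, b1 = fit_poisson_trend(weekly)
print(weekly, poisson_forecast(b0, b1, len(weekly), horizon=3))
print(decay_forecast(weekly[-1], half_life=2.0, horizon=3))
```

Because the forecast is the exponential of a linear predictor, it cannot go negative, which is the failure mode the abstract reports for SARIMAX. The decay helper needs only the last observed count and a chosen half-life, so it can be applied when no usable history exists.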
Similar Articles
Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting
This paper introduces regime-stratified evaluation for time series foundation models, revealing that aggregate metrics hide severe failures during traffic regime transitions, and proposes bimodal mixture augmentation to improve coverage while preserving overall accuracy.
Stationarity-Aware Retrieval-Augmented Time Series Forecasting
SARAF is a Stationarity-Aware Retrieval-Augmented Forecasting framework that adaptively balances relevance and diversity in retrieval for time series forecasting, modulating diversification strength based on dataset-level stationarity to handle non-stationary regime shifts. Accepted to KDD 2026, it demonstrates competitive performance over strong baselines on eight real-world datasets.
TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults
This paper introduces TS-Fault, a benchmark for evaluating time series forecasting models under structured fault scenarios like broken dependencies and regime changes, finding that clean-data accuracy often anti-correlates with robustness and that foundation models are especially fragile.
Nested Spatio-Temporal Time Series Forecasting
This paper proposes a nested spatiotemporal forecasting framework that uses spectral clustering to construct semantically coherent macro-level regions, which provide top-down guidance for fine-grained micro-level predictions. Experiments on high-dimensional datasets show consistent improvements over state-of-the-art baselines.
EnergyMamba: An Uncertainty-Aware Graph-Enhanced Selective State Space Model for Energy Consumption Prediction
EnergyMamba proposes a novel spatiotemporal framework combining a graph-enhanced selective state space model and adaptive conformalized quantile regression for accurate and reliable energy consumption prediction with uncertainty estimates, achieving improvements on real-world datasets from Florida, New York, and California.