Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Hugging Face Daily Papers

Summary

Academic study compares SARIMAX and Poisson regression for forecasting sparse, bursty vulnerability-sighting time-series, finding count-based models more stable.

Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
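
The summary mentions exponential decay functions as a simple operational alternative for short horizons but gives no formula. A minimal sketch of one possible reading follows, assuming daily sightings decay as y(t) = y0 * exp(-lambda * t) with lambda derived from a half-life. The function name `decay_forecast`, its parameters, and the numbers in the usage lines are all hypothetical illustrations, not values or code from the paper.

```python
import math

def decay_forecast(last_count, half_life_days, horizon_days):
    """Project sightings forward with y(t) = y0 * exp(-lambda * t).

    Assumption: activity halves every `half_life_days` days. The paper
    does not state a specific decay form or rate.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    lam = math.log(2) / half_life_days  # decay rate implied by the half-life
    # One projected value per future day, starting one day after the last observation.
    return [last_count * math.exp(-lam * t) for t in range(1, horizon_days + 1)]

# Hypothetical input: 8 sightings on the most recent day, 3-day half-life.
forecast = decay_forecast(last_count=8.0, half_life_days=3.0, horizon_days=6)
```

Under this assumed form, the projection needs only the latest count and one rate parameter, and it can never go below zero, which is consistent with the summary's point about not requiring long historical series.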

Cached at: 04/21/26, 11:27 AM

Source: https://huggingface.co/papers/2604.16038 Published on Apr 17

Submitted by Cédric (https://huggingface.co/cedricbonhomme) on Apr 21

Abstract

Forecasting vulnerability-related activities using time-series models reveals challenges with sparse, bursty data, favoring count-based methods like Poisson regression for more stable predictions.
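
To make the count-based idea concrete, here is a minimal sketch of a Poisson regression with a log-linear time trend, log(mu_t) = b0 + b1 * t, fitted by Newton's method in plain Python. It is an illustration under stated assumptions rather than the authors' model: the trend-only specification, the function name `fit_poisson_trend`, and the weekly counts are invented for the example. In practice a library such as statsmodels (GLM with a Poisson family) would likely be used instead.

```python
import math

def fit_poisson_trend(counts, max_iter=50, tol=1e-10):
    """Fit log(mu_t) = b0 + b1 * t by Newton's method on the Poisson log-likelihood."""
    n = len(counts)
    total = sum(counts)
    if n < 2 or total <= 0:
        raise ValueError("need at least two periods and one sighting")
    ts = range(n)

    def loglik(b0, b1):
        # Poisson log-likelihood, dropping the constant that ignores b0 and b1.
        return sum(y * (b0 + b1 * t) - math.exp(b0 + b1 * t)
                   for t, y in zip(ts, counts))

    b0, b1 = math.log(total / n), 0.0  # start from a flat rate equal to the mean
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * t) for t in ts]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(t * (y - m) for t, y, m in zip(ts, counts, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        s0 = sum(mu)
        s1 = sum(t * m for t, m in zip(ts, mu))
        s2 = sum(t * t * m for t, m in zip(ts, mu))
        det = s0 * s2 - s1 * s1
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (s0 * g1 - s1 * g0) / det
        # Halve the step while it would lower the log-likelihood.
        step, current = 1.0, loglik(b0, b1)
        while loglik(b0 + step * d0, b1 + step * d1) < current and step > 1e-8:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1

# Hypothetical weekly sighting counts for one vulnerability (invented data).
weekly_counts = [12, 7, 5, 3, 2, 1, 0, 1]
b0, b1 = fit_poisson_trend(weekly_counts)
n_weeks = len(weekly_counts)
fitted = [math.exp(b0 + b1 * t) for t in range(n_weeks)]
# Expected counts for the next four weeks; exp() keeps them strictly positive.
forecast = [math.exp(b0 + b1 * t) for t in range(n_weeks, n_weeks + 4)]
```

Because the mean is modeled on the log scale, the predicted rate is always positive, so this kind of model cannot produce the negative forecasts described for SARIMAX.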

Similar Articles

Stationarity-Aware Retrieval-Augmented Time Series Forecasting

arXiv cs.LG

SARAF is a Stationarity-Aware Retrieval-Augmented Forecasting framework that adaptively balances relevance and diversity in retrieval for time series forecasting, modulating diversification strength based on dataset-level stationarity to handle non-stationary regime shifts. Accepted to KDD 2026, it demonstrates competitive performance over strong baselines on eight real-world datasets.

TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults

arXiv cs.LG

This paper introduces TS-Fault, a benchmark for evaluating time series forecasting models under structured fault scenarios like broken dependencies and regime changes, finding that clean-data accuracy often anti-correlates with robustness and that foundation models are especially fragile.

Nested Spatio-Temporal Time Series Forecasting

arXiv cs.LG

This paper proposes a nested spatiotemporal forecasting framework that uses spectral clustering to construct semantically coherent macro-level regions, which provide top-down guidance for fine-grained micro-level predictions. Experiments on high-dimensional datasets show consistent improvements over state-of-the-art baselines.