A Rolling-Window Framework for Churn Prediction and Behavioral Driver Identification
Summary
This paper proposes a rolling-window framework for customer churn prediction in non-contractual service environments, using 30-day behavioral windows to enable continuous risk assessment. Evaluated on real-world data, the feature-based model achieves 87.6% accuracy and 0.94 ROC-AUC, while the sequence-based model reaches 96.1% recall.
# A Rolling-Window Framework for Churn Prediction and Behavioral Driver Identification
Source: [https://arxiv.org/html/2606.06776](https://arxiv.org/html/2606.06776)
###### Abstract
Customer churn prediction is a central task in customer analytics, particularly in non\-contractual, pay\-per\-use service environments where disengagement is not explicitly observed and must be inferred from behavioral inactivity\. Existing churn prediction approaches often rely on simplified temporal assumptions or single\-point representations of customer behavior, which limit their ability to support continuous risk assessment, interpretability, and realistic deployment over time\. This study proposes a temporally explicit churn prediction framework that models customer behavior using rolling behavioral windows, enabling repeated and instance\-level churn risk estimation as customer activity evolves\. Customer behavior is summarized within a fixed 30\-day observation window, followed by a 30\-day future churn evaluation window, ensuring a clear temporal separation between behavioral evidence and churn outcomes\. The framework integrates feature\-based and sequence\-based learning approaches within a unified temporal design\. The proposed approach is evaluated on a large\-scale, real\-world dataset from a non\-contractual service platform\. Empirical results demonstrate strong and stable predictive performance, with accuracy reaching 87\.6% and ROC\-AUC of 0\.94 for the feature\-based model, while the sequence\-based model achieves recall as high as 96\.1% by capturing temporal disengagement patterns\. Evaluation on future unseen data confirms meaningful robustness under temporal shift, with accuracy remaining above 83% and ROC\-AUC exceeding 0\.91 without model retraining\. Overall, the findings highlight that carefully designed temporal framing, rather than model complexity alone, is critical for achieving robust, interpretable, and deployment\-ready churn prediction, providing a practical foundation for churn\-oriented decision support in dynamic service environments\.
###### Keywords
Churn Prediction, On\-demand Services, Rolling\-Window Modeling, Interpretable Machine Learning, Decision Support Systems
Journal: Decision Support Systems
Affiliations:
\[1\] Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
\[2\] Interdisciplinary Research Center for Smart Mobility and Logistics \(IRC\-SML\), King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
\[3\] SDAIA–KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
## 1 Introduction
Customer churn represents a persistent challenge across service\-oriented and usage\-based industries, where retaining existing customers is often more cost\-effective than acquiring new ones\. As competitive pressures increase and customer switching costs decline, organizations increasingly rely on predictive analytics to anticipate customer disengagement and to support retention\-oriented strategies\. In this context, customer churn prediction models are commonly used to estimate the likelihood that a customer will cease active usage of a service based on historical behavioral patterns, transactional activity, and usage signals \[De Caigny et al\., [2018](https://arxiv.org/html/2606.06776#bib.bib2), Verbeke et al\., [2012](https://arxiv.org/html/2606.06776#bib.bib1)\]\. Accurate churn prediction enables organizations to prioritize at\-risk customers and to allocate limited retention resources more effectively, making churn modeling a central task in contemporary customer analytics \[Neslin et al\., [2006](https://arxiv.org/html/2606.06776#bib.bib3), Ascarza et al\., [2018](https://arxiv.org/html/2606.06776#bib.bib4)\]\.
In applied churn analytics, predictive accuracy alone is insufficient if model outputs cannot be meaningfully interpreted\. For churn prediction to be practically useful, analysts must be able to identify the behavioral and transactional factors that contribute to attrition risk, as such insights support diagnosis and subsequent intervention \[Coussement et al\., [2017](https://arxiv.org/html/2606.06776#bib.bib5)\]\. In parallel, prior research has shown that the evaluation of churn models should extend beyond predictive accuracy to reflect their operational usefulness, particularly in contexts where model outputs inform retention actions \[Verbraken et al\., [2013](https://arxiv.org/html/2606.06776#bib.bib6)\]\. Consequently, churn prediction models are increasingly expected to balance predictive performance with explanatory capability, as failing to satisfy either objective can limit their practical relevance\.
Despite the extensive body of research on customer churn prediction, several methodological limitations continue to constrain the development of decision\-oriented and deployment\-ready churn prediction systems\. A large proportion of existing studies formulate churn prediction as a static classification task, in which each customer is represented by a single data instance and assigned a single churn label, even when rich transactional or usage data are available \[De Caigny et al\., [2020](https://arxiv.org/html/2606.06776#bib.bib7), Lalwani et al\., [2022](https://arxiv.org/html/2606.06776#bib.bib8), Geiler et al\., [2022](https://arxiv.org/html/2606.06776#bib.bib9), De Caigny et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib10)\]\. Although recent studies have incorporated temporal information through sequence learning, panel data, or sliding\-window designs, these approaches often operate at fixed or sequence\-level representations and do not support continuous, rolling churn risk reassessment for the same customer over time \[Mena et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib11), Ahlstrand et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib12), Bugajev et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib13)\]\. Furthermore, many studies lack an explicit and consistent temporal problem formulation, with observation windows and churn horizons either undefined or implicitly assumed, thereby limiting interpretability and real\-world applicability \[Vo et al\., [2021](https://arxiv.org/html/2606.06776#bib.bib14), Krishna et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib15), Chajia and Nfaoui, [2024](https://arxiv.org/html/2606.06776#bib.bib16)\]\.
Even when raw behavioral data are available, they are frequently aggregated into static or coarse\-grained summaries, such as lifetime RFM features or quarterly usage statistics, which can obscure short\-term behavioral dynamics that may precede churn \[Sanchez Ramirez et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib18), Asfe et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib19)\]\. Finally, the majority of churn prediction frameworks are designed for contractual business models with explicit churn events, while comparatively fewer studies address non\-contractual, pay\-per\-use service contexts in which churn must be inferred from sustained inactivity patterns \[Zaghloul et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib20), Bugajev et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib13)\]\. Collectively, these limitations highlight the need for churn prediction frameworks that explicitly define temporal windows, preserve fine\-grained behavioral dynamics, and support rolling, instance\-level churn risk assessment in non\-contractual service settings\.
To address these gaps, this study proposes a temporally explicit churn prediction framework designed for non\-contractual, pay\-per\-use service environments\. The proposed approach models customer behavior using rolling behavioral windows that advance over time, enabling repeated churn risk assessment for the same customer as new activity data becomes available\. Customer engagement is summarized within explicitly defined observation windows, while churn is operationalized through a subsequent evaluation window, ensuring a clear temporal separation between behavioral evidence and churn outcomes\. By constructing multiple time\-indexed instances per customer, the framework captures short\-term behavioral dynamics that are often obscured by static or lifetime aggregation\. The approach integrates feature\-based and sequence\-based learning techniques within a unified temporal design and incorporates interpretable model outputs to support decision\-making in proactive customer retention scenarios\.
The remainder of this paper is organized as follows\. Section [2](https://arxiv.org/html/2606.06776#S2) summarizes existing research on customer churn prediction, with an emphasis on temporal modeling approaches and decision\-support considerations\. Section [3](https://arxiv.org/html/2606.06776#S3) describes the proposed methodology, including the temporal windowing strategy and modeling framework\. Section [4](https://arxiv.org/html/2606.06776#S4) presents the empirical results\. Section [5](https://arxiv.org/html/2606.06776#S5) discusses the findings in the context of decision support and deployment considerations\. Finally, Section [6](https://arxiv.org/html/2606.06776#S6) concludes the paper and outlines directions for future research\.
## 2 Related Work
Customer churn prediction has been extensively studied across a wide range of service and subscription\-based domains, with the primary objective of identifying customers who are likely to discontinue their relationship with a firm\. Early and widely adopted approaches frame churn prediction as a static binary classification task, where each customer is represented by a single feature vector summarizing historical behavior\. Such formulations are commonly used in benchmark\-driven studies that compare traditional machine learning classifiers, including logistic regression, random forests, gradient boosting, and ensemble methods, often reporting strong predictive performance on cross\-sectional datasets \[Lalwani et al\., [2022](https://arxiv.org/html/2606.06776#bib.bib8), Geiler et al\., [2022](https://arxiv.org/html/2606.06776#bib.bib9), Krishna et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib15)\]\. These studies demonstrate that supervised learning models can effectively discriminate between churners and non\-churners when churn is treated as a single\-point outcome, but they typically rely on static representations that abstract away temporal dynamics\.
To enhance predictive performance and feature representation, several studies incorporate richer data sources and advanced modeling techniques\. Unstructured data, such as textual customer interactions and call logs, have been integrated alongside structured behavioral features, yielding measurable gains in predictive accuracy \[De Caigny et al\., [2020](https://arxiv.org/html/2606.06776#bib.bib7), Vo et al\., [2021](https://arxiv.org/html/2606.06776#bib.bib14)\]\. More recent work has explored representation learning through embeddings and deep learning architectures, including neural networks, meta\-modeling frameworks, and large language model embeddings, with the aim of capturing complex nonlinear relationships in customer behavior \[Chajia and Nfaoui, [2024](https://arxiv.org/html/2606.06776#bib.bib16), Asfe et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib19)\]\. Hybrid and ensemble\-based approaches further combine multiple modeling paradigms to balance predictive performance and interpretability, demonstrating consistent improvements over single\-model baselines across multiple datasets \[De Caigny et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib10), Zaghloul et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib20)\]\.
Beyond static modeling, an increasing body of research acknowledges the temporal nature of customer behavior and seeks to incorporate time into churn prediction\. One stream of work adopts panel\-based or sequence\-aware formulations, where customer behavior is represented through time\-varying features or ordered sequences\. For example, time\-varying RFM measures and recurrent neural networks have been shown to improve churn prediction by capturing longitudinal patterns in customer engagement \[Mena et al\., [2024](https://arxiv.org/html/2606.06776#bib.bib11)\]\. Other studies employ sliding or window\-based designs to summarize recent behavioral history, particularly in settings where churn is inferred from inactivity rather than explicit cancellation events \[Bugajev et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib13), Ahlstrand et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib12)\]\. These approaches demonstrate the value of temporal information, yet they differ substantially in how observation windows, prediction horizons, and instance generation are defined\.
Despite these advances, the literature exhibits notable inconsistencies in how the churn prediction task is temporally formulated\. In many studies, the length of the observation window and the definition of the churn horizon are either implicit or loosely specified, making it difficult to compare results across studies or to interpret model outputs in an operational context \[Vo et al\., [2021](https://arxiv.org/html/2606.06776#bib.bib14), Lalwani et al\., [2022](https://arxiv.org/html/2606.06776#bib.bib8)\]\. Even when inactivity\-based churn definitions are used, churn is often inferred retrospectively without a clearly articulated future prediction window, particularly in non\-contractual or usage\-based service settings \[Zaghloul et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib20), Asfe et al\., [2025](https://arxiv.org/html/2606.06776#bib.bib19)\]\. As a result, while prior work provides strong evidence that both advanced modeling techniques and temporal information can enhance churn prediction, there remains limited consensus on how to formulate churn prediction as a temporally explicit and repeatedly evaluated task that aligns with evolving customer behavior\.
## 3 Methodology
This section presents the proposed churn prediction framework, which is designed to model short\-term behavioral dynamics in non\-contractual, pay\-per\-use service environments\. The methodology combines a rolling\-window instance construction strategy with feature\-based and sequence\-based learning approaches, enabling repeated churn risk assessment for the same customer over time\. Figure [1](https://arxiv.org/html/2606.06776#S3.F1) summarizes the end\-to\-end workflow, which consists of data preprocessing, temporal windowing, feature engineering, model development, evaluation, and interpretability analysis\.
Fig\. 1: Overview of the proposed churn prediction methodology\. Raw event\-level booking data are preprocessed and organized into rolling behavioral observation windows followed by a future churn evaluation horizon, ensuring temporal separation between inputs and outcomes\. The resulting representations are used for feature\-based and sequence\-based modeling, followed by time\-aware evaluation and interpretability analysis\.
### 3\.1 Problem Definition
Let $i \in \{1, \dots, N\}$ denote a customer and let $t$ represent a discrete time index measured in days\. Customer behavior is observed through a sequence of time\-stamped service interactions\. The objective is to predict whether a customer will churn within a predefined future period based on their recent behavioral history\.
Churn prediction is formulated as a binary classification task at the instance level\. For each customer $i$ and reference time $t$, an input instance is constructed using a behavioral observation window of length $W_b$, followed by a churn evaluation window of length $W_c$\. The churn label $y_{i,t}$ is defined as:

$$
y_{i,t} =
\begin{cases}
1, & \text{if customer } i \text{ exhibits no qualifying activity during the interval } [t+1,\, t+W_c], \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
$$

This formulation ensures a clear temporal separation between behavioral evidence and churn outcomes and aligns the prediction task with prospective churn risk assessment\.
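The labeling rule in Eq\. (1) can be illustrated with a minimal sketch. It assumes that every date passed in already counts as "qualifying activity" (the precise qualification rule is not spelled out here), and the function name `churn_label` and the example dates are hypothetical.

```python
from datetime import date, timedelta

W_C = 30  # churn evaluation window length in days (W_c in Eq. 1)


def churn_label(activity_dates, t, w_c=W_C):
    """Return 1 if no activity falls in [t+1, t+w_c], otherwise 0."""
    start = t + timedelta(days=1)
    end = t + timedelta(days=w_c)
    has_activity = any(start <= d <= end for d in activity_dates)
    return 0 if has_activity else 1


# Hypothetical customer with two bookings.
bookings = [date(2024, 10, 5), date(2024, 11, 20)]

# Reference time Oct 31: the Nov 20 booking lies inside [Nov 1, Nov 30].
label_early = churn_label(bookings, date(2024, 10, 31))

# Reference time Nov 20: nothing lies inside [Nov 21, Dec 20].
label_late = churn_label(bookings, date(2024, 11, 20))
```

Activity on day $t$ itself belongs to the observation window, so it never prevents a churn label; only activity strictly after $t$ does.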
### 3\.2 Dataset Description and Preparation
The empirical evaluation is conducted using a real\-world dataset obtained from a commercial on\-demand car wash service operating under a non\-contractual, pay\-per\-use model\. The dataset consists of time\-stamped service booking records collected over a continuous four\-month period from October 2024 to January 2025\. Each record corresponds to an individual service interaction and includes operational, transactional, temporal, and spatial attributes\.
In its raw form, the dataset comprises 401,164 event\-level booking records generated by 101,765 unique customers\. User engagement is highly heterogeneous and temporally sparse, with a median of two bookings and one active day per user\. More than half of customers \(57\.4%\) generate two or fewer bookings, and approximately 60% exhibit activity spanning no more than seven days, reflecting short\-lived and bursty usage patterns typical of non\-contractual service environments\. These characteristics motivate the use of an inactivity\-based churn formulation with a fixed future evaluation window to ensure fair churn labeling for sparsely active users\.
Due to the highly granular and irregular nature of the raw interaction logs, event\-level data are not used directly for model training\. Instead, customer activity is summarized through a rolling\-window instance construction procedure \(described in Section [3\.3](https://arxiv.org/html/2606.06776#S3.SS3)\), which transforms booking events into a time\-indexed panel of behavioral instances and allows each customer to contribute multiple observation windows over time\.
Prior to window construction, the raw booking data undergo a structured preparation process\. Preprocessing includes correction of inconsistent data types, normalization of numerical attributes, and encoding of categorical variables such as service type, booking status, and payment method, ensuring consistency and comparability across behavioral windows\. Missing values are predominantly structural and arise mainly in monetary and duration\-related attributes associated with unsuccessful bookings\. As these attributes are meaningful only for completed services, they are computed exclusively from successful bookings \(status = 2\)\. For observation windows without successful activity, a dedicated binary indicator captures the absence of completed bookings, and all success\-dependent attributes are set to zero, preserving the semantic distinction between inactivity and missing data\.
Spatial attributes are derived by computing geographic distances between customers and service providers using the Haversine formula\. These distances are then discretized into ordinal categories based on empirical quantiles, which captures spatial engagement patterns while remaining robust to outliers\. All preprocessing steps are applied consistently across training, validation, and test datasets to prevent information leakage\.
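As a rough illustration of the spatial step, the sketch below computes a Haversine distance and assigns it to a quantile-based ordinal bucket. Several details are assumptions rather than the authors' implementation: the use of five buckets (suggested by the five `pct_dist_*` feature names), the simple index rule for the cut points, fitting the cut points on training data only, and all function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def quantile_edges(values, n_bins=5):
    """Interior cut points taken at the k/n_bins positions of the sorted values."""
    s = sorted(values)
    return [s[min(len(s) - 1, (k * len(s)) // n_bins)] for k in range(1, n_bins)]


def bucket(distance, edges):
    """Ordinal bucket index: 0 is the nearest bucket, len(edges) the farthest."""
    return sum(distance > e for e in edges)


# Hypothetical training distances (km) used to fit the cut points.
edges = quantile_edges([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Because the cut points come from ranks rather than raw magnitudes, a single extreme distance shifts at most one edge, which is the robustness property the text refers to.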
After applying rolling\-window aggregation and churn labeling, the event\-level data are transformed into 280,756 behavioral instances\. Each instance represents a customer summarized over a single observation window and is described by a fixed\-length feature vector of 40 engineered attributes\. The resulting instance\-level dataset exhibits a moderate class imbalance, with approximately 60\.5% non\-churn instances and 39\.5% churn instances, reflecting realistic disengagement dynamics in non\-contractual, on\-demand service settings\. Although the data originate from a car wash service context, both the dataset characteristics and the inactivity\-based churn formulation are representative of a broader class of booking\-based service platforms\.
### 3\.3 Rolling Window Instance Construction
To capture evolving customer behavior, a rolling\-window strategy is employed\. For each customer $i$, a behavioral observation window of length $W_b = 30$ days is constructed, summarizing activity over the interval $[t - W_b + 1,\, t]$\. This window is followed by a churn evaluation window of length $W_c = 30$ days, spanning $[t + 1,\, t + W_c]$\. As illustrated in Figure [2](https://arxiv.org/html/2606.06776#S3.F2), windows advance with a daily stride, enabling continuous reassessment of churn risk as new behavioral data become available\.
Fig\. 2: Rolling\-window formulation
Although rolling windows advance with a daily stride, customers are not treated as independent or new entities across windows\. Instead, each window represents a distinct temporal state of the same customer\. To avoid redundant instances, consecutive windows that yield identical behavioral summaries are collapsed, and a new instance is generated only when a behavioral change occurs, operationalized by the arrival of a new booking event within the observation window\. Importantly, churn labels are assigned exclusively based on future activity, ensuring that each instance is given an equal and fair opportunity to exhibit sustained inactivity before being labeled as churn and that no information from the evaluation window leaks into feature construction\.
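The instance construction described above can be sketched as follows. Since a new instance is generated only when a booking arrives, the sketch uses each distinct booking day as a reference time $t$. Skipping instances whose evaluation window extends past the end of the data is an assumption made here so that every label has a full 30 days of future data; the function name, the two summary fields, and the dates are hypothetical.

```python
from datetime import date, timedelta

W_B, W_C = 30, 30  # observation and evaluation window lengths in days


def build_instances(booking_dates, data_end):
    """One instance per distinct booking day t whose evaluation window is fully observed."""
    days = sorted(set(booking_dates))
    instances = []
    for t in days:
        eval_end = t + timedelta(days=W_C)
        if eval_end > data_end:
            continue  # assumption: drop instances with a truncated future window
        obs_start = t - timedelta(days=W_B - 1)
        # Features use only bookings inside [t - W_B + 1, t].
        in_window = [d for d in booking_dates if obs_start <= d <= t]
        # The label uses only bookings inside [t + 1, t + W_C].
        future = [d for d in days if t < d <= eval_end]
        instances.append({
            "t": t,
            "total_bookings": len(in_window),
            "active_days": len(set(in_window)),
            "churn": 0 if future else 1,
        })
    return instances


# Hypothetical customer: two bookings on Oct 1, one on Oct 20, one on Dec 15.
bookings = [date(2024, 10, 1), date(2024, 10, 1), date(2024, 10, 20), date(2024, 12, 15)]
rows = build_instances(bookings, data_end=date(2025, 1, 31))
```

The same customer contributes three instances here, and the middle one is labeled as churn even though the customer later returns, which reflects the instance-level rather than customer-level nature of the label.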
### 3\.4 Feature Engineering
From each behavioral observation window, a comprehensive set of behavioral and operational features is constructed to characterize customer engagement, service usage, and short\-term behavioral evolution\. The feature design follows a structured taxonomy to ensure interpretability, reproducibility, and alignment with the temporal churn formulation\.
The first group of features captures *booking activity and outcomes*, summarizing the volume of user interactions and the distribution of booking results within the observation window\. These features reflect overall engagement intensity as well as the relative prevalence of successful, cancelled, and failed bookings, thereby providing insight into service reliability and user commitment\.
Table 1: Categorization of engineered behavioral and operational features extracted from each observation window\.

| Category | Features |
| --- | --- |
| Booking Activity | total\_bookings, completed\_bookings, cancelled\_bookings, failed\_bookings, completion\_rate, cancellation\_rate, failure\_rate, only\_one\_booking, active\_days, last\_booking\_status |
| Monetary Behavior | avg\_paid, avg\_total, avg\_discount, avg\_wallet\_usage, avg\_wash\_duration, has\_successful\_booking |
| Spatial & Service Characteristics | pct\_dist\_very\_near, pct\_dist\_near, pct\_dist\_mid, pct\_dist\_mid\_far, pct\_dist\_far, services\_type\_most\_frequent |
| Temporal Context & Recency | days\_since\_last\_booking, avg\_booking\_hour, weekend\_success\_rate, first\_booking\_month, first\_booking\_day, last\_booking\_month, last\_booking\_day |
| Categorical Preferences | payment\_type\_most\_frequent, promocodes\_type\_most\_frequent, discount\_type\_most\_frequent, sources\_most\_frequent, is\_cash\_on\_delivery\_most\_frequent, use\_wallet\_most\_frequent, booking\_type\_most\_frequent |
| Trend\-Based Indicators | trend\_total\_bookings, trend\_avg\_paid, trend\_avg\_wallet\_usage, trend\_completion\_rate |

The second group represents *monetary behavior*, describing spending patterns and payment\-related characteristics derived exclusively from successful bookings\. These features capture average payment amounts, discounts, wallet usage, and service duration, enabling the model to distinguish between high\- and low\-value engagement patterns\.
A third category focuses on *spatial behavior and service characteristics*\. These features summarize the geographic dispersion of successful bookings across distance\-based buckets and capture dominant service\-type preferences, thereby reflecting accessibility, convenience, and service heterogeneity\.
The fourth group encodes *temporal context and recency*, including indicators of activity recency, temporal usage patterns, and calendar\-related attributes\. These features describe when customers interact with the service and how recently engagement has occurred, which are critical signals in non\-contractual churn settings\.
The fifth category consists of *categorical preference indicators*, defined as the most frequently observed values of selected categorical attributes within the observation window\. These features capture stable behavioral preferences related to payment methods, promotions, booking sources, and service configurations\.
Finally, *trend\-based features* are computed to capture short\-term behavioral dynamics within the observation window\. These features quantify directional changes in engagement, spending, and success rates by estimating linear trends over time\-ordered observations, thereby enabling early detection of disengagement patterns that may precede churn\.
All success\-dependent attributes are computed exclusively from completed bookings\. For observation windows with no successful activity, a dedicated indicator variable captures the absence of successful engagement, and corresponding success\-dependent attributes are set to zero to preserve the semantic distinction between inactivity and missing data\. Prior to model training, all numerical features are standardized\. Table [1](https://arxiv.org/html/2606.06776#S3.T1) provides a complete overview of the engineered features grouped by category\.
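The trend-based indicators (for example `trend_total_bookings`) are described as linear trends over time-ordered observations. One plausible reading is the ordinary least squares slope of a value series against its position, sketched below. How observations are grouped inside a window (per booking, per day, or per sub-period) is not specified, so the equally spaced positions and the zero fallback for fewer than two points are assumptions, as is the function name.

```python
def linear_trend(values):
    """OLS slope of values against positions 0..n-1; 0.0 when fewer than two points."""
    n = len(values)
    if n < 2:
        return 0.0  # assumption: no trend can be estimated from a single observation
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical weekly booking counts inside one 30-day observation window.
rising = linear_trend([1, 2, 3, 4])
falling = linear_trend([4, 2, 0])
single = linear_trend([5])
```

A negative slope on engagement or spending series is the kind of signal the text associates with early disengagement.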
### 3\.5 Modeling Approaches
Two complementary modeling paradigms are employed to evaluate the effectiveness of aggregated versus sequential representations of customer behavior\.
#### 3\.5\.1 Feature\-Based Modeling
A gradient boosting decision tree model \(XGBoost\) is employed to learn churn patterns from aggregated feature vectors derived from each behavioral observation window\. Gradient boosting is particularly well suited for structured and tabular data, as it effectively captures nonlinear relationships and complex feature interactions while maintaining strong generalization capability \[Chen and Guestrin, [2016](https://arxiv.org/html/2606.06776#bib.bib21)\]\. The model outputs probabilistic churn scores by optimizing a binary logistic objective, making it suitable for risk\-based churn prediction in decision support settings\.
By operating on window\-level behavioral summaries, the feature\-based model serves as a strong and interpretable baseline for assessing the predictive value of short\-term behavioral aggregation\. Its ability to model heterogeneous engagement signals and handle mixed feature types provides a robust point of comparison against sequence\-based approaches that explicitly model temporal dependencies\.
#### 3\.5\.2 Sequence\-Based Modeling
To explicitly model temporal dependencies in customer behavior, a Long Short\-Term Memory \(LSTM\) network is employed on sequences of behavioral observation windows\. LSTM networks are designed to address the vanishing gradient problem inherent in recurrent neural networks and are well suited for capturing long\-range dependencies in sequential data \[Hochreiter and Schmidhuber, [1997](https://arxiv.org/html/2606.06776#bib.bib23)\]\. Each input sequence represents an ordered series of window\-level behavioral summaries corresponding to a customer’s recent activity history\.
The sequence\-based model complements the feature\-based approach by learning temporal patterns that may not be fully captured by static aggregation alone, such as gradual disengagement or evolving usage trends\. By leveraging sequential representations of customer behavior, the LSTM enables a comparative evaluation of feature\-based versus sequence\-based learning under the same rolling\-window formulation\.
### 3\.6 Model Configuration
Table [2](https://arxiv.org/html/2606.06776#S3.T2) summarizes the final configurations used for the feature\-based and sequence\-based models\. For XGBoost, hyperparameters controlling tree complexity, learning rate, and subsampling were selected through grid\-based search on the validation data to balance predictive performance and generalization\. Subsampling and column sampling were employed to reduce overfitting in the high\-dimensional behavioral feature space\.
For the sequence\-based model, the LSTM architecture and training parameters were selected using validation performance as the primary criterion\. Early stopping based on validation loss was applied to prevent overfitting and to ensure stable convergence\. These configurations were fixed after model development and used consistently across both test evaluation and temporal generalizability experiments\.
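The LSTM configuration in Table 2 lists a sequence length of 10 time steps with left padding by training-feature medians and masking by true lengths. The sketch below shows one way such inputs might be prepared. Keeping only the most recent steps when a history is longer than the sequence length is an assumption, and the function names and toy feature vectors are hypothetical.

```python
from statistics import median

SEQ_LEN = 10  # sequence length (time steps) reported in Table 2


def feature_medians(training_rows):
    """Column-wise medians over training feature vectors (fit on training data only)."""
    return [median(col) for col in zip(*training_rows)]


def left_pad(sequence, medians, seq_len=SEQ_LEN):
    """Keep the most recent seq_len steps and left-pad shorter sequences with medians.

    Returns the padded sequence and the true (unpadded) length, which a masking
    or packing step can use to ignore the padded positions.
    """
    recent = sequence[-seq_len:]
    true_len = len(recent)
    padding = [list(medians) for _ in range(seq_len - true_len)]
    return padding + [list(step) for step in recent], true_len


# Hypothetical two-feature vectors for three training windows.
train_rows = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
medians = feature_medians(train_rows)

# A customer with only two window-level summaries, padded to four steps.
padded, true_len = left_pad([[5.0, 50.0], [6.0, 60.0]], medians, seq_len=4)
```

Padding on the left keeps the most recent window at the final time step, so the last hidden state always corresponds to the customer's latest observed behavior.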
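Table 2: Final model configurations used in the experiments

| Model | Parameter | Value |
| --- | --- | --- |
| XGBoost | Number of estimators | 400 |
| XGBoost | Maximum tree depth | 8 |
| XGBoost | Learning rate | 0\.05 |
| XGBoost | Subsample ratio | 0\.8 |
| XGBoost | Column subsample by tree | 0\.8 |
| XGBoost | Objective function | Binary logistic |
| LSTM | Number of LSTM layers | 2 |
| LSTM | Hidden units per layer | 32 |
| LSTM | Sequence length \(time steps\) | 10 |
| LSTM | Padding strategy | Left padding with training\-feature medians |
| LSTM | Masking strategy | Packed sequences using true lengths |
| LSTM | Dropout rate | 0\.4 |
| LSTM | Output activation | Sigmoid |
| LSTM | Loss function | Binary cross\-entropy |
| LSTM | Optimizer | Adam |
| LSTM | Learning rate | $5 \times 10^{-4}$ |
| LSTM | Weight decay \(L2\) | $1 \times 10^{-4}$ |
| LSTM | Batch size | 32 |
| LSTM | Maximum epochs | 100 |
| LSTM | Early stopping patience | 7 |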
### 3\.7 Evaluation Protocol
Model performance is evaluated using Accuracy, Precision, Recall, F1\-score, and the Area Under the Receiver Operating Characteristic Curve \(ROC\-AUC\)\. ROC\-AUC is particularly suitable for churn prediction tasks as it provides a threshold\-independent measure of discriminative performance and is robust to class imbalance \[Fawcett, [2006](https://arxiv.org/html/2606.06776#bib.bib26)\]\.
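For reference, the five metrics can be computed from labels and scores as in the sketch below. Treating churn (label 1) as the positive class and thresholding scores at 0\.5 are assumptions, since the decision threshold is not stated. ROC-AUC is computed through its pairwise-ranking interpretation, which is quadratic in the number of instances and meant only as an illustration; the labels and scores are toy values.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels, with churn (1) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted churn probabilities.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Changing the threshold alters accuracy, precision, recall, and F1 but leaves `auc` unchanged, which is what "threshold\-independent" means in the paragraph above.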
To reflect realistic deployment conditions, model development is conducted exclusively on historical data collected during 2024 using a time\-aware data split rather than random sampling\. Time\-aware evaluation has been shown to provide more reliable performance estimates for predictive models applied to temporal and behavioral data, as random splits can introduce information leakage and lead to overly optimistic results \[Bergmeir et al\., [2018](https://arxiv.org/html/2606.06776#bib.bib27)\]\. Models are trained, validated, and tested on the 2024 data, after which the final trained models are fixed and saved\.
To assess temporal generalizability under realistic deployment conditions, the pre\-trained models are subsequently applied to a temporally held\-out dataset representing future customer behavior from June\-July 2025\. This future dataset is processed using the same preprocessing, rolling\-window construction, feature engineering, and churn labeling pipeline as the training data, and no retraining or parameter updates are performed\. This out\-of\-time evaluation provides an explicit assessment of model robustness to temporal shifts in customer behavior and distributional change\.
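A time-aware split of rolling-window instances can be sketched as a partition by reference date. The cut-off dates below are hypothetical, as the actual split boundaries are not reported, and the function name is illustrative.

```python
from datetime import date


def time_aware_split(instances, train_end, val_end):
    """Split by reference date t: train <= train_end < validation <= val_end < test."""
    train = [r for r in instances if r["t"] <= train_end]
    val = [r for r in instances if train_end < r["t"] <= val_end]
    test = [r for r in instances if r["t"] > val_end]
    return train, val, test


# Hypothetical instances identified only by their reference date.
instances = [
    {"t": date(2024, 10, 5)},
    {"t": date(2024, 11, 2)},
    {"t": date(2024, 11, 25)},
    {"t": date(2024, 12, 10)},
]
train, val, test = time_aware_split(
    instances, train_end=date(2024, 11, 15), val_end=date(2024, 11, 30)
)
```

Every training instance precedes every validation and test instance in time, unlike a random split in which later windows of a customer could inform predictions for earlier ones. A stricter variant would also leave a gap of $W_c$ days between folds so that training labels never depend on activity in a later fold; whether such a gap was used is not stated.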
### 3\.8 Churn Factor Identification Methods
To support interpretation of model predictions, both global and local explainability techniques are applied\. Gain\-based feature importance is used to assess the relative contribution of predictors in the feature\-based model based on their influence on ensemble split decisions \[Breiman, [2001](https://arxiv.org/html/2606.06776#bib.bib28)\]\. In addition, Shapley Additive Explanations \(SHAP\) are employed to provide instance\-level feature attributions for individual predictions \[Lundberg and Lee, [2017](https://arxiv.org/html/2606.06776#bib.bib29)\]\.
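To make the idea of instance-level attribution concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets for a tiny hand-written scoring function. This is not the SHAP library and not the tree-specific algorithm one would use with a boosted model; the scoring function, its coefficients, the baseline, and the feature values are all hypothetical, and "missing" features are simply replaced by baseline values.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to f(baseline) by subset enumeration."""
    n = len(x)

    def value(subset):
        # Features in the subset take their actual value, the rest the baseline value.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical churn score with an interaction between recency and cancellations."""
    days_since_last, cancel_rate, total_bookings = z
    return (0.02 * days_since_last + 0.5 * cancel_rate
            - 0.03 * total_bookings + 0.01 * days_since_last * cancel_rate)


x = [25.0, 0.4, 2.0]         # instance to explain
baseline = [10.0, 0.1, 5.0]  # reference customer
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the instance's score and the baseline score, which is the additivity property that lets per-feature contributions be read as a decomposition of an individual prediction.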
## 4 Results
This section presents the empirical results of the proposed churn prediction framework\. Model performance is evaluated using the metrics and evaluation protocol defined in Section [3\.7](https://arxiv.org/html/2606.06776#S3.SS7), including both rolling\-window test data and a temporally held\-out dataset representing future customer bookings\. Results are reported for the feature\-based XGBoost model and the sequence\-based LSTM model, followed by an assessment of their temporal generalizability and an analysis of the factors contributing to churn predictions\.
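Table 3: Predictive performance of XGBoost and LSTM models on rolling\-window data

| Model | Dataset | Accuracy | Precision | Recall | F1\-score | ROC\-AUC |
| --- | --- | --- | --- | --- | --- | --- |
| XGBoost | Training | 89\.5% | 90\.0% | 90\.0% | 90\.0% | – |
| XGBoost | Test | 87\.6% | 88\.0% | 88\.0% | 88\.0% | 0\.941 |
| LSTM | Training | 90\.67% | – | – | – | – |
| LSTM | Validation | 90\.16% | – | – | – | – |
| LSTM | Test | 90\.1% | – | 96\.1% | – | 0\.940 |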

Fig\. 3: ROC curves for \(a\) XGBoost and \(b\) LSTM models on rolling\-window test data\.

Fig\. 4: Confusion matrices for \(a\) XGBoost and \(b\) LSTM models evaluated on the rolling\-window test dataset\.

#### 4\.0\.1 XGBoost Performance
Table [3](https://arxiv.org/html/2606.06776#S4.T3) reports the predictive performance of the XGBoost model on the rolling\-window data\. The model demonstrates strong and balanced classification behavior across both training and test sets, with consistent precision, recall, and F1\-score values\. This indicates that the feature\-based approach effectively captures short\-term behavioral patterns without pronounced overfitting\.
The discriminative capability of the model is further illustrated by the ROC curve in Fig\. [3](https://arxiv.org/html/2606.06776#S4.F3)\(a\)\. The corresponding confusion matrix shown in Fig\. [4](https://arxiv.org/html/2606.06776#S4.F4)\(a\) indicates high correct classification rates for both churn and non\-churn users, reflecting a balanced trade\-off between sensitivity and specificity\.
#### 4\.0\.2 LSTM Performance
The performance of the LSTM model on the rolling\-window data is summarized in Table [3](https://arxiv.org/html/2606.06776#S4.T3)\. Compared to XGBoost, the LSTM achieves higher overall test accuracy and exhibits notably stronger sensitivity toward churn users, as reflected by its recall\. This behavior suggests that the sequence\-based model is particularly effective at capturing temporal disengagement patterns\.
Fig\. [3](https://arxiv.org/html/2606.06776#S4.F3)\(b\) illustrates the ROC curve for the LSTM model, confirming competitive discriminative performance\. The confusion matrix in Fig\. [4](https://arxiv.org/html/2606.06776#S4.F4)\(b\) shows a higher correct detection rate for churn users, accompanied by an increased rate of false positives for non\-churn users\. The per\-epoch training and validation accuracy trends shown in Fig\. [5](https://arxiv.org/html/2606.06776#S4.F5), together with the corresponding per\-epoch training and validation loss trajectories presented in Fig\. [6](https://arxiv.org/html/2606.06776#S4.F6), indicate stable convergence behavior and the absence of severe overfitting during training\.
Fig\. 5: Training and validation accuracy of the LSTM model across epochs\.

Fig\. 6: Training and validation loss of the LSTM model across epochs\.
### 4\.1 Generalizability to Future Booking Behavior
To evaluate temporal robustness under realistic deployment conditions, the trained models are applied to a temporally held\-out dataset representing future customer behavior from June–July 2025\. This dataset is processed using the same preprocessing, rolling\-window construction, feature engineering, and churn labeling pipeline as the historical data, and the models are evaluated without retraining\. The resulting generalizability performance is summarized in Table [4](https://arxiv.org/html/2606.06776#S4.T4)\.
The XGBoost model maintains an accuracy of 83\.14% and a ROC\-AUC of 0\.911 on the future dataset, indicating strong temporal stability under distributional shift\. The LSTM model achieves an accuracy of 81\.18% and exhibits a high recall of 91\.80% for churn users, although its ROC\-AUC declines to 0\.830\. These results indicate that while both models generalize meaningfully to future booking behavior, the feature\-based model demonstrates greater overall stability, whereas the sequence\-based model preserves stronger sensitivity to churn in an out\-of\-time evaluation setting\.
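The F1-scores reported for the future dataset can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. Using the rounded values for precision $P$ and recall $R$ from Table 4:

$$
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\text{XGBoost}} = \frac{2(0.8329)(0.7613)}{0.8329 + 0.7613} \approx 0.7955, \qquad
F_1^{\text{LSTM}} = \frac{2(0.8391)(0.9180)}{0.8391 + 0.9180} \approx 0.8768
$$

Both values agree with the tabulated 79.55% and 87.68%. The higher F1-score of the LSTM is driven by its recall, as the two models have nearly identical precision.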
Table 4: Generalizability performance of XGBoost and LSTM models on unseen June 2025 data

| Model | Accuracy | Precision | Recall | F1\-score | ROC\-AUC |
| --- | --- | --- | --- | --- | --- |
| XGBoost | 83\.14% | 83\.29% | 76\.13% | 79\.55% | 0\.911 |
| LSTM | 81\.18% | 83\.91% | 91\.80% | 87\.68% | 0\.830 |

Table 5: Top factors contributing to churn based on XGBoost gain importance \(cumulative threshold = 0\.95\)

| Feature | Gain | Cumulative Gain |
| --- | --- | --- |
| trend\_total\_bookings | 48\.26 | 0\.123 |
| weekend\_success\_rate | 35\.38 | 0\.214 |
| total\_bookings | 27\.51 | 0\.284 |
| days\_since\_last\_booking | 23\.46 | 0\.411 |
| payment\_type\_most\_frequent | 21\.16 | 0\.465 |
| completed\_bookings | 15\.86 | 0\.505 |
| only\_one\_booking | 12\.68 | 0\.538 |
| last\_booking\_month | 10\.10 | 0\.564 |
| trend\_completion\_rate | 9\.76 | 0\.589 |
| avg\_wallet\_usage | 9\.48 | 0\.613 |
| first\_booking\_month | 8\.93 | 0\.636 |

Fig\. 7: SHAP\-based interpretation of the XGBoost model: \(a\) Mean absolute SHAP values indicating global feature importance, and \(b\) SHAP beeswarm plot illustrating the distribution and direction of feature effects on churn prediction\.
### 4\.2 Factors Contributing to Customer Churn
To identify the behavioral drivers underlying churn predictions, gain\-based feature importance and SHAP analyses were conducted for the XGBoost model\. The top contributing features based on cumulative gain importance are reported in Table [5](https://arxiv.org/html/2606.06776#S4.T5)\. Features related to short\-term engagement dynamics, including trend total bookings, weekend success rate, total bookings, and days since last booking, emerged as the dominant global predictors\.
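A ranking by cumulative gain of this kind can be sketched as follows. The example assumes that the cumulative column is the running sum of each feature's share of the total gain over all features, which is one plausible reading of Table 5 but is not spelled out in the text. The function name and the toy gain values are hypothetical and are not taken from the paper.

```python
def top_features_by_cumulative_gain(gains, threshold=0.95):
    """Rank features by gain and keep those needed to reach the cumulative share."""
    total = sum(gains.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    selected, running = [], 0.0
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        running += gain / total  # this feature's share of the overall gain
        selected.append((name, gain, round(running, 3)))
        if running >= threshold:
            break
    return selected


# Toy gains (hypothetical values, generic feature names).
toy_gains = {"feature_a": 50.0, "feature_b": 30.0, "feature_c": 15.0, "feature_d": 5.0}
ranking = top_features_by_cumulative_gain(toy_gains, threshold=0.9)
for name, gain, cumulative in ranking:
    print(name, gain, cumulative)
```

With a gradient-boosted model, the per-feature gains passed to such a function would typically come from the trained booster's gain-type importance scores.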
SHAP\-based explanations provide further insight into the directionality and consistency of these effects\. The SHAP mean absolute importance values and beeswarm plots, shown in Fig\. [7](https://arxiv.org/html/2606.06776#S4.F7), reveal that declining booking trends, prolonged inactivity, and sparse engagement significantly increase churn probability\. Conversely, recent consistent activity and higher booking success rates are associated with reduced churn risk\. The alignment between gain\-based and SHAP\-based explanations confirms the robustness and interpretability of the identified churn drivers\.
## 5 Discussion
This study set out to examine how short\-term behavioral windowing and temporal evaluation influence churn prediction performance in non\-contractual, pay\-per\-use service environments\. The results demonstrate that explicitly modeling recent customer behavior through rolling windows enables both accurate and operationally meaningful churn prediction, while preserving interpretability and temporal robustness\.
### 5\.1 Effectiveness of Sliding Behavioral Windows
A key contribution of this work lies in the adoption of a daily sliding\-window formulation, where repeated behavioral instances are generated for the same customer using a fixed observation horizon followed by an explicit churn window\. Unlike static snapshot approaches that summarize a customer’s entire history into a single record, the proposed framework captures evolving engagement patterns and supports continuous churn risk assessment\.
Compared to prior studies that employ fixed or coarse\-grained temporal aggregation, such as monthly or quarterly panels Mena et al\. \[[2024](https://arxiv.org/html/2606.06776#bib.bib11)\], the finer\-grained sliding windows used in this study enable earlier detection of disengagement signals\. This design choice is particularly important in consumer\-facing on\-demand service contexts, where customer activity is highly volatile and engagement levels can change rapidly\. The empirical findings suggest that relatively short observation windows, when enriched with trend\-based and recency features, are sufficient to capture churn\-relevant dynamics without relying on long historical spans\.
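The daily sliding-window formulation discussed in this subsection can be sketched for a single customer as follows. The 30-day observation and 30-day churn windows come from the framework description. The labeling rule (churn means no booking at all in the churn window), the decision to skip windows with no observed activity, and the exact definitions of the two example features are assumptions made for illustration, as is the function name `rolling_instances`.

```python
from datetime import date, timedelta

OBS_DAYS = 30    # observation window length (from the framework description)
CHURN_DAYS = 30  # future churn evaluation window length (from the framework description)


def rolling_instances(booking_dates, first_day, last_day, step_days=1):
    """Yield one labeled instance per window position for a single customer."""
    bookings = sorted(booking_dates)
    start = first_day
    # Keep sliding while the whole churn window still fits inside the data range.
    while start + timedelta(days=OBS_DAYS + CHURN_DAYS - 1) <= last_day:
        obs_end = start + timedelta(days=OBS_DAYS)          # exclusive bound
        churn_end = obs_end + timedelta(days=CHURN_DAYS)    # exclusive bound
        observed = [d for d in bookings if start <= d < obs_end]
        future = [d for d in bookings if obs_end <= d < churn_end]
        if observed:  # assumption: only customers active in the window are scored
            last_obs_day = obs_end - timedelta(days=1)
            yield {
                "window_start": start,
                "window_end": last_obs_day,
                "total_bookings": len(observed),
                "days_since_last_booking": (last_obs_day - observed[-1]).days,
                "churn": int(not future),  # assumed rule: inactivity in churn window
            }
        start += timedelta(days=step_days)


# Toy booking history (hypothetical).
history = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 10)]
instances = list(rolling_instances(history, date(2024, 1, 1), date(2024, 3, 31)))
print(len(instances), instances[0], instances[-1])
```

Because the same customer contributes many overlapping instances, the label can flip from non-churn to churn as the window slides past the last booking, which is what allows risk to be reassessed continuously rather than once per customer.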
### 5\.2 Feature\-Based versus Sequence\-Based Learning under Rolling Windows
The comparative evaluation of XGBoost and LSTM models highlights the complementary strengths of feature\-based and sequence\-based learning within the same temporal framework\. The feature\-based XGBoost model demonstrates strong stability and balanced classification behavior across rolling\-window test data and future unseen bookings, suggesting that engineered behavioral summaries can effectively capture non\-linear interactions among short\-term engagement indicators\.
In contrast, the LSTM model exhibits higher sensitivity to churn users, reflecting its ability to model sequential dependencies across consecutive behavioral windows\. This finding aligns with prior work that emphasizes the value of sequence modeling for early churn detection Ahlstrand et al\. \[[2025](https://arxiv.org/html/2606.06776#bib.bib12)\]\. However, the observed trade\-off between recall and overall stability also underscores the importance of careful temporal validation when deploying deep learning models in real\-world settings\. Unlike several prior studies that employ advanced learning models on cross\-sectional or implicitly aggregated behavioral features without explicit temporal prediction horizons Vo et al\. \[[2021](https://arxiv.org/html/2606.06776#bib.bib14)\], Lalwani et al\. \[[2022](https://arxiv.org/html/2606.06776#bib.bib8)\], the present work demonstrates that sequence models derive their primary benefit when paired with a principled windowing strategy\.
### 5\.3 Temporal Generalizability and Deployment Robustness
An important aspect of this study is the evaluation of model performance on a temporally held\-out dataset representing future customer behavior\. Many existing churn prediction studies report strong in\-sample or random\-split performance but do not assess robustness under temporal shift De Caigny et al\. \[[2020](https://arxiv.org/html/2606.06776#bib.bib7)\], Geiler et al\. \[[2022](https://arxiv.org/html/2606.06776#bib.bib9)\]\. By contrast, the results presented here show that both models maintain meaningful predictive capability when applied to future booking data processed through the same pipeline\.
The feature\-based XGBoost model exhibits greater temporal stability, while the LSTM model retains strong recall for churn users despite some degradation in overall discrimination\. These findings are consistent with observations reported in recent window\-based churn studies that explicitly account for temporal drift Bugajev et al\. \[[2025](https://arxiv.org/html/2606.06776#bib.bib13)\]\. From a decision support perspective, this highlights an important trade\-off between stability and sensitivity that practitioners must consider when selecting models for deployment\.
### 5\.4 Interpretability of Churn Drivers under Rolling Evaluation
Beyond predictive performance, the proposed framework emphasizes interpretability through gain\-based feature importance and SHAP analysis\. The convergence of both methods on a consistent set of dominant churn drivers, such as recency of activity, short\-term booking trends, and engagement density, reinforces the behavioral validity of the learned models\. Importantly, these explanations are derived from repeated behavioral instances rather than static customer profiles, enabling insight into how churn risk evolves over time\.
This temporal interpretability distinguishes the present work from prior explainable churn models that operate on cross\-sectional data De Caigny et al\. \[[2024](https://arxiv.org/html/2606.06776#bib.bib10)\]\. By linking feature importance to rolling windows, the framework supports actionable decision\-making, allowing service providers to identify not only who is at risk of churn, but also when and why that risk increases\.
### 5\.5 Implications for Decision Support Systems
From a Decision Support Systems perspective, the findings suggest that effective churn prediction requires more than high predictive accuracy on static benchmarks\. The integration of sliding behavioral windows, temporal validation, and interpretable modeling enables continuous risk monitoring and supports proactive intervention strategies\. Compared to more complex architectures such as transformers Ahlstrand et al\. \[[2025](https://arxiv.org/html/2606.06776#bib.bib12)\], the proposed framework demonstrates that competitive performance can be achieved with relatively lightweight models when temporal framing is carefully designed\.
Overall, this study provides evidence that short\-term behavioral modeling, combined with rolling evaluation and explainable analytics, offers a practical and robust foundation for churn\-oriented decision support in mobility and other on\-demand service platforms\.
## 6 Conclusion
This study proposes a temporally explicit churn prediction framework for non\-contractual, pay\-per\-use service environments, formulating churn prediction as a rolling, instance\-level task that enables repeated churn risk assessment as customer behavior evolves\. By explicitly defining behavioral observation windows and future churn evaluation horizons, the framework aligns churn modeling with realistic deployment conditions and overcomes limitations of static and cross\-sectional approaches\. Empirical results demonstrate strong and stable predictive performance, with the feature\-based XGBoost model exhibiting balanced behavior and greater robustness under temporal shift\. In contrast, the sequence\-based LSTM model shows higher sensitivity to churn users by capturing sequential disengagement patterns\. The integration of gain\-based feature importance and SHAP analysis further supports interpretable identification of behaviorally meaningful churn drivers, reinforcing the framework’s suitability for decision\-oriented use\. Overall, the findings highlight that carefully designed temporal framing, rather than model complexity alone, is central to achieving deployment\-ready and interpretable churn prediction; future work may explore adaptive windowing strategies, drift\-aware learning mechanisms, and extensions to cost\-sensitive evaluation and additional non\-contractual service domains\.
## References
- J\. Ahlstrand, A\. Borg, H\. Grahn, and M\. Boldt \(2025\) Using transformers for B2B contractual churn prediction based on multivariate time\-series data\. In Proceedings of the International Conference on Enterprise Information Systems \(ICEIS\)\.
- E\. Ascarza, S\. A\. Neslin, O\. Netzer, Z\. Anderson, P\. S\. Fader, S\. Gupta, B\. G\. S\. Hardie, A\. Lemmens, B\. Libai, D\. Neal, F\. Provost, and R\. Schrift \(2018\) In pursuit of enhanced customer retention management: review, key issues, and future directions\. Customer Needs and Solutions 5\(1–2\), pp\. 65–81\. External Links: [Document](https://dx.doi.org/10.1007/s40547-017-0080-0)\.
- A\. Md\. Asfe, Md\. R\. Rahman, and Md\. S\. Hossain \(2025\) Integrating meta\-modeling and neural networks for customer churn prediction in e\-commerce\. Discover Applied Sciences 7, pp\. 569\. External Links: [Document](https://dx.doi.org/10.1007/s44206-025-00569-4)\.
- C\. Bergmeir, R\. J\. Hyndman, and J\. M\. Benítez \(2018\) Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation\. International Journal of Forecasting 34\(4\), pp\. 728–740\. External Links: [Document](https://dx.doi.org/10.1016/j.ijforecast.2018.05.002)\.
- L\. Breiman \(2001\) Random forests\. Machine Learning 45\(1\), pp\. 5–32\. External Links: [Document](https://dx.doi.org/10.1023/A%3A1010933404324)\.
- A\. Bugajev, R\. Kriauzienė, and V\. Chadyšas \(2025\) Realistic data delays and alternative inactivity definitions in telecom churn: investigating concept drift using a sliding\-window approach\. Applied Sciences 15\(3\), pp\. 1599\. External Links: [Document](https://dx.doi.org/10.3390/app15031599)\.
- A\. Chajia and E\. H\. Nfaoui \(2024\) Customer churn prediction approach based on LLM embeddings and logistic regression\. Future Internet 16\(12\)\. External Links: [Document](https://dx.doi.org/10.3390/fi16120453)\.
- T\. Chen and C\. Guestrin \(2016\) XGBoost: a scalable tree boosting system\. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp\. 785–794\. External Links: [Document](https://dx.doi.org/10.1145/2939672.2939785)\.
- K\. Coussement, S\. Lessmann, and G\. Verstraeten \(2017\) A comparative analysis of data preparation algorithms for customer churn prediction: a case study in the telecommunication industry\. Decision Support Systems 95, pp\. 27–36\. External Links: [Document](https://dx.doi.org/10.1016/j.dss.2016.11.007)\.
- A\. De Caigny, K\. Coussement, K\. W\. De Bock, and S\. Lessmann \(2020\) Incorporating textual information in customer churn prediction models based on structured and unstructured data\. International Journal of Forecasting 36\(4\), pp\. 1560–1576\. External Links: [Document](https://dx.doi.org/10.1016/j.ijforecast.2019.11.001)\.
- A\. De Caigny, K\. Coussement, and K\. W\. De Bock \(2018\) A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees\. European Journal of Operational Research 269\(2\), pp\. 760–772\. External Links: [Document](https://dx.doi.org/10.1016/j.ejor.2018.02.009)\.
- A\. De Caigny, K\. Coussement, and K\. W\. De Bock \(2024\) Hybrid black\-box classification for customer churn prediction with segmented interpretability analysis\. Decision Support Systems 181, pp\. 114217\. External Links: [Document](https://dx.doi.org/10.1016/j.dss.2024.114217)\.
- T\. Fawcett \(2006\) An introduction to ROC analysis\. Pattern Recognition Letters 27\(8\), pp\. 861–874\. External Links: [Document](https://dx.doi.org/10.1016/j.patrec.2005.10.010)\.
- L\. Geiler, S\. Affeldt, and M\. Nadif \(2022\) An effective strategy for churn prediction and customer profiling\. Data & Knowledge Engineering 142, pp\. 102100\. External Links: [Document](https://dx.doi.org/10.1016/j.datak.2022.102100)\.
- S\. Hochreiter and J\. Schmidhuber \(1997\) Long short\-term memory\. Neural Computation 9\(8\), pp\. 1735–1780\. External Links: [Document](https://dx.doi.org/10.1162/neco.1997.9.8.1735)\.
- A\. Krishna, R\. Patel, and S\. Mehta \(2024\) Application of machine learning techniques for churn prediction in the telecom business\. Results in Engineering 18, pp\. 101254\. External Links: [Document](https://dx.doi.org/10.1016/j.rineng.2024.101254)\.
- P\. Lalwani, M\. K\. Mishra, J\. S\. Chadha, and P\. Sethi \(2022\) Customer churn prediction system: a machine learning approach\. Computing 104\(2\), pp\. 271–294\. External Links: [Document](https://dx.doi.org/10.1007/s00607-021-00974-4)\.
- S\. M\. Lundberg and S\. Lee \(2017\) A unified approach to interpreting model predictions\. In Proceedings of the 31st International Conference on Neural Information Processing Systems \(NeurIPS\), pp\. 4765–4774\.
- G\. Mena, K\. Coussement, K\. W\. De Bock, A\. De Caigny, and S\. Lessmann \(2024\) Exploiting time\-varying RFM measures for customer churn prediction with deep neural networks\. Annals of Operations Research 339\(1\), pp\. 765–787\. External Links: [Document](https://dx.doi.org/10.1007/s10479-023-05259-9)\.
- S\. A\. Neslin, S\. Gupta, W\. Kamakura, J\. Lu, and C\. H\. Mason \(2006\) Defection detection: measuring and understanding the predictive accuracy of customer churn models\. Journal of Marketing Research 43\(2\), pp\. 204–211\. External Links: [Document](https://dx.doi.org/10.1509/jmkr.43.2.204)\.
- J\. Sanchez Ramirez, K\. Coussement, A\. De Caigny, and D\. F\. Benoit \(2024\) Incorporating usage data for B2B churn prediction modeling\. Industrial Marketing Management 120, pp\. 191–205\. External Links: [Document](https://dx.doi.org/10.1016/j.indmarman.2024.01.015)\.
- W\. Verbeke, K\. Dejaeger, D\. Martens, J\. Hur, and B\. Baesens \(2012\) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach\. European Journal of Operational Research 218\(1\), pp\. 211–229\. External Links: [Document](https://dx.doi.org/10.1016/j.ejor.2011.09.031)\.
- T\. Verbraken, W\. Verbeke, and B\. Baesens \(2013\) A novel profit maximizing metric for measuring classification performance of customer churn prediction models\. IEEE Transactions on Knowledge and Data Engineering 25\(5\), pp\. 961–973\. External Links: [Document](https://dx.doi.org/10.1109/TKDE.2012.50)\.
- N\. N\. Y\. Vo, S\. Liu, X\. Li, and G\. Xu \(2021\) Leveraging unstructured call log data for customer churn prediction\. Knowledge\-Based Systems 212, pp\. 106586\. External Links: [Document](https://dx.doi.org/10.1016/j.knosys.2020.106586)\.
- H\. Zaghloul, M\. A\. Hassan, and A\. El\-Sayed \(2025\) Enhancing customer retention in online retail through churn prediction: a hybrid RFM, clustering, and deep learning approach\. Expert Systems with Applications 237, pp\. 121728\. External Links: [Document](https://dx.doi.org/10.1016/j.eswa.2024.121728)\.