Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift

arXiv cs.LG 05/25/26, 04:00 AM Papers
arXiv:2605.23115v1 (announce type: new)
# Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift
Source: [https://arxiv.org/html/2605.23115](https://arxiv.org/html/2605.23115)
###### Abstract

Bike\-sharing models trained on historical station\-hour data may degrade when deployed in later years because travel patterns change over time\. This paper studies March Citi Bike demand prediction from 2021 to 2026 as a temporal domain adaptation problem and proposes Gen\-ROTDA, a robust optimal transport\-guided residual domain adaptation framework\. The method fits a target\-domain station\-time anchor with a small labeled target subset, transfers residual rather than raw demand, applies a deterministic label\-preserving residual feature generator, and trims high\-cost transport matches before training the final residual predictor\. Experiments compare Gen\-ROTDA with anchor\-only, source\-only, target\-only, fine\-tuning, MMD adaptation, Sinkhorn OTDA, ROTDA, and Gen\-OTDA\. Gen\-ROTDA achieves the lowest MAE on the main 2025 to 2026 task and is the best OT\-family method on average across multi\-year tasks, although fine\-tuning and MMD adaptation remain strong overall baselines\. Under abnormal target\-unlabeled records, Gen\-ROTDA is much more stable than non\-robust OT variants, suggesting that robust transport is useful for noisy temporal transfer in bike\-sharing demand prediction\.

## I Introduction

Accurate bike\-sharing demand prediction helps operators allocate bikes, rebalance stations, and maintain service quality\[[12](https://arxiv.org/html/2605.23115#bib.bib1),[16](https://arxiv.org/html/2605.23115#bib.bib2)\]\. In practice, a model trained on historical station\-hour observations is often deployed in a later year\. This creates temporal domain shift because station popularity, commuting behavior, rider composition, and operational patterns may change over time\. Such distribution changes are central to transfer learning and domain adaptation\[[1](https://arxiv.org/html/2605.23115#bib.bib3),[13](https://arxiv.org/html/2605.23115#bib.bib4)\], and they can make direct source\-year prediction unreliable\.

Cross\-year station\-hour prediction is especially challenging because demand is sparse, skewed, and noisy\. Many station\-hours have low demand, while a small number of stations and peak hours dominate prediction errors\. The target feature pool may also contain abnormal records caused by unusual operations, events, weather, or data\-quality issues\. A useful adaptation method should therefore align source and target feature distributions while remaining stable under abnormal target records\.

Optimal transport \(OT\) provides a natural way to align empirical distributions for domain adaptation\[[4](https://arxiv.org/html/2605.23115#bib.bib5),[5](https://arxiv.org/html/2605.23115#bib.bib6)\]\. Entropic OT, commonly computed by Sinkhorn iterations, improves numerical stability and scalability\[[6](https://arxiv.org/html/2605.23115#bib.bib7),[14](https://arxiv.org/html/2605.23115#bib.bib8)\]\. However, standard OT may still be affected by poorly matched or abnormal samples because high\-cost matches can influence the transport plan\. Robust OT addresses this issue by trimming or down\-weighting abnormal transport mass\[[10](https://arxiv.org/html/2605.23115#bib.bib9),[11](https://arxiv.org/html/2605.23115#bib.bib10)\]\.

This paper proposes Gen\-ROTDA, a robust OT\-guided residual domain adaptation method for bike\-sharing demand prediction\. The method uses an anchor\-residual design: a target\-domain station\-time anchor captures stable structure, and domain adaptation is applied only to the residual demand component\. Gen\-ROTDA then combines a deterministic label\-preserving residual feature generator with robust OT trimming\. The generator is not a GAN or a stochastic generative model; it is a residual feature transformation network that moves source residual features toward the target domain while preserving demand\-relevant information, following the broader idea of transferable representation learning through distribution alignment\[[7](https://arxiv.org/html/2605.23115#bib.bib11),[8](https://arxiv.org/html/2605.23115#bib.bib12),[9](https://arxiv.org/html/2605.23115#bib.bib13),[15](https://arxiv.org/html/2605.23115#bib.bib14)\]\.

The contributions are threefold\. First, we formulate cross\-year Citi Bike station\-hour prediction as a residual temporal domain adaptation problem\. Second, we propose Gen\-ROTDA, which combines label\-preserving residual feature transformation and robust OT alignment\. Third, we evaluate the method on March Citi Bike data from 2021 to 2026 using main\-task, multi\-year, robustness, ablation, and visualization experiments\.

## II Data and Prediction Task

### II\-A Citi Bike Station\-Hour Data

The experiments use Citi Bike trip records aggregated to station\-hour observations from the public Citi Bike system data \[[3](https://arxiv.org/html/2605.23115#bib.bib15)\]\. For station $s$ and hour $t$, the target variable is

$$Y_{s,t}=\text{number of trips starting from station } s \text{ during hour } t. \tag{1}$$

Each observation is represented as $(X_{s,t},Y_{s,t})$\. The processed data cover March of each year from 2021 to 2026\. Using the same calendar month reduces seasonal confounding and focuses the experiments on cross\-year temporal shift\. The processed files contain station identifiers, station coordinates, calendar fields, lagged demand, rolling demand summaries, rider and bike\-type counts, and ratio variables\.

The main residual transfer experiments use a compact feature split\. The variables are classified into three groups\. The anchor features include spatial and calendar variables, namely `start_lat`, `start_lng`, `hour_sin`, `hour_cos`, `dow_sin`, `dow_cos`, and `is_weekend`\. The transfer features include demand\-history variables, namely `lag_1h`, `lag_24h`, `rolling_24h_mean`, and `rolling_168h_mean`\. The target variable is `demand`, which represents the station\-hour bike demand to be predicted\.

The transfer features are transformed by log1p and standardized before domain adaptation\. The lag and rolling features are computed causally from past observations available before the prediction time\. The prediction setting is therefore short\-horizon station\-hour forecasting with historical demand information available, not a setting in which future target labels are used to construct features\.
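The causal feature construction described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names `causal_features` and `log1p_standardize`, the per-station hourly list layout, and the zero-padding used when history is missing are all assumptions. In a real pipeline the standardization statistics would be fit on training data only.

```python
import math


def causal_features(demand, t):
    """Build transfer features for hour index t from demand[0:t] only.

    `demand` is an hourly demand list for one station (hypothetical layout).
    The value at index t itself is never read, so no target label leaks in.
    """
    past = demand[:t]

    def mean_last(k):
        window = past[-k:]  # at most the k most recent past hours
        return sum(window) / len(window) if window else 0.0

    return {
        "lag_1h": float(past[-1]) if len(past) >= 1 else 0.0,
        "lag_24h": float(past[-24]) if len(past) >= 24 else 0.0,
        "rolling_24h_mean": mean_last(24),
        "rolling_168h_mean": mean_last(168),
    }


def log1p_standardize(values):
    """Apply log1p, then center and scale to unit variance."""
    logged = [math.log1p(v) for v in values]
    mean = sum(logged) / len(logged)
    var = sum((x - mean) ** 2 for x in logged) / len(logged)
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [(x - mean) / std for x in logged]
```

Changing `demand[t]` leaves the output of `causal_features(demand, t)` unchanged, which is the property the paragraph above relies on.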

### II\-B Temporal Domain Adaptation Setting

Let

$$\mathcal{D}_{s}=\{(X_{i}^{s},Y_{i}^{s})\}_{i=1}^{n_{s}} \tag{2}$$

denote labeled source\-year samples, and let

$$\mathcal{D}_{t}=\{(X_{j}^{t},Y_{j}^{t})\}_{j=1}^{n_{t}} \tag{3}$$

denote target\-year samples\. During adaptation, target features are available, and a small labeled target subset is used for anchor fitting and residual calibration\. The held\-out target labels are used only for final evaluation\. All compared methods use the same labeled target subset and the same target\-unlabeled feature pool\. The goal is to predict held\-out target demand under

$$P_{s}(X,Y)\neq P_{t}(X,Y). \tag{4}$$
The main task is 2025 to 2026 transfer\. Additional tasks include adjacent\-year transfers from 2021 to 2022 through 2024 to 2025 and two\-year transfers from 2021 to 2023 through 2024 to 2026\.
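The task list in this paragraph can be enumerated mechanically. The helper below is a small sketch; the name `transfer_tasks` and its arguments are hypothetical and only restate the year ranges given in the text.

```python
def transfer_tasks(first_year=2021, last_year=2026):
    """Return (main, adjacent, two_year) source/target year pairs."""
    main = (last_year - 1, last_year)  # 2025 -> 2026
    # Adjacent-year tasks stop before the main task: 2021->2022 ... 2024->2025.
    adjacent = [(y, y + 1) for y in range(first_year, last_year - 1)]
    # Two-year tasks: 2021->2023 ... 2024->2026.
    two_year = [(y, y + 2) for y in range(first_year, last_year - 1)]
    return main, adjacent, two_year


main, adjacent, two_year = transfer_tasks()
```

With the default years this yields one main task, four adjacent-year tasks, and four two-year tasks.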

## III Method

Figure [1](https://arxiv.org/html/2605.23115#S3.F1) illustrates the proposed Gen\-ROTDA framework for target\-year station\-hour demand prediction\. The method first decomposes demand into an anchor component and a residual component\. It then uses a label\-preserving generator to adapt source residual features toward the target domain, followed by robust optimal transport alignment to remove poorly matched samples\. Finally, a residual predictor is trained using the transported source residuals and labeled target residuals, and the final demand prediction is obtained by combining the anchor prediction with the predicted residual\.

![Refer to caption](https://arxiv.org/html/2605.23115v1/media_latex/fig1_framework.png)

Figure 1: Overview of the proposed Gen\-ROTDA framework for robust residual domain adaptation in station\-hour demand prediction\.

### III\-A Anchor\-Residual Decomposition

Directly transferring raw demand can be unstable because demand contains both stable station\-time effects and year\-specific residual dynamics\. We therefore use an anchor\-residual decomposition on the log\-demand scale\. Let

$$z=\log(1+Y). \tag{5}$$

A target\-domain anchor model $a_{\phi}(A)$ is trained on labeled target samples using anchor features $A$\. The residual is

$$r=z-a_{\phi}(A). \tag{6}$$

The final prediction is

$$\widehat{z}=a_{\phi}(A)+\widehat{r},\qquad\widehat{Y}=\exp(\widehat{z})-1. \tag{7}$$
Domain adaptation is applied only to residual prediction\. This design allows the small labeled target subset to calibrate stable station\-time structure, while the source year contributes transferable residual demand patterns\.
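A minimal sketch of the decomposition in Eqs. (5) to (7) follows. The anchor here is a hypothetical lookup table of mean log-demand per (station, hour), chosen only to keep the example self-contained; the paper's anchor model uses the spatial and calendar anchor features, and the function names below are assumptions.

```python
import math
from collections import defaultdict


def fit_anchor(rows):
    """Fit a stand-in anchor: mean log1p demand per (station, hour).

    `rows` is an iterable of (station, hour, demand) labeled target samples.
    Unseen (station, hour) keys fall back to the global mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for station, hour, y in rows:
        sums[(station, hour)] += math.log1p(y)
        counts[(station, hour)] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda station, hour: table.get((station, hour), global_mean)


def residual(anchor, station, hour, y):
    """Eq. (6): r = log(1 + Y) - a(A)."""
    return math.log1p(y) - anchor(station, hour)


def predict_demand(anchor, station, hour, r_hat):
    """Eq. (7): Y_hat = exp(a(A) + r_hat) - 1."""
    return math.expm1(anchor(station, hour) + r_hat)
```

Feeding the true residual back into `predict_demand` recovers the original demand, which confirms that the residual and the back-transform are exact inverses on the log scale.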

### III\-B Label\-Preserving Residual Feature Generator

Let $T$ denote the standardized transfer feature vector\. Gen\-OTDA and Gen\-ROTDA use a deterministic residual feature generator

$$G_{\theta}(T)=T+h_{\theta}(T), \tag{8}$$

where $h_{\theta}$ is a small multilayer perceptron\. The residual form discourages excessive feature movement and keeps the transformed source features close to their original representation\.
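The residual form of Eq. (8) can be illustrated with a small NumPy forward pass. The class name, hidden width, ReLU activation, and zero initialization of the output layer are assumptions for this sketch; the paper only states that $h_{\theta}$ is a small multilayer perceptron.

```python
import numpy as np


class ResidualGenerator:
    """Forward pass of G(T) = T + h(T) with a one-hidden-layer MLP h."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        # Assumed choice: a zero output layer makes G the identity map
        # before any training, so features start unmoved.
        self.W2 = np.zeros((hidden, dim))
        self.b2 = np.zeros(dim)

    def h(self, T):
        hidden = np.maximum(T @ self.W1 + self.b1, 0.0)  # ReLU
        return hidden @ self.W2 + self.b2

    def __call__(self, T):
        return T + self.h(T)
```

Because the correction `h(T)` is added to `T` rather than replacing it, a small-norm `h` translates directly into a small feature displacement.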

The generator is trained to make generated source features resemble target features while preserving residual\-label information, combining distribution matching ideas from MMD\-based adaptation with label\-aware transport intuition\[[4](https://arxiv.org/html/2605.23115#bib.bib5),[5](https://arxiv.org/html/2605.23115#bib.bib6),[8](https://arxiv.org/html/2605.23115#bib.bib12),[9](https://arxiv.org/html/2605.23115#bib.bib13)\]\. The objective is

$$\mathcal{L}_{G}=\mathcal{L}_{\mathrm{align}}(G_{\theta}(T^{s}),T^{t})+\lambda_{\mathrm{id}}\|G_{\theta}(T^{s})-T^{s}\|_{2}^{2}+\lambda_{\mathrm{lp}}\mathcal{L}_{\mathrm{label}}+\lambda_{\mathrm{sup}}\mathcal{L}_{\mathrm{target}}. \tag{9}$$

Here $\mathcal{L}_{\mathrm{align}}$ is an MMD\-style distributional alignment loss \[[8](https://arxiv.org/html/2605.23115#bib.bib12),[9](https://arxiv.org/html/2605.23115#bib.bib13)\], $\mathcal{L}_{\mathrm{label}}$ trains a source residual prediction head on generated source features, and $\mathcal{L}_{\mathrm{target}}$ uses labeled target residuals when available\. This objective makes the transformed source features more target\-like without ignoring the residual demand signal\.
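To make the structure of Eq. (9) concrete, the sketch below evaluates the objective for fixed arrays. Several choices are assumptions rather than details from the paper: the RBF kernel and its bandwidth, the biased MMD estimator, averaging the identity penalty over samples, the default weights, and passing the label and target losses in as precomputed scalars.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A and rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(A, B, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(A, A, bandwidth).mean()
            + rbf_kernel(B, B, bandwidth).mean()
            - 2.0 * rbf_kernel(A, B, bandwidth).mean())


def generator_loss(G_Ts, Ts, Tt, label_loss, target_loss,
                   lam_id=0.1, lam_lp=1.0, lam_sup=1.0):
    """Eq. (9) with an MMD alignment term (weights are placeholders)."""
    identity = ((G_Ts - Ts) ** 2).sum(axis=1).mean()  # mean squared movement
    return (mmd2(G_Ts, Tt) + lam_id * identity
            + lam_lp * label_loss + lam_sup * target_loss)
```

The identity term grows with how far the generator moves each source feature, so it trades off directly against the alignment term.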

### III\-C OT and Robust OT Alignment

After source features are generated, OT\-based methods align source and target transfer\-feature distributions \[[4](https://arxiv.org/html/2605.23115#bib.bib5),[5](https://arxiv.org/html/2605.23115#bib.bib6)\]\. For generated source features $\widetilde{T}_{i}^{s}=G_{\theta}(T_{i}^{s})$ and target features $T_{j}^{t}$, the squared Euclidean cost is

$$C_{ij}=\|\widetilde{T}_{i}^{s}-T_{j}^{t}\|^{2}. \tag{10}$$
Sinkhorn OTDA computes an entropic transport coupling\[[6](https://arxiv.org/html/2605.23115#bib.bib7),[14](https://arxiv.org/html/2605.23115#bib.bib8)\]

$$\pi^{\star}=\arg\min_{\pi\in\Pi(a,b)}\langle\pi,C\rangle+\varepsilon\sum_{i,j}\pi_{ij}(\log\pi_{ij}-1). \tag{11}$$
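The entropic problem in Eq. (11) is usually solved with Sinkhorn's alternating scaling updates. The following NumPy sketch shows the basic iteration together with the squared Euclidean cost of Eq. (10); the function names, the fixed iteration count, and the suggestion to rescale the cost matrix by its maximum are assumptions, and a production implementation would typically work in the log domain for small $\varepsilon$.

```python
import numpy as np


def squared_cost(Ts, Tt):
    """Eq. (10): pairwise squared Euclidean distances."""
    diff = Ts[:, None, :] - Tt[None, :, :]
    return (diff ** 2).sum(axis=2)


def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropic OT coupling with marginals a (rows) and b (columns)."""
    K = np.exp(-C / eps)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)  # rescale to match the column marginal
        u = a / (K @ v)    # rescale to match the row marginal
    return u[:, None] * K * v[None, :]
```

A call such as `sinkhorn(a, b, C / C.max())` with uniform `a` and `b` returns a nonnegative coupling whose row sums equal `a` and whose column sums approach `b` as the iterations converge.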
Gen\-ROTDA adds a robust trimming step motivated by outlier\-robust OT \[[10](https://arxiv.org/html/2605.23115#bib.bib9),[11](https://arxiv.org/html/2605.23115#bib.bib10)\]\. Coupling entries are sorted by transport cost, and high\-cost entries are removed until the retained coupling mass reaches the specified `keep_mass`\. The retained coupling is renormalized and used for barycentric transport:

$$\overline{T}_{i}^{s}=\frac{\sum_{j}\pi_{ij}^{\mathrm{trim}}T_{j}^{t}}{\sum_{j}\pi_{ij}^{\mathrm{trim}}}. \tag{12}$$

If trimming removes all retained mass from a source row, the implementation keeps the generated source feature for that row as a fallback\. This avoids an undefined barycentric projection\. The final residual predictor is trained on transported source residuals and labeled target residuals, with labeled target residuals up\-weighted by `target_weight`\.
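One way to read the trimming and fallback rules is sketched below. It keeps the cheapest coupling entries until their cumulative mass reaches `keep_mass`, zeroes the rest, renormalizes, and applies Eq. (12) row by row. The function names and tie-breaking by stable sort are assumptions, and the paper's implementation may differ in detail.

```python
import numpy as np


def trim_coupling(pi, C, keep_mass=0.8):
    """Drop the highest-cost entries of pi, keeping about keep_mass of the mass."""
    order = np.argsort(C, axis=None, kind="stable")  # flat indices, cheapest first
    cum = np.cumsum(pi.ravel()[order])
    target = keep_mass * pi.sum()
    # Smallest prefix of cheap entries whose mass reaches the target.
    n_keep = min(int(np.searchsorted(cum, target - 1e-12)) + 1, order.size)
    mask = np.zeros(pi.size, dtype=bool)
    mask[order[:n_keep]] = True
    trimmed = np.where(mask.reshape(pi.shape), pi, 0.0)
    return trimmed / trimmed.sum()  # renormalize the retained mass


def barycentric_transport(pi_trim, Tt, fallback):
    """Eq. (12); rows with zero retained mass keep their fallback feature."""
    row_mass = pi_trim.sum(axis=1)
    out = np.array(fallback, dtype=float, copy=True)
    ok = row_mass > 0
    out[ok] = (pi_trim[ok] @ Tt) / row_mass[ok][:, None]
    return out
```

In this reading, a source sample whose every match is expensive loses all of its mass and is left at its generated position instead of being pulled toward distant target points.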

### III\-D Compared Methods

The comparison includes ten methods, counting the proposed Gen\-ROTDA:

- Anchor\-only uses only the target\-domain station\-time anchor model\.
- Source\-only trains a residual predictor on source data and applies it directly to the target year\.
- Target\-only trains the residual predictor using only labeled target samples\.
- Fine\-tuning pools source residuals with weighted labeled target residuals\.
- MMD adaptation performs source adaptation or reweighting based on MMD\.
- OTDA uses ordinary optimal transport alignment without a generator or robust trimming\.
- Sinkhorn OTDA uses entropic optimal transport alignment without robust trimming\.
- ROTDA applies robust optimal transport trimming without a generator\.
- Gen\-OTDA combines the residual feature generator with non\-robust optimal transport alignment\.
- Gen\-ROTDA combines the residual feature generator with robust optimal transport trimming\.

## IV Experimental Setup

The default experiments use March station\-hour observations\. The main experiment uses 2025 as the source year and 2026 as the target year\. The random seed is fixed at 2026\.

The experimental setup uses 1,000 source training samples and 1,000 unlabeled target samples\. The maximum number of labeled target samples is 500, and the maximum number of target test samples is 3,000\. Seven target labeled days are used\. The transfer feature set is `lag_only`\. The target weight is set to 8, the Sinkhorn regularization scale is 0\.1, and the robust retained mass is 0\.8\. The generator is trained for 200 epochs\.

The base prediction model is a random forest regressor with 300 trees and `min_samples_leaf=3` \[[2](https://arxiv.org/html/2605.23115#bib.bib16)\]\.
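For reference, the reported settings can be collected in a single configuration object. The dataclass below is only a sketch: the values come from this section, while the class name and most field names are hypothetical (`keep_mass`, `target_weight`, `lag_only`, and `min_samples_leaf` are the identifiers that appear in the text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Default settings reported for the main 2025 to 2026 experiment."""
    source_year: int = 2025
    target_year: int = 2026
    seed: int = 2026
    n_source_train: int = 1000
    n_target_unlabeled: int = 1000
    max_target_labeled: int = 500
    max_target_test: int = 3000
    target_labeled_days: int = 7
    transfer_feature_set: str = "lag_only"
    target_weight: float = 8.0
    sinkhorn_reg_scale: float = 0.1
    keep_mass: float = 0.8
    generator_epochs: int = 200
    rf_n_trees: int = 300
    rf_min_samples_leaf: int = 3

    def __post_init__(self):
        # Basic sanity checks; the bounds are assumptions, not from the paper.
        if not 0.0 < self.keep_mass <= 1.0:
            raise ValueError("keep_mass must lie in (0, 1]")
        if self.target_year <= self.source_year:
            raise ValueError("target_year must be later than source_year")
```

Other transfer tasks would override only `source_year` and `target_year`, for example `ExperimentConfig(source_year=2021, target_year=2023)`.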

Prediction performance is evaluated on held\-out target station\-hour observations\. The main metrics are mean absolute error \(MAE\), root mean squared error \(RMSE\), and $R^{2}$:

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}|\widehat{Y}_{i}-Y_{i}|, \tag{13}$$

$$\mathrm{RMSE}=\left(\frac{1}{n}\sum_{i=1}^{n}(\widehat{Y}_{i}-Y_{i})^{2}\right)^{1/2}, \tag{14}$$

$$R^{2}=1-\frac{\sum_{i}(Y_{i}-\widehat{Y}_{i})^{2}}{\sum_{i}(Y_{i}-\bar{Y})^{2}}. \tag{15}$$
MAE is the primary metric because station\-hour demand is sparse and skewed\. RMSE is also reported because it is more sensitive to large errors\.
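The three metrics in Eqs. (13) to (15) translate directly into code. The plain-Python versions below are a sketch with hypothetical function names, followed by a small worked example on made-up demand values.

```python
import math


def mae(y_true, y_pred):
    """Eq. (13): mean absolute error."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Eq. (14): root mean squared error."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Eq. (15): coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not from the paper).
y_true = [0, 1, 2, 5]
y_pred = [0, 2, 2, 3]
print(mae(y_true, y_pred))   # 0.75
print(rmse(y_true, y_pred))  # sqrt(1.25), about 1.118
print(r2(y_true, y_pred))    # 1 - 5/14, about 0.643
```

In the example, the single error of 2 trips accounts for 4 of the 5 units of squared error but only 2 of the 3 units of absolute error, which is why RMSE reacts more strongly to large misses than MAE.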

## V Results

### V\-A Main 2025 to 2026 Transfer Result

Table [I](https://arxiv.org/html/2605.23115#S5.T1) reports the main 2025 to 2026 transfer result\. Gen\-ROTDA obtains the lowest MAE, reducing MAE by 2\.36% relative to source\-only learning and by 0\.09% relative to Sinkhorn OTDA\. The improvement over ROTDA is small, so the main conclusion is not that the generator alone creates a large accuracy gain\. Rather, robust transport is the main source of MAE improvement, and the generator gives a modest additional benefit in this task\.

TABLE I: Main 2025 to 2026 transfer results\.

The RMSE pattern is mixed\. Gen\-OTDA has the lowest RMSE and highest $R^{2}$, while Gen\-ROTDA has the lowest MAE\. This is consistent with the purpose of robust trimming: it is designed to improve stability and average absolute error under noisy matching, not necessarily to minimize large\-error\-sensitive RMSE\.

### V\-B Multi\-Year Transfer Comparison

Table [II](https://arxiv.org/html/2605.23115#S5.T2) averages results over four adjacent\-year tasks and four two\-year tasks\. Fine\-tuning and MMD adaptation have the best overall average MAE\. Gen\-ROTDA is therefore not the best method across all baselines in the multi\-year average\. However, it remains the best OT\-family method, slightly improving over Sinkhorn OTDA, ROTDA, and Gen\-OTDA\.

TABLE II: Average performance across multi\-year transfer tasks\.

The identical aggregate values for fine\-tuning and MMD adaptation indicate that, under the current residual setup, the MMD adaptation step produces almost the same predictive solution as simple fine\-tuning\. We therefore interpret this baseline cautiously: it is a strong simple adaptation baseline, but the table does not provide separate evidence that MMD weighting adds benefit beyond fine\-tuning in this experiment\.

### V\-C Robustness under Abnormal Target Records

To evaluate robustness, abnormal records are injected into the target\-unlabeled feature pool at contamination ratios from 0% to 20%\. Table [III](https://arxiv.org/html/2605.23115#S5.T3) reports MAE for Sinkhorn OTDA, Gen\-OTDA, and Gen\-ROTDA\. Gen\-ROTDA is the most stable method under high contamination\. At 20% contamination, Sinkhorn OTDA increases from 0\.8634 to 0\.8810, and Gen\-OTDA increases from 0\.8678 to 0\.8876\. In contrast, Gen\-ROTDA changes only from 0\.8625 to 0\.8629\.
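The MAE values quoted above can be converted into relative degradation, which makes the stability gap easier to compare. The snippet below only reuses the six numbers from this paragraph; the helper name `relative_increase` is hypothetical.

```python
# (MAE at 0% contamination, MAE at 20% contamination), as quoted in the text.
reported_mae = {
    "Sinkhorn OTDA": (0.8634, 0.8810),
    "Gen-OTDA": (0.8678, 0.8876),
    "Gen-ROTDA": (0.8625, 0.8629),
}


def relative_increase(clean, contaminated):
    """Percentage increase in MAE from the clean to the contaminated setting."""
    return 100.0 * (contaminated - clean) / clean


degradation = {name: relative_increase(*pair) for name, pair in reported_mae.items()}
for name, pct in degradation.items():
    print(f"{name}: +{pct:.2f}% MAE")
```

This gives increases of roughly 2.04% for Sinkhorn OTDA and 2.28% for Gen-OTDA, against roughly 0.05% for Gen-ROTDA.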

TABLE III: Robustness results under abnormal target\-unlabeled records\.

![Refer to caption](https://arxiv.org/html/2605.23115v1/media_latex/fig2_robustness.png)

Figure 2: Robustness degradation under abnormal target records\.

### V\-D Ablation Study

Table [IV](https://arxiv.org/html/2605.23115#S5.T4) separates the effects of the generator and robust transport\. Adding robust trimming to OTDA improves MAE from 0\.8790 to 0\.8628\. Adding the generator without robust trimming gives Gen\-OTDA an MAE of 0\.8678\. Combining the generator with robust trimming gives the best MAE, 0\.8624, but the gain over ROTDA is small\.

TABLE IV: Ablation study on the 2025 to 2026 task\.

The ablation confirms that robust OT is the dominant contributor to MAE improvement in the current residual setup\. The generator is still useful as a flexible feature transformation, but its empirical contribution is modest unless combined with robust trimming\.

## VI Visualization and Interpretation

Figure [3](https://arxiv.org/html/2605.23115#S6.F3) visualizes source, target, and generated source feature distributions by PCA for the 2025 to 2026 setting\. The PCA plot uses standardized log transfer features\. The centroid distance between source and target decreases from 0\.5125 to 0\.4140 after Gen\-ROTDA transformation, corresponding to a 19\.22% reduction\. The mean feature displacement is 0\.3965 and the median displacement is 0\.4115, suggesting a moderate transformation rather than a destructive remapping\.
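The centroid-distance summary can be reproduced with a few lines of code. The sketch below assumes the distance is the Euclidean distance between feature means; whether the paper measures it in the original feature space or in PCA coordinates is not stated, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def centroid_distance(A, B):
    """Euclidean distance between the centroids of two point sets."""
    return math.dist(centroid(A), centroid(B))


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Reported distances before and after the Gen-ROTDA transformation.
print(f"{percent_reduction(0.5125, 0.4140):.2f}%")  # 19.22%
```

The reported 19.22% follows from (0.5125 - 0.4140) / 0.5125.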

![Refer to caption](https://arxiv.org/html/2605.23115v1/media_latex/fig3_pca.png)

Figure 3: PCA domain alignment for 2025 to 2026\.

Figure [4](https://arxiv.org/html/2605.23115#S6.F4) shows station\-hour demand shift between March 2025 and March 2026 for representative stations\. The summary contains 720 station\-hour cells from 30 stations\. The mean absolute demand difference is 0\.442 trips per station\-hour, and the largest absolute difference is 3\.742 trips per station\-hour\. Large negative shifts appear at important commute stations such as Grove St PATH during evening peak hours\.

![Refer to caption](https://arxiv.org/html/2605.23115v1/media_latex/fig4_heatmap.png)

Figure 4: Demand heatmap for 2025 versus 2026\.

These visualizations support the experimental setting\. The feature distributions are shifted but not completely unrelated, making residual domain adaptation plausible\. The demand heatmap also shows that some station\-hour patterns change substantially across years, which explains why direct source\-only transfer is insufficient and why target\-aware residual calibration is needed\.

## VII Discussion

The results show three main patterns\. First, the anchor\-residual formulation is important\. Anchor\-only prediction is not sufficient, but the anchor captures stable station\-time structure and makes residual transfer easier\. This also explains why fine\-tuning and MMD adaptation are strong baselines: once the stable structure is removed, the remaining residual problem is lower dimensional and can benefit directly from a small labeled target subset\.

Second, robust OT is the most reliable source of improvement, consistent with the known sensitivity of standard OT to outliers\[[11](https://arxiv.org/html/2605.23115#bib.bib10)\]\. In the main experiment and the ablation study, ROTDA and Gen\-ROTDA improve MAE relative to non\-robust OT variants\. Under abnormal target\-unlabeled records, robust trimming is especially important\. Gen\-ROTDA remains nearly unchanged at 20% contamination, whereas Sinkhorn OTDA and Gen\-OTDA degrade more noticeably\.

Third, the generator provides additional flexibility but should not be overstated\. The generator helps create target\-like source residual features and can improve the final robust OT method slightly, but the multi\-year results show that simple fine\-tuning and MMD adaptation remain very strong\. The current evidence therefore supports Gen\-ROTDA mainly as a robust OT\-family method rather than a universally best predictor across all baselines\.

The method has several practical advantages\. It is compatible with standard regression models, uses interpretable lagged demand features, and can be evaluated with conventional forecasting metrics\. The robust trimming mechanism is simple and directly targets abnormal transport matches\. The main limitations are the number of hyperparameters and the use of March data only\. Future work should evaluate more months, include weather and event covariates, and develop adaptive rules for choosing the retained transport mass\.

## VIII Conclusion

This paper proposes Gen\-ROTDA, a robust OT\-guided residual domain adaptation framework for cross\-year bike\-sharing demand prediction\. The method decomposes demand into a target\-domain station\-time anchor and a transferable residual component, applies a deterministic label\-preserving residual feature generator, and uses robust OT trimming before fitting the final residual predictor\. Experiments on Citi Bike station\-hour data from 2021 to 2026 show that Gen\-ROTDA achieves the best MAE on the main 2025 to 2026 task, is the best OT\-family method on average across multi\-year tasks, and is much more stable under abnormal target records than non\-robust OT variants\. The findings suggest that robust transport is useful for temporal domain adaptation in noisy urban mobility forecasting, while the benefit of generative residual feature transformation should be interpreted as complementary and task\-dependent\.

## References

- \[1\] S\. Ben\-David, J\. Blitzer, K\. Crammer, A\. Kulesza, F\. Pereira, and J\. W\. Vaughan \(2010\)\. A theory of learning from different domains\. Machine Learning 79\(1–2\), pp\. 151–175\. DOI: [10.1007/s10994-009-5152-4](https://dx.doi.org/10.1007/s10994-009-5152-4)
- \[2\] L\. Breiman \(2001\)\. Random forests\. Machine Learning 45\(1\), pp\. 5–32\. DOI: [10.1023/A:1010933404324](https://dx.doi.org/10.1023/A%3A1010933404324)
- \[3\] Citi Bike \(2026\)\. Citi bike system data\. \[Online\]\. Available: [https://citibikenyc\.com/system\-data](https://citibikenyc.com/system-data)\. Accessed: May 19, 2026\.
- \[4\] N\. Courty, R\. Flamary, A\. Habrard, and A\. Rakotomamonjy \(2017\)\. Joint distribution optimal transportation for domain adaptation\. In Advances in Neural Information Processing Systems 30 \(NeurIPS\), pp\. 3730–3739\.
- \[5\] N\. Courty, R\. Flamary, D\. Tuia, and A\. Rakotomamonjy \(2017\)\. Optimal transport for domain adaptation\. IEEE Transactions on Pattern Analysis and Machine Intelligence 39\(9\), pp\. 1853–1865\. DOI: [10.1109/TPAMI.2016.2615921](https://dx.doi.org/10.1109/TPAMI.2016.2615921)
- \[6\] M\. Cuturi \(2013\)\. Sinkhorn distances: lightspeed computation of optimal transport\. In Advances in Neural Information Processing Systems 26 \(NeurIPS\), pp\. 2292–2300\.
- \[7\] Y\. Ganin, E\. Ustinova, H\. Ajakan, P\. Germain, H\. Larochelle, F\. Laviolette, M\. Marchand, and V\. Lempitsky \(2016\)\. Domain\-adversarial training of neural networks\. Journal of Machine Learning Research 17\(59\), pp\. 1–35\.
- \[8\] A\. Gretton, K\. M\. Borgwardt, M\. J\. Rasch, B\. Schölkopf, and A\. Smola \(2012\)\. A kernel two\-sample test\. Journal of Machine Learning Research 13\(25\), pp\. 723–773\.
- \[9\] M\. Long, Y\. Cao, J\. Wang, and M\. I\. Jordan \(2015\)\. Learning transferable features with deep adaptation networks\. In Proceedings of the 32nd International Conference on Machine Learning \(ICML\), PMLR, Vol\. 37, pp\. 97–105\.
- \[10\] Y\. Ma, H\. Liu, D\. La Vecchia, and M\. Lerasle \(2025\)\. Inference via robust optimal transportation: theory and methods\. International Statistical Review\. Early View\. DOI: [10.1111/insr.70000](https://dx.doi.org/10.1111/insr.70000)
- \[11\] D\. Mukherjee, A\. Guha, J\. M\. Solomon, Y\. Sun, and M\. Yurochkin \(2021\)\. Outlier\-robust optimal transport\. In Proceedings of the 38th International Conference on Machine Learning \(ICML\), PMLR, Vol\. 139, pp\. 7850–7860\.
- \[12\] E\. O’Mahony and D\. B\. Shmoys \(2015\)\. Data analysis and optimization for \(citi\)bike sharing\. In Proceedings of the Twenty\-Ninth AAAI Conference on Artificial Intelligence \(AAAI\), pp\. 687–694\. DOI: [10.1609/aaai.v29i1.9245](https://dx.doi.org/10.1609/aaai.v29i1.9245)
- \[13\] S\. J\. Pan and Q\. Yang \(2010\)\. A survey on transfer learning\. IEEE Transactions on Knowledge and Data Engineering 22\(10\), pp\. 1345–1359\. DOI: [10.1109/TKDE.2009.191](https://dx.doi.org/10.1109/TKDE.2009.191)
- \[14\] G\. Peyré and M\. Cuturi \(2019\)\. Computational optimal transport\. Foundations and Trends in Machine Learning 11\(5–6\), pp\. 355–607\. DOI: [10.1561/2200000073](https://dx.doi.org/10.1561/2200000073)
- \[15\] B\. Sun and K\. Saenko \(2016\)\. Deep CORAL: correlation alignment for deep domain adaptation\. In Computer Vision – ECCV 2016 Workshops, Lecture Notes in Computer Science, Vol\. 9915, pp\. 443–450\. DOI: [10.1007/978-3-319-49409-8_35](https://dx.doi.org/10.1007/978-3-319-49409-8%5F35)
- \[16\] P\. Vogel, T\. Greiser, and D\. C\. Mattfeld \(2011\)\. Understanding bike\-sharing systems using data mining: exploring activity patterns\. Procedia – Social and Behavioral Sciences 20, pp\. 514–523\. DOI: [10.1016/j.sbspro.2011.08.058](https://dx.doi.org/10.1016/j.sbspro.2011.08.058)