Towards Inclusive Mobility Modeling: Characterizing and Evaluating Elderly Trajectory Patterns in Urban Systems
Summary
This paper examines how the underrepresentation of elderly riders in mobility datasets introduces systematic bias into mobility modeling, using Citi Bike data from Jersey City. It shows that models trained on majority-dominated populations misrepresent elderly mobility behavior, and that higher-capability models do not necessarily improve subgroup fidelity under limited demographic data.
# Characterizing and Evaluating Elderly Trajectory Patterns in Urban Systems
To appear in 7th International Conference on Social Computing (ICSC)
Source: [https://arxiv.org/html/2606.31207](https://arxiv.org/html/2606.31207)
1 School of Computing and Artificial Intelligence, Shanghai University of Finance and Economics, Shanghai, China
2 MOE Key Laboratory of Interdisciplinary Research of Computation and Economics, Shanghai University of Finance and Economics, Shanghai, China
3 Center for Data Science, New York University, New York, USA
Email: zhoumengying@sufe\.edu\.cn
## Towards Inclusive Mobility Modeling: Characterizing and Evaluating Elderly Trajectory Patterns in Urban Systems
###### Abstract
The rapid advance of smart cities increasingly depends on trajectory data mining, yet underrepresented demographic groups, particularly the elderly, are often sparsely represented in public mobility datasets\. This underrepresentation can introduce systematic bias into mobility modeling and downstream urban planning\. Using the 2016–2020 Jersey City subset of the Citi Bike System Data, this study quantitatively examines how the absence of underrepresented subgroups’ mobility signatures affects mobility modeling, using synthetic trajectory generation as a case study\. The analysis reveals that elderly riders exhibit a structurally distinct mobility signature, including localized activity spaces \(958 m vs\. 1,189 m for young riders\), lower mobility entropy \(1\.82 vs\. 4\.15\), and asymmetric off\-peak temporal patterns\. To demonstrate that relying on majority\-dominated training data yields biased synthetic outcomes, we further evaluate both a first\-order Markov chain and a Qwen3\-4B model fine\-tuned with QLoRA across three demographic training settings: the full population, young riders only, and elderly riders only\. Results show that models trained on majority\-dominated populations systematically misrepresent elderly mobility behavior, particularly for spatial mobility metrics\. The Markov model trained on the full population overestimates elderly step length by 4\.5% and dwell time by 8\.9%, whereas the elderly\-specific model achieves substantially lower errors across most metrics\. Comparisons between the Markov and LLM\-based frameworks further show that higher\-capability models do not necessarily improve subgroup\-level fidelity under limited demographic data\. These findings underscore the importance of demographic representation in mobility modeling and its downstream applications for underrepresented populations\.
⋆ Equal contribution\.
† Corresponding author\.
## 1 Introduction
The integration of smart technologies into urban infrastructure has reshaped transportation planning and public health, with trajectory data mining at the core of this transformation\[[36](https://arxiv.org/html/2606.31207#bib.bib1)\]\. As global populations age, ensuring that smart\-city systems serve all demographic groups equitably has become a pressing concern in social computing\[[20](https://arxiv.org/html/2606.31207#bib.bib22)\]\. Yet the datasets underpinning these systems rarely carry granular demographic annotations, rendering the mobility patterns of older adults largely invisible to standard analytical pipelines\[[10](https://arxiv.org/html/2606.31207#bib.bib2),[8](https://arxiv.org/html/2606.31207#bib.bib8),[1](https://arxiv.org/html/2606.31207#bib.bib23)\]\. Recent large\-scale analyses confirm systematic demographic disparities in human mobility data\[[33](https://arxiv.org/html/2606.31207#bib.bib32),[22](https://arxiv.org/html/2606.31207#bib.bib33)\], and a growing body of work calls for algorithmic fairness in urban computing\[[34](https://arxiv.org/html/2606.31207#bib.bib31),[17](https://arxiv.org/html/2606.31207#bib.bib42)\]\.
In age\-friendly urban planning\[[6](https://arxiv.org/html/2606.31207#bib.bib37)\], capturing the distinct travel patterns of older adults is critical: their mobility is shaped by physical accessibility, safety concerns, and different daily activity schedules\[[14](https://arxiv.org/html/2606.31207#bib.bib35),[35](https://arxiv.org/html/2606.31207#bib.bib36)\]\. Mobility models trained on broad, unannotated datasets that are disproportionately representative of younger, working\-age populations inherently encode demographic biases\[[27](https://arxiv.org/html/2606.31207#bib.bib34)\]\.
When synthetic mobility data derived from majority populations informs downstream decisions such as facility allocation or public transit routing, the mobility patterns of elderly riders may become systematically underrepresented, potentially undermining initiatives such as the 15\-minute city vision of localized and accessible neighborhoods\[[18](https://arxiv.org/html/2606.31207#bib.bib41)\]\.
A critical barrier to studying this problem is the scarcity of mobility datasets with reliable demographic labels\. The Citi Bike bike\-sharing system historically collected riders’ birth year and gender at account registration, providing one of the few large\-scale mobility sources with age metadata\[[19](https://arxiv.org/html/2606.31207#bib.bib25)\]\. However, citing privacy concerns, Citi Bike removed demographic attributes from public trip data beginning in 2021\.
To examine how demographic underrepresentation propagates bias into downstream mobility modeling, this work uses synthetic trajectory generation as a controlled testbed\. We construct an experimental framework comparing a first\-order Markov chain and a Large Language Model \(LLM\) under identical demographic training conditions\. We focus on the 2016–2020 Jersey City \(JC\) Citi Bike dataset and construct a controlled experimental design that isolates the effect of demographic composition\. The main contributions are as follows:
1. Age\-Specific Mobility Characterization: Through comparative analysis of elderly \(age $\geq 65$\) and young \(age 18–35\) rider groups in the 2016–2020 Jersey City Citi Bike dataset, we show that the elderly exhibit a structurally distinct mobility signature: spatially confined \(radius of gyration of 958 m versus 1,189 m\), behaviorally routine \(entropy 1\.82 versus 4\.15\), and temporally offset from commuting peaks\.
2. Controlled Multi\-Model Experiment: We evaluate both a first\-order Markov chain and a Qwen3\-4B model fine\-tuned with QLoRA under three demographic training settings \(full population, young\-only, and elderly\-only\)\. This controlled design isolates the effect of demographic composition on downstream mobility fidelity while enabling direct comparison between structured probabilistic and higher\-capacity sequence modeling approaches\.
3. Effects of Demographic Composition on Downstream Mobility Fidelity: Our results quantify how demographic imbalance in training data propagates bias into downstream mobility modeling\. Models trained on majority\-dominated populations systematically overestimate elderly mobility characteristics, while elderly\-specific training achieves substantially closer agreement with empirical elderly mobility patterns\.
4. Reproducible Evaluation Framework: We propose a reproducible evaluation pipeline spanning data preprocessing, mobility metric construction, synthetic trajectory generation, and subgroup\-specific evaluation\. The framework provides a benchmark for evaluating demographic\-specific fidelity in downstream mobility modeling\.
## 2 Related Work
### 2\.1 Human Mobility and Trajectory Generation
Understanding human mobility is foundational to urban computing\[[36](https://arxiv.org/html/2606.31207#bib.bib1)\]\. Early seminal work by González et al\.\[[12](https://arxiv.org/html/2606.31207#bib.bib6)\] and Song et al\.\[[26](https://arxiv.org/html/2606.31207#bib.bib7)\] utilized mobile phone data to demonstrate that human movement is highly regular and predictable, exhibiting bounded spatial variance\. Alessandretti et al\.\[[2](https://arxiv.org/html/2606.31207#bib.bib44)\] further revealed that human mobility operates across characteristic spatial and temporal scales, with individual trajectories exhibiting stable, idiosyncratic patterns\.
Building upon these foundations, trajectory generation has evolved from statistical modeling to sophisticated deep learning techniques\. Traditional approaches, such as grid\-based Markov models, simulate movement via transition matrices, while data\-driven routine models synthesize trajectories by learning historical motifs\[[21](https://arxiv.org/html/2606.31207#bib.bib3)\]\. Recently, deep generative models, including Generative Adversarial Networks \(GANs\)\[[15](https://arxiv.org/html/2606.31207#bib.bib15),[24](https://arxiv.org/html/2606.31207#bib.bib17)\], sequence\-to\-sequence architectures, and spatial\-temporal graph neural networks\[[31](https://arxiv.org/html/2606.31207#bib.bib14)\], have been employed to capture complex spatial\-temporal dependencies\. The emergence of diffusion probabilistic models has further advanced the field: DiffTraj\[[38](https://arxiv.org/html/2606.31207#bib.bib27)\] and related methods\[[11](https://arxiv.org/html/2606.31207#bib.bib29)\] leverage iterative denoising to generate high\-fidelity GPS trajectories with improved sample quality and diversity\. Most recently, LLM\-based approaches such as MobilityGPT\[[29](https://arxiv.org/html/2606.31207#bib.bib30)\] have been proposed, treating trajectories as token sequences for autoregressive generation\. Several comprehensive surveys have catalogued this rapidly expanding landscape\[[4](https://arxiv.org/html/2606.31207#bib.bib12),[37](https://arxiv.org/html/2606.31207#bib.bib26)\]\. However, the majority of these models are optimized for general population accuracy, often overlooking the nuanced mobility patterns of underrepresented demographic subgroups\[[27](https://arxiv.org/html/2606.31207#bib.bib34)\]\.
### 2\.2 Trajectory Evaluation and Privacy
Evaluation of synthetic trajectory quality presents a multifaceted challenge\. Classical approaches rely on statistical distribution matching \(e\.g\., comparing step length, speed, radius of gyration\), while sequence alignment metrics such as Dynamic Time Warping\[[25](https://arxiv.org/html/2606.31207#bib.bib9)\] and Fréchet distance\[[3](https://arxiv.org/html/2606.31207#bib.bib10)\] assess structural fidelity\. Recent work by Lucas et al\.\[[16](https://arxiv.org/html/2606.31207#bib.bib38)\] proposed a comprehensive quality framework spanning statistical, spatial, and temporal dimensions, while Wang et al\.\[[30](https://arxiv.org/html/2606.31207#bib.bib39)\] systematically reviewed trajectory similarity metrics in the era of deep learning\.
Alongside quality concerns, privacy\-preserving trajectory generation has received growing attention\. Cunningham et al\.\[[7](https://arxiv.org/html/2606.31207#bib.bib16)\] established foundational methods for synthetic location data with formal privacy guarantees\. Chen et al\.\[[5](https://arxiv.org/html/2606.31207#bib.bib40)\] advanced differentially private trajectory generation that balances utility and privacy, while Van de Ven et al\.\[[28](https://arxiv.org/html/2606.31207#bib.bib43)\] critically examined the limitations of generative models for trajectory privacy\. Due to privacy concerns, demographic attributes such as birth year and gender were no longer publicly available in Citi Bike trip data beginning in 2021, leaving fewer demographic signals available for evaluating algorithmic fairness on this platform\. Our work quantifies how the absence of such signals can reduce the fidelity of mobility models and their downstream outputs for older adults\.
### 2\.3 Social Computing for Age\-Friendly Cities
The intersection of social computing and urban gerontology highlights the necessity of age\-friendly urban environments\[[32](https://arxiv.org/html/2606.31207#bib.bib4)\]\. Research indicates that older adults often face mobility constraints related to physical capabilities and access to public transportation, contributing to a strong reliance on localized, neighborhood\-centric travel\[[10](https://arxiv.org/html/2606.31207#bib.bib2),[13](https://arxiv.org/html/2606.31207#bib.bib18)\]\. Recent large\-scale studies leveraging mobile phone data\[[14](https://arxiv.org/html/2606.31207#bib.bib35)\] and GPS tracking\[[35](https://arxiv.org/html/2606.31207#bib.bib36)\] have provided quantitative evidence that older adults exhibit significantly smaller activity spaces, lower mobility entropy, and distinct temporal patterns compared to younger demographics\.
Despite the growing emphasis on inclusive cities\[[6](https://arxiv.org/html/2606.31207#bib.bib37)\] and the 15\-minute city paradigm\[[18](https://arxiv.org/html/2606.31207#bib.bib41)\], privacy concerns\[[8](https://arxiv.org/html/2606.31207#bib.bib8)\] and the limited availability of age annotations in public mobility datasets have constrained data\-driven research on older adults’ mobility\. Recent analyses of demographic disparities in mobility data\[[33](https://arxiv.org/html/2606.31207#bib.bib32),[22](https://arxiv.org/html/2606.31207#bib.bib33)\], together with emerging work on algorithmic fairness in urban computing\[[34](https://arxiv.org/html/2606.31207#bib.bib31)\], underscore the importance of addressing representation gaps\. Our work bridges this gap by leveraging the 2016–2020 Jersey City subset of the Citi Bike System Data, which contains publicly available age\-annotated trip records, to systematically evaluate how training\-data composition affects the fidelity of mobility models for underrepresented populations\.
## 3 Characterizing Elderly Mobility Patterns
Before evaluating how demographic composition biases downstream mobility modeling, we first establish whether elderly and young riders exhibit meaningfully different mobility patterns in the first place\. This section processes the Citi Bike trip records into mobility trajectories, defines the spatial and temporal metrics used throughout the paper, and applies them to compare the two groups\. The resulting characterization confirms that elderly riders follow a structurally distinct mobility signature, setting the stage for the controlled experiments in Section 4\.
### 3\.1 Data Source and Preprocessing
Citi Bike is a bike\-sharing system operating in Jersey City and New York City\. It provides trip\-level records containing origin and destination stations, station coordinates, trip duration, user type, and, for records prior to 2021, rider birth year and gender\. Our starting point is the 1,638,153 Jersey City trips recorded over the 60 months from 2016 to 2020, each of which we assign to an age group using rider birth year\.
After removing trips with Haversine distance greater than 10 km or computed speed exceeding 10 m/s, 93\.3% of records were retained, yielding a full\-population dataset containing 1,528,325 trips from 6,102 bikes\. Using rider birth year, we define elderly \(65\+\) and young \(18–35\) subsets from the filtered data\. The elderly subset contains 14,394 trips from 2,007 unique bikes, representing 0\.9% of all filtered trips, whereas the young subset contains 783,558 trips from 5,832 bikes, representing 51\.3%\. Trips associated with the same bike are ordered chronologically to construct mobility trajectories and compute dwell times\.
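The filtering and trajectory\-construction steps above can be sketched as follows\. This is a minimal illustration, not the authors’ pipeline: the field names \(`bike_id`, `start_ts`, `duration_s`, and the coordinate keys\), the Earth\-radius constant, and the handling of non\-positive durations are assumptions introduced here\.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def keep_trip(trip, max_dist_m=10_000.0, max_speed_mps=10.0):
    """Apply the two filters from the text: distance <= 10 km and speed <= 10 m/s."""
    dist = haversine_m(trip["start_lat"], trip["start_lon"],
                       trip["end_lat"], trip["end_lon"])
    if dist > max_dist_m:
        return False
    if trip["duration_s"] <= 0:  # assumption: speed is undefined, so drop the trip
        return False
    return dist / trip["duration_s"] <= max_speed_mps


def build_trajectories(trips):
    """Group retained trips by bike and order each group chronologically."""
    by_bike = {}
    for trip in trips:
        if keep_trip(trip):
            by_bike.setdefault(trip["bike_id"], []).append(trip)
    for seq in by_bike.values():
        seq.sort(key=lambda t: t["start_ts"])
    return by_bike
```

Each value of the returned dictionary is one trajectory in the sense used below: the chronologically ordered trips of a single bike\.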
### 3\.2 Descriptive Statistics
Table [1](https://arxiv.org/html/2606.31207#S3.T1) summarizes the post\-filtering descriptive statistics of the JC Citi Bike 2016–2020 dataset stratified by age group\.
Table 1: Descriptive Statistics of the JC Citi Bike 2016–2020 dataset by age group \(post\-filtering\)\.
The elderly group is substantially smaller than the young group, accounting for 0\.9% of all filtered trips compared to 51\.3% for young riders\. Elderly riders also exhibit shorter trajectories on average \(7\.2 vs\. 134\.4 trips per trajectory\) and interact with fewer unique stations \(50 vs\. 55\), indicating lower overall mobility intensity and spatial coverage\.
Gender and subscription patterns are broadly similar across the two groups\. Elderly riders are composed of 70% male and 28% female users \(2% unspecified\), compared to 75% male and 25% female among young riders\. Both groups consist almost entirely of annual Subscribers \(99%\)\.
### 3\.3 Evaluation Metrics
We evaluate synthetic mobility fidelity using spatial and temporal metrics that characterize mobility range, visitation regularity, and travel dynamics\. Let a trajectory consist of $n$ trips for a given bike, where each trip $i$ contains origin and destination coordinates together with departure and arrival timestamps\.
#### 3\.3\.1 Spatial Metrics
- Step Length: The Haversine distance between trip origin and destination stations, averaged across all trips within a group\.
- Radius of Gyration \(RoG\): A measure of spatial dispersion computed relative to the trajectory center of mass $(\overline{\text{lat}},\overline{\text{lon}})$:
  $$r_{g}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}d\big((\text{lat}_{i}^{e},\text{lon}_{i}^{e}),\ (\overline{\text{lat}},\overline{\text{lon}})\big)^{2}}\tag{1}$$
- Mobility Entropy: The Shannon entropy of station visitation frequencies:
  $$S=-\sum_{j}p_{j}\log_{2}(p_{j})\tag{2}$$
  where $p_{j}$ denotes the fraction of trips ending at station $j$\. Lower entropy indicates more regular and predictable mobility patterns\.
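The two dispersion measures above can be computed as in the following sketch\. It assumes that $d$ in Eq\. \(1\) is the Haversine distance and that the center of mass is the arithmetic mean of the trip\-end coordinates; both are assumptions made for illustration, and the function names are hypothetical\.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def radius_of_gyration(end_points):
    """Eq. (1): RMS distance of trip-end points (lat, lon) from their mean position."""
    n = len(end_points)
    lat_bar = sum(lat for lat, _ in end_points) / n
    lon_bar = sum(lon for _, lon in end_points) / n
    sq_sum = sum(haversine_m(lat, lon, lat_bar, lon_bar) ** 2 for lat, lon in end_points)
    return math.sqrt(sq_sum / n)


def mobility_entropy(end_stations):
    """Eq. (2): Shannon entropy (bits) of the end-station visitation frequencies."""
    counts = Counter(end_stations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A trajectory that always ends at the same station has entropy 0, while one spread uniformly over four stations has entropy 2 bits\.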
#### 3\.3\.2 Temporal Metrics
- Speed: The ratio of trip distance to trip duration\.
- Dwell Time: The idle interval $t_{i+1}^{s}-t_{i}^{e}$ between two consecutive trips on the same bike\. Dwell times are restricted to $[0,86400)$ seconds to exclude overnight or long\-term inactivity\.
- Intra\-day Temporal Distribution: The normalized hourly distribution of trip start times aggregated over 24\-hour bins\.
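A minimal sketch of the dwell\-time and intra\-day computations, assuming the trips of one bike are already ordered chronologically and timestamps are given in seconds; the function names and input layout are hypothetical\.

```python
def dwell_times(trips, max_dwell_s=86_400):
    """Idle gaps between consecutive trips of one bike.

    trips: chronologically ordered list of (start_ts, end_ts) in seconds.
    Only gaps in [0, max_dwell_s) are kept, as in the definition above.
    """
    gaps = []
    for (_, end_prev), (start_next, _) in zip(trips, trips[1:]):
        gap = start_next - end_prev
        if 0 <= gap < max_dwell_s:
            gaps.append(gap)
    return gaps


def hourly_distribution(start_hours):
    """Normalized histogram of trip start hours (integers 0..23) over 24 bins."""
    counts = [0] * 24
    for hour in start_hours:
        counts[hour] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24
    return [c / total for c in counts]
```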
### 3\.4 Demographic Mobility Characterization
Figure [1](https://arxiv.org/html/2606.31207#S3.F1) summarizes demographic differences in mobility structure and temporal activity patterns between elderly and young riders\.
Figure 1: Demographic mobility signatures of elderly \(65\+\) and young \(18–35\) riders derived from JC Citi Bike 2016–2020 data\.
Spatial Mobility Differences\. Figure [1](https://arxiv.org/html/2606.31207#S3.F1)\(a\) compares mobility metrics between elderly and young riders\. Step lengths and travel speeds remain broadly similar across the two groups \(1,004 m vs\. 1,102 m; 2\.41 m/s vs\. 2\.61 m/s\)\. In contrast, elderly riders exhibit lower spatial dispersion and mobility diversity, with a smaller radius of gyration \(958 m vs\. 1,189 m\) and substantially lower mobility entropy \(1\.82 vs\. 4\.15\)\.
The aggregate all\-rider entropy \(4\.36\) exceeds that of the young group alone, indicating that elderly riders contribute additional station visitation diversity beyond the dominant young\-rider mobility pattern\. This difference is further reflected in the station\-set Jaccard similarity between the two groups, which is 0\.521\.
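For reference, the station\-set Jaccard similarity is the size of the intersection of the two groups’ visited\-station sets divided by the size of their union\. A small sketch with made\-up station IDs \(not taken from the dataset\):

```python
def jaccard(stations_a, stations_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two station collections."""
    a, b = set(stations_a), set(stations_b)
    union = a | b
    if not union:
        return 0.0  # assumption: define the similarity of two empty sets as 0
    return len(a & b) / len(union)


# Hypothetical station IDs for illustration only.
elderly_stations = {3183, 3186, 3195}
young_stations = {3186, 3195, 3205}
similarity = jaccard(elderly_stations, young_stations)  # 2 shared / 4 total
```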
Temporal Activity Patterns\. Figure [1](https://arxiv.org/html/2606.31207#S3.F1)\(b\) compares intra\-day mobility activity between the two groups\. Young riders exhibit pronounced morning and evening activity peaks around 08:00 and 17:00–18:00, whereas elderly riders maintain more evenly distributed daytime activity concentrated between 09:00 and 17:00\. Elderly riders also exhibit shorter dwell intervals between trips \(18,329 s vs\. 27,928 s\), indicating distinct temporal usage patterns between the two groups\.
## 4 Demographic Bias in Mobility Modeling
Section 3 established that elderly and young riders follow structurally distinct mobility patterns\. These differences carry direct implications for downstream urban computing tasks, such as facility allocation and transit routing, that depend on faithful mobility representations\. A foundational component shared by these tasks is trajectory generation: synthetic mobility data are used to augment sparse observations, power urban simulations, and enable privacy\-preserving data sharing\. As LLMs emerge as a prominent paradigm for trajectory\-based mobility analysis, understanding whether demographic bias propagates through generative pipelines becomes increasingly urgent\. This section examines the question through a controlled experiment in which a first\-order Markov chain and a Qwen3\-4B model \(QLoRA fine\-tuned\) are each trained under elderly\-only, young\-only, and full\-population settings, and evaluated against real elderly trajectories using the metrics defined in Section 3\.3\.
### 4\.1 Experiment Design
To evaluate how training\-data demographics affect synthetic elderly mobility fidelity, we construct a controlled generative experiment using three training populations: elderly\-only \(65\+\), young\-only \(18–35\), and the full rider population\. For both the Markov and LLM paradigms, separate models are trained on each population:
- $\mathcal{M}_{\text{elderly}}$ / $\mathcal{L}_{\text{elderly}}$: Trained exclusively on elderly trips \(65\+\)\.
- $\mathcal{M}_{\text{young}}$ / $\mathcal{L}_{\text{young}}$: Trained exclusively on young trips \(18–35\)\.
- $\mathcal{M}_{\text{all}}$ / $\mathcal{L}_{\text{all}}$: Trained on the full filtered dataset \(all age groups\)\. The full\-population Markov model is written $\mathcal{M}_{\text{full}}$ in Section 4\.4\.
Each model generates 2,007 synthetic trajectories to match the empirical elderly trajectory count\. Synthetic trajectory lengths are sampled from the real elderly trajectory\-length distribution \(mean 7\.2 trips per trajectory\)\. All generated datasets are evaluated against the real elderly trajectories using the identical mobility evaluation pipeline\.
Comparisons across training populations assess how demographic composition affects the fidelity of synthetic elderly mobility trajectories, while comparisons between the Markov and LLM paradigms examine the role of generative framework choice under severe minority\-data sparsity\.
### 4\.2 Markov Chain Generative Model
We implement a first\-order Markov chain model to generate synthetic mobility trajectories from empirical trip sequences\. The model estimates station\-to\-station transition probabilities $P(s_{j}\mid s_{i})$, start\-station probabilities $P_{\text{start}}(s_{i})$, and empirical distributions for dwell time, departure hour, and trajectory length\. Trip durations for each station pair $(i,j)$ are modeled using Gaussian distributions parameterized as $\mathcal{N}(\mu_{ij},\sigma_{ij})$\.
Synthetic trajectories are generated by first sampling a start station and trajectory length, followed by iterative sampling of destination stations conditioned on the current station\. Each transition is assigned a duration sampled from the corresponding transition\-specific Gaussian distribution, truncated to a minimum of 60 seconds\. Dwell intervals between consecutive trips are sampled from the empirical dwell\-time distribution and capped at 24 hours, while departure hours are sampled from the empirical hour\-of\-day distribution with randomized minutes and seconds\. For stations without outgoing transitions in the training data, destination sampling falls back to the start\-station distribution $P_{\text{start}}(s_{i})$\. Synthetic datasets are generated by repeating this procedure until the target trajectory count is reached\.
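The generation loop described above can be sketched as follows\. This is an illustrative simplification, not the authors’ implementation: trajectories are treated as plain station sequences, dwell times and departure hours are omitted, and the Gaussian duration is clipped at 60 seconds as one possible reading of the truncation step\. All function names are hypothetical\.

```python
import random
from collections import Counter, defaultdict


def fit_markov(trajectories):
    """Count start stations and first-order station-to-station transitions."""
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    for traj in trajectories:
        if not traj:
            continue
        start_counts[traj[0]] += 1
        for cur, nxt in zip(traj, traj[1:]):
            trans_counts[cur][nxt] += 1
    return start_counts, trans_counts


def _sample(counter, rng):
    """Draw one key with probability proportional to its count."""
    keys = list(counter)
    weights = list(counter.values())
    return rng.choices(keys, weights=weights, k=1)[0]


def generate(start_counts, trans_counts, length, rng):
    """Sample a station sequence of the requested length."""
    current = _sample(start_counts, rng)
    traj = [current]
    while len(traj) < length:
        outgoing = trans_counts.get(current)
        # Fallback from the text: no outgoing transitions -> start distribution.
        current = _sample(outgoing if outgoing else start_counts, rng)
        traj.append(current)
    return traj


def sample_duration(mu, sigma, rng, min_s=60.0):
    """Gaussian trip duration, clipped below at min_s (assumed clipping)."""
    return max(min_s, rng.gauss(mu, sigma))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when the same procedure is repeated for each training population\.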
### 4\.3 LLM\-based Generative Model
We also implement a Qwen3\-4B\[[23](https://arxiv.org/html/2606.31207#bib.bib45)\] LLM fine\-tuned using 4\-bit NormalFloat QLoRA\[[9](https://arxiv.org/html/2606.31207#bib.bib46)\] to evaluate a higher\-capacity generative framework under sparse demographic data\. LoRA adapters are applied to all linear layers with rank $r=16$, $\alpha=32$, and dropout 0\.05, yielding approximately 100M trainable parameters\.
#### 4\.3\.1 Trajectory Serialization and Fine\-Tuning\.
Each trajectory is represented as a whitespace\-separated sequence of station IDs, such as 3195 3205 3186 …\. Trajectories are tokenized using the Qwen3 tokenizer and truncated to 768 tokens\. Separate LoRA adapters are fine\-tuned for 3 epochs using the elderly\-only, young\-only, and full\-population training corpora described in Section [4\.1](https://arxiv.org/html/2606.31207#S4.SS1), with batch size 4 and learning rate $2\times 10^{-4}$\. Training corpus sizes are summarized in Table [2](https://arxiv.org/html/2606.31207#S4.T2)\.
Table 2: LLM training corpora \(Jersey City 2016–2020, after filtering\)\.
#### 4\.3\.2 Generation Procedure\.
The fine\-tuned model generates trajectories via batched autoregressive decoding \(batch size 32\) with temperature 0\.7, top\-$p$ 0\.9, and top\-$k$ 50, producing up to 128 new tokens per trajectory\. Decoding is seeded with a start\-of\-sequence token\. The generated token stream is parsed back into station ID integers by concatenating consecutive digit tokens; non\-numeric tokens and station IDs not present in the training vocabulary are discarded\. Each model generates 2,007 synthetic trajectories to match the real elderly trajectory count\.
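The serialization and parsing steps can be illustrated on decoded text\. The sketch below assumes the generated tokens have already been decoded to a string, so runs of digits are recovered with a regular expression instead of token\-level concatenation; `serialize` and `parse_generated` are hypothetical names\.

```python
import re


def serialize(station_ids):
    """Render a trajectory as a whitespace-separated string of station IDs."""
    return " ".join(str(s) for s in station_ids)


def parse_generated(text, vocabulary):
    """Recover station IDs from decoded model output.

    Non-numeric fragments are ignored, and IDs outside the training
    vocabulary are discarded, mirroring the filtering described above.
    """
    candidates = (int(match) for match in re.findall(r"\d+", text))
    return [station for station in candidates if station in vocabulary]


vocab = {3195, 3205, 3186}
line = serialize([3195, 3205, 3186])
recovered = parse_generated("3195 3205 foo 9999 3186", vocab)
```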
#### 4\.3\.3 Temporal Synthesis\.
Since the LLM operates on station sequences only and does not model continuous temporal quantities, we synthesize temporal attributes post hoc using fixed heuristics\. Trip duration is set to trip distance divided by 4\.17 m/s \(the real elderly mean speed\), and inter\-trip dwell time is sampled from an exponential distribution with mean 600 s\. These heuristics are identical across all three LLM\-trained models\.
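A sketch of these heuristics, using only the two constants stated above \(4\.17 m/s and a 600 s mean dwell\)\. The function name, the input layout, and the choice to chain trips on a single clock are assumptions for illustration\.

```python
import random

FIXED_SPEED_MPS = 4.17   # constant speed stated in the text
MEAN_DWELL_S = 600.0     # mean of the exponential dwell-time distribution


def synthesize_times(distances_m, start_ts, rng):
    """Assign (start, end) timestamps in seconds to a sequence of trips.

    distances_m: trip distances in metres, in trajectory order.
    start_ts: assumed departure time of the first trip.
    """
    times = []
    clock = start_ts
    for dist in distances_m:
        duration = dist / FIXED_SPEED_MPS
        times.append((clock, clock + duration))
        # expovariate takes the rate, i.e. 1 / mean.
        clock = clock + duration + rng.expovariate(1.0 / MEAN_DWELL_S)
    return times
```

Because the speed is fixed, every synthetic trip has the same implied speed regardless of the training population, so differences between the three LLM variants on temporal metrics come only from the station sequences\.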
### 4\.4 Markov\-based Generation Results
We evaluate how training\-data composition affects the fidelity of synthetic elderly mobility trajectories using three Markov models trained on different demographic populations: $\mathcal{M}_{\text{full}}$, $\mathcal{M}_{\text{young}}$, and $\mathcal{M}_{\text{elderly}}$\. Each model generates 2,007 synthetic trajectories matching the empirical elderly trajectory count and length distribution\. Figure [2](https://arxiv.org/html/2606.31207#S4.F2) and Table [3](https://arxiv.org/html/2606.31207#S4.T3) summarize the resulting mobility metrics relative to the real elderly baseline\.
Figure 2: Synthetic trajectory evaluation under different training populations\. \(a\) Absolute comparison of mobility metrics across Real Elderly and synthetic trajectories generated by $\mathcal{M}_{\text{full}}$, $\mathcal{M}_{\text{young}}$, and $\mathcal{M}_{\text{elderly}}$ \(log scale\)\. \(b\) Relative error \(%\) of each synthetic method with respect to Real Elderly across the five mobility metrics\.
Table 3: Comparison of real elderly metrics with synthetic trajectories generated by Markov models trained on different demographic compositions\.
$\mathcal{M}_{\text{young}}$ exhibits substantial deviations from the real elderly baseline, particularly for Dwell Time \(\+44\.3%\), Entropy \(\+130\.8%\), and RoG \(\+15\.6%\), indicating that mobility patterns learned exclusively from young riders do not transfer well to elderly mobility behavior\.
Including all demographic groups in the training data improves temporal fidelity\. Compared with $\mathcal{M}_{\text{young}}$, $\mathcal{M}_{\text{full}}$ reduces the Dwell Time error from 44\.3% to 8\.9%\. However, spatial metrics remain poorly reproduced, with Entropy and RoG errors remaining comparable to the young\-only model\. This suggests that aggregate transition patterns remain dominated by the majority population\.
Among the three variants, $\mathcal{M}_{\text{elderly}}$ achieves the closest agreement with the real elderly baseline for Step Length, Dwell Time, and Entropy, with errors below 5%\. However, it underestimates RoG by 22\.9%, indicating reduced spatial coverage in the generated trajectories\. This contraction likely reflects the sparsity of elderly transition observations in the training data\.
Despite the substantially larger training corpus used in the full\-population setting, the spatial fidelity gap remains unresolved\. Together, these results indicate that demographic inclusion alone is insufficient to recover minority\-group spatial mobility structure when observational coverage is sparse\.
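The signed percentage errors discussed in this section are assumed here to be computed as \(synthetic − real\) / real × 100\. The formula is not stated explicitly in the text, so the sketch below is one plausible reading, and the function names are hypothetical\.

```python
def relative_error_pct(synthetic, real):
    """Signed relative error of a synthetic metric with respect to the real one, in %."""
    return 100.0 * (synthetic - real) / real


def relative_errors(synthetic_metrics, real_metrics):
    """Apply relative_error_pct to every metric present in both dictionaries."""
    return {
        name: relative_error_pct(synthetic_metrics[name], real_metrics[name])
        for name in real_metrics
        if name in synthetic_metrics
    }
```

For example, with the real elderly RoG of 958 m and a hypothetical synthetic RoG of 1,000 m, `relative_error_pct(1000.0, 958.0)` gives roughly \+4\.4%; a positive sign indicates overestimation and a negative sign indicates underestimation\.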
### 4\.5 LLM\-based Generation Results
We repeat the synthetic trajectory experiment using Qwen3\-4B with QLoRA fine\-tuning\. Three adapters \($\mathcal{L}_{\text{elderly}}$, $\mathcal{L}_{\text{young}}$, $\mathcal{L}_{\text{all}}$\) are trained on the same demographic splits used in the Markov experiments, with each model generating 2,007 synthetic trajectories\. Table [4](https://arxiv.org/html/2606.31207#S4.T4) summarizes the evaluation against the real elderly baseline\.
Table 4: LLM\-generated synthetic trajectories vs\. Real Elderly\. Bold indicates best among the three LLM variants\.
Across the three LLM variants, $\mathcal{L}_{\text{elderly}}$ achieves the lowest errors for the spatial metrics RoG and Entropy\. However, all LLM variants exhibit substantial errors for the temporal metrics Speed and Dwell Time\. Because temporal attributes are synthesized post hoc rather than directly generated by the model, temporal patterns remain poorly aligned with the empirical elderly baseline\.
Compared with the Markov models, the LLM\-based models generally achieve lower fidelity across most metrics\. $\mathcal{L}_{\text{elderly}}$ underperforms $\mathcal{M}_{\text{elderly}}$ on Step Length, Speed, Dwell Time, and Entropy\. The primary exception is RoG, where the LLM achieves a smaller error \(−5\.7%\) than the Markov model \(−22\.9%\)\. This suggests that autoregressive sequence generation partially alleviates the spatial contraction caused by sparse transition observations\. However, this improvement in spatial coverage comes at the cost of reduced behavioral regularity, as the LLMs tend to overestimate mobility diversity, particularly for Entropy\.
Overall, the results suggest that higher\-capacity sequence models do not necessarily improve minority\-group mobility fidelity when demographic training data remain limited\.
## 5 Conclusions and Policy Implications
### 5\.1 Conclusions
This study examined how demographic underrepresentation biases mobility models, using synthetic trajectory generation as a controlled testbed\. Elderly and young riders exhibit structurally distinct mobility patterns across spatial and temporal dimensions\. Controlled experiments show that models trained on majority\-dominated populations systematically misrepresent elderly mobility behavior, while elderly\-specific training achieves closer agreement with empirical patterns, although spatial coverage remains constrained by data sparsity\. Comparisons between Markov and LLM\-based generation further reveal that higher\-capacity models do not necessarily improve subgroup\-level fidelity under limited demographic data\.
### 5\.2 Policy Implications
The results suggest that aggregate mobility metrics may mask substantial behavioral differences across demographic groups\. Models trained primarily on majority populations can misrepresent minority mobility behavior even when overall performance appears acceptable\. Incorporating demographic\-specific evaluation into mobility analysis may therefore improve the reliability of data\-driven transportation planning and policy decisions\.
The empirical mobility patterns observed among elderly riders also indicate that bike\-sharing systems may serve older populations differently from younger users\. Elderly riders exhibited more localized spatial activity, lower mobility diversity, and distinct temporal usage patterns, suggesting that age\-inclusive mobility infrastructure and improved station accessibility may help broaden participation among older adults\.
More broadly, as data\-driven mobility models become increasingly integrated into urban analytics and smart\-city systems, evaluation based solely on aggregate performance metrics may overlook systematic demographic\-specific errors\. Whether the downstream task is generation, prediction, or resource allocation, demographic\-aware validation procedures can help improve the robustness and representativeness of mobility models used in urban decision\-making\.
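A demographic-aware validation step can be sketched as reporting each metric per group alongside the pooled value. The example below is a minimal illustration under stated assumptions: `validate_by_group` is a hypothetical helper, each record carries a group label with one empirical and one synthetic value of some metric, and the numbers are made up to show the masking effect rather than taken from the experiments.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def relative_error_pct(synthetic, empirical):
    """Percent deviation of the synthetic mean from the empirical mean."""
    return 100.0 * (mean(synthetic) - mean(empirical)) / mean(empirical)


def validate_by_group(records):
    """records: iterable of (group, empirical_value, synthetic_value).

    Returns the pooled error under the key "aggregate" plus one error per group.
    """
    by_group = defaultdict(lambda: ([], []))
    all_emp, all_syn = [], []
    for group, emp, syn in records:
        by_group[group][0].append(emp)
        by_group[group][1].append(syn)
        all_emp.append(emp)
        all_syn.append(syn)
    report = {"aggregate": relative_error_pct(all_syn, all_emp)}
    for group, (emp, syn) in by_group.items():
        report[group] = relative_error_pct(syn, emp)
    return report


# Made-up metric values (e.g. step length in metres): the generator
# reproduces the majority group but not the minority group.
records = [("young", 1000.0, 1000.0)] * 90 + [("elderly", 800.0, 1000.0)] * 10
report = validate_by_group(records)
for name, err in report.items():
    print(f"{name}: {err:+.1f}%")
```

In this toy case the pooled error is about 2% while the elderly error is 25%, because the minority group contributes only 10% of the records.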
### 5\.3 Limitations and Future Work
This study is subject to several limitations\. First, the analysis is restricted to the Jersey City Citi Bike system, and future work should evaluate whether similar subgroup\-specific mobility patterns emerge across larger and more diverse urban settings\. Second, demographic analysis is limited to age group\. Extending subgroup\-level mobility evaluation to additional demographic dimensions and downstream tasks remains an important direction for future research\. Finally, this study captures mobility patterns within a specific historical period \(2016–2020\), while urban mobility systems and demographic travel behaviors may evolve gradually over longer time horizons\. Future research should examine whether subgroup\-specific mobility disparities persist under long\-term changes in urban mobility systems\.
## References
- \[1\] G\. Agostini, E\. Pierson, and N\. Garg \(2024\) A Bayesian spatial model to correct under\-reporting in urban crowdsourcing\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 38\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1)\.
- \[2\] L\. Alessandretti, P\. Sapiezynski, V\. Sekara, S\. Lehmann, and A\. Baronchelli \(2018\) The scales of human mobility\. Nature 563 \(7729\), pp\. 639–643\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p1.1)\.
- \[3\] H\. Alt and M\. Godau \(1995\) Computing the Fréchet distance between two polygonal curves\. International Journal of Computational Geometry & Applications 5 \(1\-2\), pp\. 75–91\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p1.1)\.
- \[4\] C\. Cao, Y\. Li, et al\. \(2023\) Mobility trajectory generation: a survey\. Artificial Intelligence Review 56, pp\. 14605–14638\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[5\] J\. Chen, H\. Wang, J\. Zhao, Y\. Li, and D\. Jin \(2023\) Differentially private trajectory generation with guaranteed utility\. In Proceedings of the 40th International Conference on Machine Learning \(ICML\), Vol\. 202, pp\. 4896–4911\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p2.1)\.
- \[6\] X\. Chen, P\. Zhao, and X\. Di \(2024\) Age\-friendly cities and technologies: opportunities and challenges for mobility\. Transport Reviews 44 \(2\), pp\. 295–318\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p2.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[7\] E\. Cunningham et al\. \(2021\) Privacy\-preserving synthetic location data\. ACM Transactions on Spatial Algorithms and Systems 7 \(3\), pp\. 1–28\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p2.1)\.
- \[8\] Y\. de Montjoye, C\. A\. Hidalgo, M\. Verleysen, and V\. D\. Blondel \(2013\) Unique in the crowd: the privacy bounds of human mobility\. Scientific Reports 3, pp\. 1376\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[9\] T\. Dettmers, A\. Pagnoni, A\. Holtzman, and L\. Zettlemoyer \(2023\) QLoRA: efficient finetuning of quantized language models\. In Advances in Neural Information Processing Systems \(NeurIPS\), Vol\. 36, pp\. 9842–9863\. Cited by: [§4\.3](https://arxiv.org/html/2606.31207#S4.SS3.p1.2)\.
- \[10\] J\. Feng and M\. Dijst \(2017\) Mobility of older people: a systematic review\. Reviews in Transport Economics and Policy\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p1.1)\.
- \[11\] J\. Feng, Y\. Li, C\. Yang, F\. Zhou, and D\. Jin \(2023\) Spatial\-temporal diffusion probabilistic learning for trajectory generation\. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Vol\. 37, pp\. 15146–15154\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[12\] M\. C\. González, C\. A\. Hidalgo, and A\. Barabási \(2008\) Understanding individual human mobility patterns\. Nature 453 \(7196\), pp\. 779–782\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p1.1)\.
- \[13\] R\. Hjorthol, L\. Levin, and A\. Sirén \(2010\) Mobility in different generations of older persons: the development of daily travel in different cohorts in Denmark, Norway and Sweden\. Journal of Transport Geography 18 \(5\), pp\. 624–633\. Cited by: [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p1.1)\.
- \[14\] Z\. Huang, Y\. Xu, Q\. Li, and Y\. Long \(2023\) Understanding the mobility patterns of the elderly using large\-scale mobile phone data\. Travel Behaviour and Society 30, pp\. 264–276\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p2.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p1.1)\.
- \[15\] R\. Jiang, X\. Yang, et al\. \(2023\) Continuous trajectory generation based on two\-stage GAN\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 37, pp\. 4374–4382\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[16\] B\. Lucas, L\. Pappalardo, G\. Barlacchi, S\. Pileggi, and F\. Simini \(2024\) Evaluating the quality of synthetically generated mobility data: a comprehensive framework\. ACM Transactions on Spatial Algorithms and Systems 10 \(1\), pp\. 1–28\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p1.1)\.
- \[17\] N\. Mehrabi, F\. Morstatter, N\. Saxena, K\. Lerman, and A\. Galstyan \(2022\) A survey on bias and fairness in machine learning\. ACM Computing Surveys 54 \(6\), Article 115\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1)\.
- \[18\] C\. Moreno, Z\. Allam, D\. Chabaud, C\. Gall, and F\. Pratlong \(2021\) Introducing the “15\-minute city”: sustainability, resilience and place identity in future post\-pandemic cities\. Smart Cities 4 \(1\), pp\. 93–111\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p3.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[19\] Motivate International Inc\. \(2024\) Citi Bike NYC System Data\. Note: [https://citibikenyc\.com/system\-data](https://citibikenyc.com/system-data) Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p4.1)\.
- \[20\] H\. Nilforoshan, W\. Lanchantin, E\. Pierson, et al\. \(2023\) Human mobility networks reveal increased segregation in large cities\. Nature 624 \(7992\), pp\. 586–592\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1)\.
- \[21\] L\. Pappalardo and F\. Simini \(2018\) Data\-driven generation of spatio\-temporal routines\. Nature Communications 9 \(1\), pp\. 1–11\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[22\] H\. Qi, Z\. Chen, Y\. Zhang, and C\. Ratti \(2024\) Unequal mobility: a data\-driven analysis of demographic disparities in urban movement patterns\. Computers, Environment and Urban Systems 108, pp\. 102146\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[23\] Qwen Team \(2025\) Qwen3 technical report\. arXiv preprint arXiv:2505\.09388\. Cited by: [§4\.3](https://arxiv.org/html/2606.31207#S4.SS3.p1.2)\.
- \[24\] J\. Rao, S\. Gao, et al\. \(2020\) LSTM\-TrajGAN: a deep learning approach to trajectory privacy protection\. In Leibniz International Proceedings in Informatics \(LIPIcs\), Vol\. 177\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[25\] H\. Sakoe and S\. Chiba \(1978\) Dynamic programming algorithm optimization for spoken word recognition\. IEEE Transactions on Acoustics, Speech, and Signal Processing 26 \(1\), pp\. 43–49\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p1.1)\.
- \[26\] C\. Song, Z\. Qu, N\. Blumm, and A\. Barabási \(2010\) Limits of predictability in human mobility\. Science 327 \(5968\), pp\. 1018–1021\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p1.1)\.
- \[27\] Z\. Sun, H\. Wang, Y\. Li, and D\. Jin \(2024\) Demographic bias in trajectory generation: characterizing and mitigating representation gaps\. In Proceedings of the ACM Web Conference \(WWW\), pp\. 3819–3830\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p2.1), [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[28\] Y\. Van de Ven, W\. Bulten, and R\. Sips \(2023\) On the limitations of generative models for trajectory privacy\. In Proceedings of the 31st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Article 62\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p2.1)\.
- \[29\] H\. Wang, X\. Wang, Y\. Li, Y\. Zhu, J\. Chen, and Y\. Li \(2025\) MobilityGPT: generating human mobility trajectories with large language models\. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies \(IMWUT\) 9 \(1\), pp\. 1–23\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[30\] S\. Wang, Z\. Bao, J\. S\. Culpepper, and G\. Cong \(2024\) Trajectory similarity metrics in the era of deep learning: a systematic review\. IEEE Transactions on Knowledge and Data Engineering 36 \(5\), pp\. 2234–2252\. Cited by: [§2\.2](https://arxiv.org/html/2606.31207#S2.SS2.p1.1)\.
- \[31\] Z\. Wang, J\. Li, et al\. \(2024\) Spatiotemporal\-augmented graph neural networks for human mobility simulation\. IEEE Transactions on Knowledge and Data Engineering 36 \(11\)\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[32\] World Health Organization \(2007\) Global age\-friendly cities: a guide\. WHO Press\. Cited by: [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p1.1)\.
- \[33\] Y\. Xu, C\. Béné, R\. Batista, et al\. \(2023\) Demographic disparities in human mobility data: a large\-scale analysis\. Nature Communications 14 \(1\), pp\. 4256\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[34\] Y\. Yan, Y\. Zhang, S\. Wang, Y\. Liu, and M\. Batty \(2024\) Algorithmic fairness in urban computing: a survey\. ACM Computing Surveys 56 \(4\), Article 88\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p2.1)\.
- \[35\] S\. Zhang, C\. Feng, and C\. Tan \(2023\) Activity space of older adults: a longitudinal GPS study in Singapore\. Computers, Environment and Urban Systems 103, pp\. 101989\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p2.1), [§2\.3](https://arxiv.org/html/2606.31207#S2.SS3.p1.1)\.
- \[36\] Y\. Zheng \(2015\) Trajectory data mining: an overview\. ACM Transactions on Intelligent Systems and Technology \(TIST\) 6 \(3\), pp\. 1–41\. Cited by: [§1](https://arxiv.org/html/2606.31207#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p1.1)\.
- \[37\] P\. Zhou, X\. Wang, Y\. Li, and Y\. Zheng \(2025\) A survey on trajectory data generation: from classical modeling to deep generative approaches\. IEEE Transactions on Knowledge and Data Engineering 37 \(2\), pp\. 385–406\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.
- \[38\] Y\. Zhu, W\. Yu, H\. Wang, J\. Chen, and Y\. Li \(2023\) DiffTraj: generating GPS trajectory with diffusion probabilistic model\. In Advances in Neural Information Processing Systems \(NeurIPS\), Vol\. 36, pp\. 48044–48061\. Cited by: [§2\.1](https://arxiv.org/html/2606.31207#S2.SS1.p2.1)\.