CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting


Summary

Introduces CausalPOI, a spatio-temporal graph-based causal representation learning framework for cold-start POI check-in forecasting, which outperforms state-of-the-art baselines on real-world SafeGraph datasets.

arXiv:2606.05413v1 (announce type: new)

# CausalPOI: Spatio-Temporal Graph-Based Causal Modeling for Cold-Start POI Check-in Forecasting
Source: [https://arxiv.org/html/2606.05413](https://arxiv.org/html/2606.05413)
\(2026\)

###### Abstract

As urban environments continue to evolve rapidly, accurately modeling the dynamic behaviour of Points of Interest (POIs) is essential for supporting data-driven urban planning and commercial decision-making. While recent advancements in spatio-temporal graph learning have improved POI forecasting, most methods rely on proximity-based graphs and correlation-driven modeling, which overlook the functional dependencies between POIs and fail to capture the causal effects of urban interventions. In this paper, we introduce a novel research problem, cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. To address these challenges, we propose CausalPOI, a spatio-temporal graph-based causal representation learning framework. CausalPOI leverages a Spatio-Temporal Functional Interaction Graph to model semantic and spatial relationships between POIs, and constructs structurally aligned treatment and control graphs to simulate factual and counterfactual scenarios. Extensive experiments on real-world SafeGraph datasets demonstrate that CausalPOI significantly outperforms state-of-the-art baselines across the board, validating its effectiveness in spatio-temporal forecasting, semantic interaction modeling, and causal effect estimation, and providing a more interpretable and actionable foundation for urban intervention analysis. Source code is available on GitHub: [https://github.com/ZZQ-NTU/CausalPOI](https://github.com/ZZQ-NTU/CausalPOI).

Keywords: Cold-start POI check-in forecasting, spatio-temporal graph neural networks, causal inference, urban computing

Journal year: 2026. Copyright: CC. Conference: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09–13, 2026, Jeju Island, Republic of Korea. DOI: 10.1145/3770855.3817641. ISBN: 979-8-4007-2259-2/2026/08. CCS concepts: Information systems → Geographic information systems; Computing methodologies → Temporal reasoning; Computing methodologies → Causal reasoning and diagnostics.

## 1. Introduction

Urban environments are inherently dynamic, characterized by the continuous emergence, evolution, and disappearance of Points of Interest (POIs) (Zhang et al., [2024](https://arxiv.org/html/2606.05413#bib.bib14)) such as restaurants, gyms, libraries, and retail stores. These POIs play a central role in shaping human mobility patterns, influencing economic vitality, and defining the spatial structure of cities. Understanding and forecasting POI-level behaviour is crucial for a wide range of applications, including commercial site selection, transportation planning, and public infrastructure deployment. Traditional forecasting methods have predominantly relied on large-scale spatio-temporal data, such as aggregated mobility flows or region-level check-in statistics, to model urban activity patterns. These approaches (Liu et al., [2021](https://arxiv.org/html/2606.05413#bib.bib3); Yuan et al., [2024](https://arxiv.org/html/2606.05413#bib.bib4)) often utilize coarse-grained spatial features, treating urban space as a collection of homogeneous regions and relying on statistical correlations between them to predict future trends. While such models offer useful insights at the macro level, they struggle to capture the fine-grained dynamics and localized influence of individual POIs, such as temporal bursts in popularity, functional shifts over time, or competitive and complementary effects within dense urban clusters.

Consider a real-world scenario where a new gym is about to open in a busy urban neighbourhood. Before its launch, business operators and digital service platforms seek to answer a key question: How many people are expected to check in at this new facility in the coming weeks? Accurate forecasting of a newly introduced gym's check-in volume is crucial for assessing membership demand, allocating fitness instructors and equipment, and planning class schedules. Unlike conventional forecasting tasks (Yang et al., [2018](https://arxiv.org/html/2606.05413#bib.bib25); Hajisafi et al., [2023](https://arxiv.org/html/2606.05413#bib.bib27); Li et al., [2024](https://arxiv.org/html/2606.05413#bib.bib26)) that rely on a POI's historical data, this setting involves predicting user activity for a cold-start POI, that is, one with no prior behavioural observations. To make reliable predictions, the model must infer potential demand as a result of introducing the new gym into an existing urban ecosystem. This requires modeling not only spatial context and functional relationships with nearby POIs, but also understanding how the new POI might causally affect and be affected by its surrounding environment. For example, the presence of competitive gyms may negatively impact the newly added establishment's popularity, while complementary businesses like smoothie bars may boost foot traffic. Such counterfactual reasoning is essential to distinguish true effects from spurious correlations, and cannot be fully captured by conventional predictive models, since they primarily focus on correlation rather than causality.

Recent studies (Yu et al., [2017](https://arxiv.org/html/2606.05413#bib.bib21); Guo et al., [2019](https://arxiv.org/html/2606.05413#bib.bib22); Wu et al., [2020](https://arxiv.org/html/2606.05413#bib.bib23); Zhang et al., [2026](https://arxiv.org/html/2606.05413#bib.bib32)) have increasingly focused on modeling urban activity patterns using spatio-temporal data and graph-based approaches. In particular, Graph Neural Networks (GNNs) (Brody et al., [2021](https://arxiv.org/html/2606.05413#bib.bib29)) have been widely adopted to capture spatial correlations and relational dependencies among POIs, often by constructing graphs based on geographic proximity. These methods have shown success in forecasting POI activity under settings where historical data is available and the urban topology remains relatively static. Additionally, early efforts in causal representation learning have explored the estimation of individual treatment effects (ITE) under the potential outcomes framework (Liu et al., [2020](https://arxiv.org/html/2606.05413#bib.bib24); Wang et al., [2022](https://arxiv.org/html/2606.05413#bib.bib18)), offering tools to reason about interventions from observational data.

Despite these advances, two fundamental challenges remain in modeling POI dynamics for decision-making: (1) Functional Interaction Modeling. Existing graph-based forecasting methods mainly rely on spatial proximity to define relationships, overlooking the semantic functionality and real-world interactions among POIs. This simplification limits the ability to model competition and complementarity, which are essential for accurate predictions and planning. (2) Causal Estimation in Structured Spaces. Although some methods attempt to estimate treatment effects, they often overlook the spatial structure of urban environments and rely on unstructured representations. Consequently, they fail to simulate how interventions, such as introducing a new POI, affect surrounding areas through spatial dependencies.

This paper introduces a novel research problem – how to model the causal effect of a newly introduced POI on urban mobility patterns by explicitly representing its temporal evolution and functional interaction with surrounding POIs, which can be formulated as a cold\-start POI check\-in forecasting problem\. Different from prior region\-level flow prediction or standard graph forecasting tasks, our setting focuses on POI\-level intervention\-aware forecasting under localized spatial interference, where the introduction of a new POI changes the local functional interaction context\. To address the aforementioned challenges, we propose CausalPOI, a spatio\-temporal graph\-based representation learning framework for POI\-level forecasting\. CausalPOI integrates two key components: the Spatio\-Temporal Functional Interaction Graph \(ST\-FIG\) Module, which captures both spatial proximity and semantic functional relations among POIs, and the Causal Inference Module, which estimates the individual treatment effect \(ITE\) of newly introduced POIs by simulating counterfactual scenarios\. In ST\-FIG, interaction strengths are learned via contrastive pretraining, allowing the model to uncover latent semantic dependencies beyond physical distance\. The causal module constructs structurally aligned treatment and control graphs and estimates ITE through a shared Graph Neural Network \(GNN\) encoder and temporal decoder\. The key contributions of this paper are summarized as follows:

- We define a novel research problem: cold-start POI check-in forecasting, which aims to predict the future check-in pattern of a newly introduced POI by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context. This cold-start forecasting setting is critical for real-world decision-making, yet remains underexplored in existing literature.
- We propose CausalPOI, a spatio-temporal causal representation learning framework tailored to this task. To the best of our knowledge, this is the first framework that systematically incorporates causal estimation at the POI level by simulating both factual and counterfactual outcomes, enabling fine-grained causal reasoning in urban analysis. CausalPOI comprises two key components: (1) the Spatio-Temporal Functional Interaction Graph Module, which captures both spatial proximity and semantic functional relationships among POIs, allowing the model to uncover both competitive and complementary relationships that extend beyond mere geographic distance; and (2) the Causal Inference Module, which estimates the individual treatment effect of newly introduced POIs based on aligned treatment and control graph structures.
- We conduct extensive experiments on cold-start POI check-in forecasting using real-world POI and check-in data from the US. The results demonstrate the effectiveness of CausalPOI in predicting post-introduction POI activity and estimating treatment effects. Compared to baselines, our method achieves notable improvements across all metrics. In particular, CausalPOI achieves up to 57.8% RMSE and 34.3% MAE reduction compared to the best-performing baseline on the most challenging region, demonstrating its robustness under cold-start conditions and its effectiveness in capturing causal effects in dynamic urban environments.

## 2. Related Work

### 2.1. POI Check-in Prediction

POI check-in prediction is commonly framed as a downstream task of urban representation learning, where the central goal is to derive informative region-level embeddings. In this line of research, spatial graphs are typically constructed using POIs, trajectories, or road networks, and the resulting embeddings are employed for tasks such as demand forecasting or mobility analysis. For example, Fu et al. (Fu et al., [2019](https://arxiv.org/html/2606.05413#bib.bib12)) and Zhang et al. (Zhang et al., [2019](https://arxiv.org/html/2606.05413#bib.bib13)) model spatial autocorrelations and intra-region structures to support region-level check-in forecasting. More recent approaches, including Wu et al. (Wu et al., [2022](https://arxiv.org/html/2606.05413#bib.bib11)) and Zhang et al. (Zhang et al., [2023](https://arxiv.org/html/2606.05413#bib.bib10)), further enhance regional representations by incorporating heterogeneous spatial signals through graph-based fusion or contrastive learning techniques. Li et al. (Li et al., [2020](https://arxiv.org/html/2606.05413#bib.bib2)) also investigate POI interactions by analyzing competitive relationships, highlighting the importance of functional dependencies in POI behaviour modeling.

However, these methods primarily focus on regional abstractions, which may overlook the nuanced dynamics of individual POIs. By formulating check-in prediction as a statistical aggregation at the regional level, they often fail to capture the evolving patterns of specific POIs, such as temporal bursts, localized trends, or functional shifts. A few studies attempt to directly model POIs. For instance, Tschernutter and Feuerriegel ([2021](https://arxiv.org/html/2606.05413#bib.bib9)) predict POI-level check-in volumes by modeling latent interactions among POIs. Nonetheless, these models generally assume static POI behaviour and lack the ability to capture temporal evolution or structural changes in local contexts.

### 2.2. Causal Modeling

Conventional POI forecasting models predominantly rely on statistical correlations observed in spatio-temporal data, limiting their capacity to uncover causal effects, especially in counterfactual scenarios such as evaluating the impact of adding or removing a POI. To overcome these limitations, causal representation learning has recently garnered attention. Pioneering works such as TARNet and CFRNet (Shalit et al., [2017](https://arxiv.org/html/2606.05413#bib.bib1)), and DragonNet (Shi et al., [2019](https://arxiv.org/html/2606.05413#bib.bib17)), leverage the potential outcomes framework to estimate ITE by learning representations that reduce confounding bias. However, these methods are primarily designed for tabular or sequential data and are not well-suited for structured spatial domains.

Meanwhile, graph-based causal inference has emerged as a promising direction for relational data. For example, CausalGNN (Wang et al., [2022](https://arxiv.org/html/2606.05413#bib.bib18)) proposes to learn causal representations on static graphs by integrating intervention modeling with message passing, while HyperSCI (Ma et al., [2022](https://arxiv.org/html/2606.05413#bib.bib28)) extends this line of work to higher-order relational structures. Yet, such methods often assume fixed graph topologies and lack mechanisms to handle dynamic spatial structures, which are common in urban environments. Additionally, they typically neglect the temporal evolution of spatial dependencies and the functional semantics of urban entities like POIs.

## 3. Preliminary

Definition 1 (Point of Interest): A POI $p$ refers to a point with a geographic position, $p.g = (lat_p, lon_p)$. It can be associated with a set of textual tags $p.t = (t_1, t_2, \ldots, t_i)$.

Definition 2 (POI check-in sequence): Each POI $p$ has a check-in sequence $p.c = \{c_1, c_2, \ldots, c_W\}$, and each check-in record is denoted as a tuple $c = (w, n)$, indicating that the POI $p$ was visited $n$ times in week $\mathrm{w}$. Examples of POI and check-in sequence are provided in Table [4](https://arxiv.org/html/2606.05413#A1.T4) in Appendix [A](https://arxiv.org/html/2606.05413#A1).

Problem Statement: We formally define the Cold-Start POI Check-in Forecasting problem as follows: Given a newly introduced POI $p$ with no historical check-in records, the objective is to (1) forecast its future check-in sequence $\hat{p}.c = \{\hat{c}_1, \hat{c}_2, \dots, \hat{c}_{\mathrm{W}}\}$ over the next $\mathrm{W}$ weeks; and (2) estimate the individual treatment effect (ITE) of introducing $p$ by comparing the predicted outcomes under the factual scenario (with $p$ introduced) and the counterfactual scenario (without $p$). During training and evaluation, the ground-truth future check-in sequence $p.c = \{c_1, c_2, \ldots, c_W\}$ is available as supervision.
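The two definitions can be pictured as a small data structure. The following is a minimal sketch only: the class name `POI`, the field `poi_id`, the helper `weekly_counts`, the sample coordinates, and the choice to fill weeks without a record with zero are all illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class POI:
    """A point of interest: position p.g, textual tags p.t, check-in sequence p.c."""
    poi_id: str                      # hypothetical identifier (not in the paper)
    lat: float                       # latitude part of p.g
    lon: float                       # longitude part of p.g
    tags: Tuple[str, ...] = ()       # p.t = (t_1, ..., t_i)
    # p.c as a list of (week w, visit count n) tuples
    checkins: List[Tuple[int, int]] = field(default_factory=list)

    def is_cold_start(self) -> bool:
        # A cold-start POI has no historical check-in records.
        return len(self.checkins) == 0


def weekly_counts(poi: POI, horizon: int) -> List[int]:
    """Visit counts for weeks 1..horizon; missing weeks are filled with 0 (assumption)."""
    by_week = dict(poi.checkins)
    return [by_week.get(week, 0) for week in range(1, horizon + 1)]


# Illustrative records with made-up values.
gym = POI("gym-001", 40.7128, -74.0060, tags=("gym", "fitness"))
cafe = POI("cafe-042", 40.7130, -74.0055, tags=("smoothie bar",),
           checkins=[(1, 120), (2, 135), (4, 90)])
```

Here `gym` plays the role of the newly introduced POI to be forecast, while `cafe` is an existing neighbour whose weekly sequence is observed.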

![Refer to caption](https://arxiv.org/html/2606.05413v1/x1.png)

Figure 1. An overview structure of CausalPOI, which consists of two main components: (i) the ST-FIG Module, where a Spatio-Temporal Functional Interaction Graph is built for each week to capture spatial and semantic relations among POIs; and (ii) the Causal Inference Module, where treatment and control graphs are encoded using shared Graph and Position Encoders, and their temporal evolution is modeled using a GRU to estimate the causal effect of the newly introduced POI.

## 4. Methodology

We model the cold-start POI check-in forecasting problem as a time series problem, which can better capture the trends and dynamic behaviours of POIs over time. Time series modeling accounts for temporal dependencies and identifies patterns in the evolution of POIs, which supports more accurate predictions. To this end, we propose CausalPOI, a spatio-temporal graph-based causality-aware representation learning framework tailored to estimate the Individual Treatment Effect of newly introduced POIs. Unlike traditional approaches that rely on region-level aggregation, CausalPOI focuses on POI-level time series, enabling the model to recognize fine-grained behavioural patterns specific to different time periods. It explicitly models the counterfactual scenario in which the POI had not been introduced, allowing the framework to disentangle causal influence from mere correlation. Although our implementation builds upon standard graph representation learning components, the key novelty lies in the intervention-aware problem formulation and the causal graph construction for POI-level forecasting. Specifically, we construct paired treatment-control graphs for each newly introduced POI, enabling localized counterfactual reasoning beyond standard static graph prediction. ST-FIG further captures functional relations among neighbouring POIs rather than relying solely on geographic proximity. As illustrated in Figure [1](https://arxiv.org/html/2606.05413#S3.F1), this end-to-end design encourages the learning of spatio-temporal representations that are both temporally consistent and causally informative, thus enabling reliable estimation of the evolving impact of new POIs.

### 4.1. Spatio-Temporal Functional Interaction Graph Module

The neighbour-based check-in volume captures activity driven by the presence and interaction of nearby POIs, particularly through competitive or complementary relationships. For instance, a gym may boost visits to a nearby smoothie bar due to their functional complementarity, while potentially experiencing decreased check-ins from the emergence of a competing gym nearby. This component reflects the spatial and functional interplay among neighbouring POIs and its influence on individual check-in dynamics. To model these intricate interactions, we construct the Spatio-Temporal Functional Interaction Graph (ST-FIG), which captures the evolving spatial, temporal, and functional dependencies among POIs. In this graph, each POI is represented as a node, and edges are established based on spatial proximity, temporal co-occurrence, and functional relationships such as competition and complementarity. This comprehensive structure enables the model to learn how nearby POIs influence each other over time, supporting more accurate and context-aware check-in prediction.

Given a newly added POI $p$, we collect a neighbourhood set $N = \{n_1, \ldots, n_i\}$ comprising nearby POIs that may exhibit competitive or complementary effects due to geographic proximity, as determined by the following condition:

$$dist_{(p,n)} = \mathrm{Haversine}(lat_p, lat_n, lon_p, lon_n, r) \leq max_{dist} \tag{1}$$

where the $\mathrm{Haversine}$ formula calculates the great-circle distance between two points given their latitudes and longitudes, $r$ is the radius of the Earth, and $max_{dist}$ is set to 1000 m.
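The neighbourhood condition in Eq. (1) can be sketched with the standard Haversine formula. This is an illustrative sketch, not the authors' code: the function names, the argument order, and the mean Earth radius of 6,371,000 m are assumptions; only the 1000 m threshold comes from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius r, in metres
MAX_DIST_M = 1000.0           # max_dist from the text


def haversine_m(lat_p, lon_p, lat_n, lon_n, r=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi_p, phi_n = math.radians(lat_p), math.radians(lat_n)
    d_phi = phi_n - phi_p
    d_lam = math.radians(lon_n - lon_p)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_p) * math.cos(phi_n) * math.sin(d_lam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def neighbours_within(target, candidates, max_dist=MAX_DIST_M):
    """Return {name: distance} for candidates no farther than max_dist from target.

    target is a (lat, lon) pair; candidates maps a name to a (lat, lon) pair.
    """
    selected = {}
    for name, (lat, lon) in candidates.items():
        d = haversine_m(target[0], target[1], lat, lon)
        if d <= max_dist:
            selected[name] = d
    return selected
```

For example, a candidate 0.005 degrees of latitude away (roughly 556 m) is kept, while one 0.02 degrees away (roughly 2.2 km) is dropped.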

We then construct a POI functional interaction graph $\mathcal{G}$ as a 1-hop subgraph according to the following principles:

- Node Definition: Each subgraph $\mathcal{G}$ comprises a POI-type target POI and its spatial neighbours, represented as neighbour-type nodes. Each node is initialized with textual embeddings derived from POI textual tags;
- Edge Definition: For each neighbour POI $n \in N$, we construct a directed edge from the neighbour POI $n$ to the target POI $p$, enabling bidirectional message passing. Each edge is assigned a weight $w_{(p,n)}$;
- Temporal Modeling: For each newly added POI, we construct four temporal subgraphs corresponding to the first $\mathrm{w}$ weeks following its introduction. This design enables the model to capture early-stage interaction patterns and temporal dynamics of spatial influence. The temporal dynamics in ST-FIG are induced by the week-specific neighbourhood composition and activity context. Specifically, only POIs that are active in week $\mathrm{w}$ are included in $\mathcal{G}_{\mathrm{w}}$, causing the node set and aggregated neighbourhood representations to evolve over time.
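The week-specific construction described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the function `build_weekly_subgraphs` and the `active_weeks` mapping are hypothetical, since the paper does not say how activity is stored, and edges are represented simply as `(neighbour, target)` pairs.

```python
def build_weekly_subgraphs(target_id, neighbour_ids, active_weeks, num_weeks=4):
    """Return {week: edge list} for the 1-hop subgraphs around target_id.

    active_weeks maps a POI id to the set of weeks in which it is active
    (hypothetical input format). Only neighbours active in a given week are
    kept, so the node set of each weekly subgraph can differ.
    """
    graphs = {}
    for week in range(1, num_weeks + 1):
        active = [n for n in neighbour_ids if week in active_weeks.get(n, set())]
        # One directed edge from each active neighbour to the target POI.
        graphs[week] = [(n, target_id) for n in active]
    return graphs
```

With two neighbours where one is active only in weeks 2 and 3, the subgraphs for weeks 1 and 4 contain a single edge and those for weeks 2 and 3 contain two.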

The textual embedding $E_p$ of the POI $p$ is obtained by encoding its textual tags using pre-trained BERT (Devlin et al., [2018](https://arxiv.org/html/2606.05413#bib.bib15)):

$$E_p = \text{BERT}\left(\text{[CLS]}\ t_1, t_2, \ldots, t_i\ \text{[SEP]}\right) \tag{2}$$

where [CLS] represents the entire input sequence, while [SEP] is used to separate different segments.

To define the edge weight $w_{(p,n)}$ between a newly added POI $p$ and its neighbour $n$, we jointly consider their geographic proximity and functional relationship. Specifically, the weight is computed as:

$$w_{(p,n)} = \alpha_{(p,n)} \cdot \exp\left(-\frac{dist_{(p,n)}^{2}}{2\sigma^{2}}\right) \tag{3}$$

where $\sigma$ (= 0.5) is a bandwidth parameter controlling the spatial decay, and $\alpha_{(p,n)}$ quantifies the functional interaction strength between $p$ and $n$. To estimate $\alpha_{(p,n)}$, we pretrain a POI category encoder using contrastive learning to capture task-specific functional semantics. Specifically, we treat functionally competitive POIs as positive pairs, under the assumption that POIs that share similar categories exhibit stronger competition. In contrast, functionally complementary POIs are considered negative pairs, reflecting weaker semantic alignment and potential cooperation. Each POI category is mapped to a textual description and encoded utilizing BERT (Devlin et al., [2018](https://arxiv.org/html/2606.05413#bib.bib15)), trained with the information noise contrastive estimation (InfoNCE) (Oord et al., [2018](https://arxiv.org/html/2606.05413#bib.bib16)) loss:

$$\mathcal{L}_{CL} = -\frac{1}{B}\sum_{i=1}^{B}\log\frac{\exp\left(\text{sim}(\mathbf{E}_i, \mathbf{E}_i^{+})/\mathcal{t}\right)}{\sum_{j=1}^{B}\exp\left(\text{sim}(\mathbf{E}_i, \mathbf{E}_j^{+})/\mathcal{t}\right)} \tag{4}$$

where $B$ is the mini-batch size, $\mathcal{t}$ (= 0.5) is a temperature hyperparameter, and $\mathbf{E}_i$ and $\mathbf{E}_i^{+}$ are the embeddings of the $i$-th anchor and its corresponding positive sample, respectively. All embeddings are derived from POI category descriptions using a pretrained BERT encoder. For each anchor, the remaining positive samples $\{\mathbf{E}_j^{+} \mid j \neq i\}$ within the batch are treated as negative samples. After training, we define the functional dissimilarity between a newly added POI $p$ and its neighbour $n$ as:

$$s_{(p,n)} = -\frac{\mathbf{E}_p \cdot \mathbf{E}_n}{|\mathbf{E}_p| \cdot |\mathbf{E}_n|} \tag{5}$$

where $\mathbf{E}_p$ and $\mathbf{E}_n$ denote the category embeddings of POI $p$ and POI $n$ obtained from the pretrained encoder, respectively. This negated cosine similarity ensures that functionally complementary POIs (those with low semantic similarity) receive higher dissimilarity scores, while competitive POIs (those with high semantic similarity) are assigned lower scores. We then normalize this value to the $[0,1]$ range to compute the functional influence weight:

$$\alpha_{(p,n)} = \frac{1 + s_{(p,n)}}{2} \tag{6}$$
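Equations (3) to (6) fit together as in the plain-Python sketch below. It is a hedged illustration, not the released implementation: it assumes that $\text{sim}(\cdot,\cdot)$ in Eq. (4) is cosine similarity, that embeddings are plain lists of floats, and that distances are expressed in kilometres so that $\sigma = 0.5$ is on the same scale as the 1000 m radius. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.5):
    """Eq. (4): in-batch InfoNCE, with positives[j] (j != i) as negatives for anchor i."""
    total = 0.0
    for i, e_i in enumerate(anchors):
        logits = [cosine(e_i, e_j) / temperature for e_j in positives]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total -= logits[i] - log_denominator
    return total / len(anchors)


def functional_weight(e_p, e_n):
    """Eqs. (5)-(6): negated cosine similarity rescaled to [0, 1]."""
    s = -cosine(e_p, e_n)
    return (1.0 + s) / 2.0


def edge_weight(e_p, e_n, dist_km, sigma=0.5):
    """Eq. (3): functional weight times a Gaussian spatial decay (dist in km, assumed)."""
    return functional_weight(e_p, e_n) * math.exp(-dist_km ** 2 / (2 * sigma ** 2))
```

Under these assumptions, two POIs with identical category embeddings get $\alpha = 0$, orthogonal embeddings give $\alpha = 0.5$, and opposite embeddings give $\alpha = 1$, so the edge weight grows as the categories become less similar and shrinks with distance.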

### 4.2. Causal Inference Module

To explicitly reason about interventions, we adopt a networked potential outcome framework with partial interference, where the exposure of each target POI depends on the configuration of its spatially adjacent POIs. The binary treatment indicator $t \in \{0,1\}$ denotes whether a POI is newly introduced, while aggregated neighbour influences are modeled as exposure variables summarizing nearby treatments and contexts. Within this framework, conventional predictive models, which focus mainly on correlation, often fail to disentangle causal effects, especially when assessing how a newly added POI affects and is affected by surrounding check-in dynamics. To bridge this gap, we introduce a dedicated Causal Inference Module, which is designed to estimate the ITE of a newly added POI by explicitly modeling the counterfactual scenario in which the POI had not been introduced.

#### 4.2.1. Causal Effect Formulation

Unlike traditional causal estimation settings, our goal is to forecast the post\-introduction check\-in dynamics of a new POI rather than infer a population\-level treatment effect\. We therefore adopt a localized interference assumption, where the influence of an added POI is constrained within its 1\-hop spatial neighbourhood, and interference beyond this range is assumed negligible\. This allows us to approximate the counterfactual outcome \(absence of the POI\) by constructing a control graph with preserved topology but masked semantics\. Conditional on spatial and semantic covariates captured by the ST\-FIG, the model learns to distinguish true causal influence from spurious correlations, yielding identifiable and interpretable representations for forecasting\. We acknowledge that site selection can be endogenous because unobserved intent\-to\-open factors may influence both POI introduction and future demand\. Our framework partially mitigates this issue by conditioning on rich pre\-treatment local context and comparing treatment and control outcomes under aligned spatial topology\. Nevertheless, unobserved confounders cannot be fully captured in the observational setting, and thus the estimated causal effects should be interpreted as localized counterfactual estimates rather than fully randomized intervention effects\.

Concretely, the model generates two parallel predictions for a newly introduced POI $p$: $\hat{y}_1$ under the treatment condition, where the semantic influence of $p$ is present, and $\hat{y}_0$ under the control condition, where such influence is removed while preserving the same spatial topology. The contrast between these two predictions provides a counterfactual perspective on the functional impact of $p$, without explicitly optimizing individual treatment effects as supervised targets.

To achieve unbiased estimation, the treatment propensity score is defined solely over pre-treatment covariates. We extract a shared pre-treatment embedding $\mathbf{E}_0$ from the ST-FIG encoder to represent the contextual environment of $p$:

$$\mathbf{E}_0 = \text{LayerNorm}\left(\text{concat}\left[\mathbf{h}_p, \mathbf{h}_{\mathcal{N}}, \mathbf{h}_p^{\text{pos}}\right]\right) \tag{7}$$

where $\mathbf{h}_p$ denotes a zero embedding of the target POI, $\mathbf{h}_{\mathcal{N}}$ is the mean-pooled representation of its neighbour POIs, and $\mathbf{h}_p^{\text{pos}}$ is the positional encoding of location $p$ (defined later in Section [4.2.3](https://arxiv.org/html/2606.05413#S4.SS2.SSS3)).

To estimate the treatment propensity score, we apply a linear classifier over the shared representation obtained from the temporal graph encoder:

$$\hat{p} = \delta\left(\mathcal{W}_p \cdot \mathbf{E}_0\right) \tag{8}$$

where $\mathcal{W}_p$ is a learnable projection and $\delta(\cdot)$ denotes the sigmoid activation.
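A small numeric sketch of Eqs. (7) and (8) may help make the shapes concrete. It makes several assumptions that the text does not confirm: LayerNorm is applied without learnable scale and shift, the projection $\mathcal{W}_p$ is a single weight vector with no bias, and all vectors are plain Python lists. The function names are illustrative.

```python
import math


def mean_pool(vectors):
    """Element-wise mean of equally sized vectors (h_N in Eq. 7)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def layer_norm(x, eps=1e-5):
    """Layer normalisation without learnable affine parameters (assumption)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def pre_treatment_embedding(neighbour_embs, pos_enc, dim):
    """Eq. (7): LayerNorm(concat[h_p, h_N, h_p^pos]) with h_p set to zeros."""
    h_p = [0.0] * dim                 # zero embedding of the target POI
    h_n = mean_pool(neighbour_embs)   # mean-pooled neighbour representation
    return layer_norm(h_p + h_n + pos_enc)


def propensity(e0, w_p):
    """Eq. (8): sigmoid of a linear projection of E_0."""
    z = sum(w * e for w, e in zip(w_p, e0))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the target embedding is zeroed, the propensity score in this sketch depends only on the neighbourhood and the position, which matches the statement that it is defined over pre-treatment covariates.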

#### 4.2.2. Treatment and Control Graph Construction

To support this estimation, we construct a pair of structurally aligned spatio\-temporal graphs for each POI:

- Treatment Graph ($\mathcal{G}^{(1)}$): A 1-hop subgraph centered around the target POI $p$, incorporating its spatial neighbours. Each node is initialized with its semantic representation.
- Control Graph ($\mathcal{G}^{(0)}$): Shares the same topology as $\mathcal{G}^{(1)}$, but replaces the embedding of the target POI with a zero vector, effectively simulating the absence of $p$ without altering the graph structure.

We adopt this feature\-masking strategy instead of physically removing the target POI node\. Although zero\-feature masking is not strictly equivalent to physically removing the target POI, it provides a structure\-preserving approximation of the counterfactual scenario by suppressing the semantic influence of the target POI while keeping the local topology aligned between treatment and control graphs\. In contrast, directly removing the target node would alter the neighbourhood structure, eliminate its incident edges in the 1\-hop subgraph, and introduce structural mismatch\. As a result, the treatment\-control difference could partly reflect topology changes rather than the semantic effect of the newly introduced POI\. By preserving the graph topology, our design maintains structural consistency and enables more reliable counterfactual estimation\.

To mitigate potential spillover effects while maintaining comparable relational structure, the control graph keeps neighbour embeddings unchanged and preserves spatial connectivity, but removes the functional interaction term $\alpha_{(p,n)}$ in its edge computation:

$$w^{(0)}_{(p,n)}=\exp\!\left(-\frac{dist^{2}_{(p,n)}}{2\sigma^{2}}\right) \tag{9}$$

This design preserves the geographic topology while nullifying semantic effects, thus representing a counterfactual scenario where the target POI exerts no functional impact on its neighbours. For numerical stability, all adjacency weights are locally normalized such that $\sum_{n\in N(p)}\tilde{w}(p,n)=1$, and the normalized $\tilde{w}(p,n)$ is used as the attention bias in the GATv2 encoder for both graphs.
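The control-graph construction can be sketched as follows: zero out the target POI's features, compute the distance-only weights of Eq. (9), and normalize them locally. The function names, the toy features, the distance unit (metres), and the value of `sigma` are assumptions made for illustration only.

```python
import math

def control_edge_weights(distances, sigma):
    """Eq. (9): Gaussian distance kernel without the functional term,
    followed by local normalization so the weights sum to 1."""
    raw = [math.exp(-(d ** 2) / (2.0 * sigma ** 2)) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

def make_control_features(features, target_index):
    """Copy the node features and replace the target POI's row with zeros."""
    control = [list(row) for row in features]
    control[target_index] = [0.0] * len(features[target_index])
    return control

# Hypothetical toy subgraph: node 0 is the target POI, nodes 1-3 are its neighbours.
features = [[0.5, 1.0], [0.2, 0.3], [0.7, 0.1], [0.4, 0.9]]
distances = [100.0, 250.0, 400.0]  # target-to-neighbour distances (assumed metres)

control_features = make_control_features(features, target_index=0)
weights = control_edge_weights(distances, sigma=200.0)
```

Because only the target row is masked and the edge list is untouched, the treatment and control graphs keep identical topology, which is the property the feature-masking strategy relies on.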

#### 4\.2\.3\.Feature Extraction

To capture both spatial and temporal context, we construct a sequence of weekly graphs over $\mathrm{W}$ consecutive weeks. Each weekly graph $\mathcal{G}^{(t)}_{w}$ represents the 1-hop neighbourhood of POI $p$ at week $\mathrm{w}$ under treatment condition $t\in\{0,1\}$. Unlike static spatial graphs, both the node features and neighbourhood sets are time-varying, reflecting changes in active POIs and week-specific check-in dynamics. The neighbour set $\mathcal{N}_{\mathrm{w}}(p)$ is dynamically updated to include only POIs that remain active during week $\mathrm{w}$, allowing local connectivity to evolve over time.

We use GATv2 (Brody et al., [2021](https://arxiv.org/html/2606.05413#bib.bib29)) as the shared graph encoder for each weekly graph. This choice is motivated by the localized 1-hop structure of ST-FIG, where the key challenge is not deep multi-hop propagation, but edge-aware relation modeling between the target POI and its heterogeneous neighbours. Unlike Transformer or Set Attention, which mainly treat neighbouring POIs as unordered tokens and rely on content-based attention, GATv2 naturally supports message passing with edge attributes. Therefore, we incorporate the scalar edge weight as an edge attribute to directly influence the attention computation, serving as a scaling factor that adjusts the importance of neighbour messages according to their spatial-functional affinities:

$$\mathbf{H}^{(t)}_{w}=\text{GATv2}\left(\mathcal{G}^{(t)}_{w}\right)\in\mathbb{R}^{(N+1)\times 4d} \tag{10}$$

where $N$ denotes the number of spatial neighbours, so that the $N+1$ rows cover the target POI and its neighbours, and $d\ (=128)$ is the embedding dimension. From this matrix, we extract the embedding of the central node $\mathbf{h}_{p,\mathrm{w}}^{(t)}$, corresponding to the target POI $p$, and compute the average embedding of its neighbours:

$$\mathbf{h}_{\mathcal{N},\mathrm{w}}^{(t)}=\frac{1}{|N|}\sum_{n\in N}\mathbf{h}_{n,\mathrm{w}}^{(t)} \tag{11}$$

To incorporate geographic information, we encode the latitude and longitude of each POI using a sinusoidal position encoding scheme that integrates multiple frequency components across dimensions, enabling the representation to capture spatial information at varying levels of granularity. Specifically, for each coordinate $p.g\in\{\text{lat}_{p},\text{lon}_{p}\}$, the $i$-th dimension of its encoding is defined as:

$$\text{PE}^{(i)}(p.g)=\begin{cases}\sin\left(\lambda\cdot p_{c}\cdot 10000^{-\frac{2k}{d}}\right)&\text{if }i=2k\\[6.0pt] \cos\left(\lambda\cdot p_{c}\cdot 10000^{-\frac{2k}{d}}\right)&\text{if }i=2k+1\end{cases}\quad\forall\,p_{c}\in p.g \tag{12}$$

where $d\ (=128)$ denotes the dimensionality of the positional encoding, and $\lambda\ (=100)$ is a scaling factor designed to improve the sensitivity to subtle spatial variations among POIs based on their geographic coordinates. The encodings of the latitude and longitude are concatenated to form the final positional embedding $\mathbf{h}_{p}^{\text{pos}}$:

$$\mathbf{h}_{p}^{\text{pos}}=\text{Concat}\left(\text{PE}(\text{lat}_{p}),\ \text{PE}(\text{lon}_{p})\right)\in\mathbb{R}^{2d} \tag{13}$$

These three components are concatenated and passed through a LayerNorm layer to yield a unified representation for week $\mathrm{w}$:

$$\mathbf{E}_{\mathrm{w}}^{(t)}=\text{LayerNorm}\left(\text{concat}\left[\mathbf{h}_{p,\mathrm{w}}^{(t)},\ \mathbf{h}_{\mathcal{N},\mathrm{w}}^{(t)},\ \mathbf{h}_{p}^{\text{pos}}\right]\right) \tag{14}$$

The weekly vectors are then fed into a GRU to capture temporal dynamics:

$$\mathbf{E}^{(t)}=\text{GRU}\left(\left[\mathbf{E}_{1}^{(t)},\dots,\mathbf{E}_{\mathrm{W}}^{(t)}\right]\right) \tag{15}$$

This final representation $\mathbf{E}^{(t)}$ is then used to predict counterfactual check-in sequences under both treatment and control conditions via the causal prediction module.
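The sinusoidal positional encoding of Eq. (12) and Eq. (13) can be sketched in a few lines. The function names are hypothetical, the example coordinates are illustrative rather than taken from the dataset, and feeding coordinates in decimal degrees is an assumption, since the unit is not stated above.

```python
import math

def positional_encoding(coord, d=128, lam=100.0):
    """Eq. (12): even dimensions use sin, odd dimensions use cos."""
    enc = []
    for i in range(d):
        k = i // 2
        angle = lam * coord * 10000.0 ** (-2.0 * k / d)
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def poi_position_embedding(lat, lon, d=128, lam=100.0):
    """Eq. (13): concatenate the latitude and longitude encodings."""
    return positional_encoding(lat, d, lam) + positional_encoding(lon, d, lam)

# Illustrative coordinates in decimal degrees (assumed unit).
h_pos = poi_position_embedding(40.7128, -74.0060)
```

With $d=128$ the resulting embedding has $2d=256$ entries, and the scaling factor `lam` spreads nearby coordinates further apart in phase before the sine and cosine are applied.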

#### 4\.2\.4\.Causal Prediction

The temporally aggregated representation $\mathbf{E}^{(t)}$ encodes both the evolving local interaction context and the geographic characteristics of POI $p$ under treatment condition $t$. To enable counterfactual outcome modeling, we adopt a treatment-aware prediction head that maps $\mathbf{E}^{(t)}$ to a sequence of weekly check-in predictions. This design allows the model to generate both factual and counterfactual outcome trajectories while maintaining structural alignment between treatment and control representations. Specifically, $\mathbf{E}^{(t)}$ is fed into two outcome decoders corresponding to treatment and control conditions, respectively:

$$\hat{\mathbf{y}}_{t}=f_{t}\left(\mathbf{E}^{(t)}\right)\in\mathbb{R}^{W},\quad t\in\{0,1\} \tag{16}$$

where $f_{1}(\cdot)$ and $f_{0}(\cdot)$ are parameter-shared temporal decoders with treatment-specific output heads.
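One way to picture Eq. (16) is a shared trunk followed by one output head per treatment condition. The sketch below is an assumption about the decoder's internals (a single dense layer with ReLU as the shared part), with made-up weights and toy sizes; the per-week difference between the two predictions is computed here only as an illustrative uplift quantity.

```python
def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def decode(e_t, shared, heads, t):
    """Eq. (16): y_hat_t = f_t(E^(t)), shared trunk plus the head for condition t."""
    hidden = relu(linear(e_t, *shared))
    return linear(hidden, *heads[t])

# Hypothetical toy sizes: 3-dim representation, 2 hidden units, W = 4 weeks.
shared = ([[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]], [0.0, 0.1])
heads = {
    1: ([[1.0, 0.5], [0.8, 0.4], [0.6, 0.3], [0.5, 0.2]], [0.2, 0.2, 0.1, 0.1]),
    0: ([[0.4, 0.2], [0.3, 0.2], [0.3, 0.1], [0.2, 0.1]], [0.0, 0.0, 0.0, 0.0]),
}
e_treat = [0.9, 0.1, -0.3]  # E^(1), from the treatment graph (made-up values)
e_ctrl = [0.2, 0.1, -0.3]   # E^(0), from the control graph (made-up values)

y1 = decode(e_treat, shared, heads, t=1)
y0 = decode(e_ctrl, shared, heads, t=0)
uplift = [a - b for a, b in zip(y1, y0)]  # per-week difference between the two predictions
```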

### 4\.3\.Training Loss

The model is trained using the following temporally extended loss:

$$\mathcal{L}_{TC}=\frac{1}{n}\sum_{i=1}^{n}\Bigg[\frac{1}{\mathrm{W}}\sum_{\mathrm{w}=1}^{\mathrm{W}}\left(\hat{y}^{(i)}_{t,\mathrm{w}}-y^{(i)}_{\mathrm{w}}\right)^{2}+\gamma\cdot\text{CrossEntropy}\left(\hat{p}^{(i)},t^{(i)}\right)\Bigg], \tag{17}$$

$$\hat{y}_{t,\mathrm{w}}^{(i)}=t^{(i)}\cdot\hat{y}_{1,\mathrm{w}}^{(i)}+(1-t^{(i)})\cdot\hat{y}_{0,\mathrm{w}}^{(i)} \tag{18}$$

where $\hat{y}_{t,\mathrm{w}}^{(i)}$ is the predicted outcome at week $\mathrm{w}$ under the observed treatment $t^{(i)}\in\{0,1\}$, and $\gamma\ (=0.2)$ is a hyperparameter that balances the prediction and treatment components.
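The loss in Eq. (17) and Eq. (18) can be checked numerically with a short sketch. The dictionary layout of each sample and the function names are assumptions, and the cross-entropy term is implemented as binary cross-entropy because the treatment is binary.

```python
import math

def binary_cross_entropy(p_hat, t, eps=1e-12):
    p = min(max(p_hat, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

def tc_loss(samples, gamma=0.2):
    """Eq. (17)-(18): factual MSE over weeks plus gamma times the propensity loss."""
    total = 0.0
    for s in samples:
        t = s["t"]
        # Eq. (18): pick the prediction that matches the observed treatment.
        y_hat = [t * a + (1 - t) * b for a, b in zip(s["y1_hat"], s["y0_hat"])]
        mse = sum((p - y) ** 2 for p, y in zip(y_hat, s["y"])) / len(s["y"])
        total += mse + gamma * binary_cross_entropy(s["p_hat"], t)
    return total / len(samples)

# One treated sample with a perfect factual prediction and an uninformative propensity.
samples = [{
    "t": 1,
    "y": [3.0, 4.0, 5.0, 6.0],
    "y1_hat": [3.0, 4.0, 5.0, 6.0],
    "y0_hat": [0.0, 0.0, 0.0, 0.0],
    "p_hat": 0.5,
}]
loss = tc_loss(samples)  # only the propensity term remains: 0.2 * ln(2)
```

Only the factual branch contributes to the squared error, so the counterfactual prediction `y0_hat` of a treated sample receives no direct supervision from this loss.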

## 5\.Experiments

To evaluate the effectiveness of CausalPOI, we conduct experiments on four real\-world datasets and compare it with established baseline methods\. We further perform an ablation study to analyze the contribution of each key component to the overall performance gains\. In addition, we conduct parameter sensitivity analysis to examine the robustness of CausalPOI under different hyperparameter settings\. Finally, we validate the reliability of the estimated causal effects through an uplift sanity check and a placebo test, which help verify whether the learned effects are meaningful\.

### 5\.1\.Experimental Setups

#### 5\.1\.1\.Datasets

We obtain POI data and user check-in data from SafeGraph ([https://www.safegraph.com](https://www.safegraph.com/)). We divide the dataset into four regions (Northeast, Midwest, South, and West), following the official U.S. region segmentation. The data covers the period from Sep 2018 to Jan 2020, which corresponds to exactly 74 consecutive weeks. As part of our experimental design, we define newly added POIs as those that emerged within the observation window and persisted for at least 40 weeks. We focus on predicting the weekly check-in volumes for the first four weeks after each new POI’s introduction. The statistics are summarized in Table [1](https://arxiv.org/html/2606.05413#S5.T1).

Table 1. Statistics of datasets

| Region | Census | POI | Added POI |
| --- | --- | --- | --- |
| Northeast | 42,438 | 998,379 | 16,959 |
| Midwest | 52,894 | 1,209,427 | 19,116 |
| South | 75,469 | 2,156,091 | 31,894 |
| West | 46,938 | 1,453,062 | 21,611 |
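The selection rule for newly added POIs and the four-week forecasting target can be sketched as below. The exact operational definition is not given in the text, so treating "emerged within the observation window" as "first active after week 0" is an assumption, as are the function and constant names.

```python
TOTAL_WEEKS = 74       # observation window stated in the text
MIN_ACTIVE_WEEKS = 40  # persistence threshold stated in the text
HORIZON = 4            # forecast the first four weeks after introduction

def is_new_poi(first_week, last_week):
    """Assumed rule: first active after week 0 and active for at least 40 weeks."""
    emerged_in_window = 0 < first_week < TOTAL_WEEKS
    persisted = (last_week - first_week + 1) >= MIN_ACTIVE_WEEKS
    return emerged_in_window and persisted

def target_series(weekly_checkins, first_week):
    """Weekly check-in volumes for the first HORIZON weeks after introduction."""
    return weekly_checkins[first_week:first_week + HORIZON]

# Hypothetical POI that opens in week 10 and stays active until week 60.
checkins = list(range(TOTAL_WEEKS))
new_poi = is_new_poi(first_week=10, last_week=60)
targets = target_series(checkins, first_week=10)
```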

Table 2. Estimation and ablation study of cold-start POI check-in forecasting. We evaluate model performance using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The row highlighted in bold represents the performance of CausalPOI, while the rows above it show the baseline models. Based on their underlying approach, we categorize the baselines into six types: time-series GNN-based (G), node-update-based (N), statistical-based (S), LLM-based (L), causal-based (C) and generative synthesis-based (D). The lower part of the table presents the results of the ablation study. To ensure robustness, all metrics are reported as the mean and standard deviation over ten independent runs.

| Type | Model | Northeast RMSE↓ | Northeast MAE↓ | Midwest RMSE↓ | Midwest MAE↓ | South RMSE↓ | South MAE↓ | West RMSE↓ | West MAE↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| G | DCRNN | 19.86 (± 0.36) | 4.65 (± 0.26) | 22.40 (± 0.40) | 5.18 (± 0.30) | 23.35 (± 0.38) | 5.39 (± 0.28) | 24.52 (± 0.35) | 5.56 (± 0.27) |
| G | GraphWaveNet | 18.62 (± 0.32) | 4.41 (± 0.24) | 20.87 (± 0.36) | 4.93 (± 0.27) | 21.83 (± 0.35) | 5.16 (± 0.26) | 23.07 (± 0.33) | 5.31 (± 0.25) |
| G | AGCRN | 17.10 (± 0.28) | 4.08 (± 0.22) | 19.25 (± 0.31) | 4.59 (± 0.24) | 20.14 (± 0.30) | 4.81 (± 0.23) | 21.36 (± 0.29) | 4.97 (± 0.23) |
| G | LightST | 15.72 (± 0.30) | 3.58 (± 0.18) | 17.66 (± 0.32) | 4.02 (± 0.19) | 18.48 (± 0.38) | 4.25 (± 0.23) | 19.61 (± 0.33) | 4.38 (± 0.26) |
| D | TrafficStream | 14.96 (± 0.27) | 3.46 (± 0.17) | 16.98 (± 0.30) | 3.90 (± 0.18) | 17.84 (± 0.33) | 4.12 (± 0.21) | 18.97 (± 0.31) | 4.26 (± 0.20) |
| S | SVGP | 15.40 (± 0.05) | 3.25 (± 0.13) | 18.30 (± 0.04) | 4.20 (± 0.14) | 20.80 (± 0.04) | 4.48 (± 0.16) | 22.30 (± 0.03) | 4.79 (± 0.15) |
| S | LCFM | 12.82 (± 0.03) | 2.90 (± 0.14) | 15.88 (± 0.01) | 3.77 (± 0.15) | 17.92 (± 0.01) | 3.98 (± 0.15) | 19.60 (± 0.01) | 4.08 (± 0.17) |
| L | TIME-LLM | 9.67 (± 0.04) | 2.08 (± 0.11) | 11.83 (± 0.03) | 2.81 (± 0.11) | 14.06 (± 0.04) | 2.96 (± 0.11) | 15.48 (± 0.03) | 3.12 (± 0.12) |
| L | TimeCMA | 8.88 (± 0.06) | 1.93 (± 0.10) | 10.98 (± 0.05) | 2.62 (± 0.09) | 13.18 (± 0.07) | 2.81 (± 0.12) | 14.62 (± 0.06) | 2.97 (± 0.11) |
| C | GCIM | 7.68 (± 0.10) | 2.05 (± 0.08) | 8.34 (± 0.11) | 2.21 (± 0.09) | 9.12 (± 0.12) | 2.29 (± 0.09) | 9.56 (± 0.10) | 2.36 (± 0.10) |
| N | KGDiff | 6.74 (± 0.09) | 1.93 (± 0.07) | 7.29 (± 0.10) | 2.08 (± 0.07) | 8.01 (± 0.11) | 2.15 (± 0.08) | 8.43 (± 0.09) | 2.21 (± 0.08) |
| - | **CausalPOI** | **5.36 (± 0.55)** | **1.84 (± 0.06)** | **5.87 (± 0.38)** | **2.00 (± 0.05)** | **5.88 (± 0.49)** | **1.98 (± 0.05)** | **6.17 (± 0.55)** | **1.95 (± 0.09)** |
| - | w/o ST-FIG Module | 5.89 (± 0.60) | 1.94 (± 0.07) | 6.19 (± 0.42) | 2.05 (± 0.07) | 6.22 (± 0.50) | 2.02 (± 0.08) | 6.31 (± 0.40) | 2.00 (± 0.06) |
| - | w/o Causal Module | 5.90 (± 0.44) | 1.92 (± 0.04) | 6.13 (± 0.57) | 2.05 (± 0.07) | 6.23 (± 0.50) | 2.03 (± 0.07) | 6.31 (± 0.28) | 1.97 (± 0.04) |
| - | w/o Propensity Score | 5.48 (± 0.49) | 1.88 (± 0.04) | 6.02 (± 0.31) | 2.04 (± 0.05) | 5.98 (± 0.20) | 1.99 (± 0.04) | 6.21 (± 0.42) | 1.96 (± 0.05) |

#### 5\.1\.2\.Baselines

We compare CausalPOI with eleven strong baselines, including four time-series GNN-based models (Li et al., [2017](https://arxiv.org/html/2606.05413#bib.bib6); Wu et al., [2019](https://arxiv.org/html/2606.05413#bib.bib7); Bai et al., [2020](https://arxiv.org/html/2606.05413#bib.bib8); Zhang et al., [2025](https://arxiv.org/html/2606.05413#bib.bib30)), one node-update-based model (Chen et al., [2021](https://arxiv.org/html/2606.05413#bib.bib33)), two statistical-based models (Naumzik et al., [2020](https://arxiv.org/html/2606.05413#bib.bib19); Tschernutter and Feuerriegel, [2021](https://arxiv.org/html/2606.05413#bib.bib9)), two large language model (LLM)-based approaches (Jin et al., [2023](https://arxiv.org/html/2606.05413#bib.bib20); Liu et al., [2025](https://arxiv.org/html/2606.05413#bib.bib31)), one causal-based model (Zhao et al., [2023](https://arxiv.org/html/2606.05413#bib.bib35)), and one generative synthesis-based model (Wang et al., [2026](https://arxiv.org/html/2606.05413#bib.bib34)). Details of the baselines can be found in Appendix [B](https://arxiv.org/html/2606.05413#A2).

Our task focuses on the cold-start scenario, where a newly added POI has no historical check-in records available at prediction time. This poses challenges for baseline models that rely on temporal sequences. To enable a practical and model-compatible comparison, we tailor the input configurations according to the intrinsic requirements of each model type. For statistical and LLM-based baselines, which do not rely on temporal behavioural signals, we use only static semantic features. For GNN-based baselines, which require temporal input sequences, we construct a proxy temporal input for each cold-start POI by aggregating check-in records from its spatial neighbours within a predefined radius $max_{dist}$. Specifically, we first compute a neighbourhood-level activity signal by averaging the historical check-ins of neighbouring POIs over several preceding time windows. Since cold-start POIs typically exhibit very low initial activity, directly assigning the neighbour-averaged signal would substantially overestimate their early-stage check-in volumes. To address this issue, we introduce a cold-start attenuation factor $\omega\in(0,1)$ and scale the neighbourhood-level activity signal accordingly. The attenuation factor $\omega$ is determined using training data only and is defined as a single global scalar, rather than a category-specific or region-specific coefficient. Specifically, we first examine cold-start POIs in the training set and compute their average check-in volume over the initial time windows after introduction, which characterizes typical early-stage cold-start activity. In parallel, we compute the corresponding neighbourhood-level activity signal by averaging the check-in volumes of their spatial neighbours over the same time windows. The attenuation factor $\omega$ is then calibrated to align the magnitude of the neighbourhood-level activity signal with the average early-stage cold-start activity. 
This training\-only calibration prevents information leakage from validation or test data while providing a fair proxy temporal input for baselines that require historical sequences\. The scaled sequence is used as the proxy temporal input for the target cold\-start POI in GNN\-based baselines, while neighbouring nodes use their actual historical check\-in sequences or temporally aggregated versions when required by the baseline implementation\. A spatial subgraph containing the target POI and its neighbours is then fed into the GNN encoder, and the output representation of the target POI node is used for prediction\.
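The proxy-input construction for GNN-based baselines can be sketched as follows. The text says only that the attenuation factor is calibrated to align the two magnitudes, so implementing it as a ratio of global means is an assumption, as are the function names and the toy numbers.

```python
def mean(values):
    return sum(values) / len(values)

def calibrate_omega(train_pairs):
    """train_pairs: (early cold-start check-ins, neighbour-averaged check-ins)
    for each training POI. Assumed calibration: ratio of the two global means."""
    cold = mean([mean(c) for c, _ in train_pairs])
    nbr = mean([mean(n) for _, n in train_pairs])
    return cold / nbr

def proxy_sequence(neighbour_histories, omega):
    """Average the neighbours' histories per time step, then scale by omega."""
    return [omega * mean(step) for step in zip(*neighbour_histories)]

# Toy training data: cold-start activity is about one tenth of neighbour activity.
train_pairs = [([2.0, 4.0], [20.0, 40.0]), ([1.0, 3.0], [10.0, 30.0])]
omega = calibrate_omega(train_pairs)  # a single global scalar

# Proxy input for a test-time cold-start POI with two neighbours.
proxy = proxy_sequence([[10.0, 20.0], [30.0, 40.0]], omega)
```

Because `omega` is computed from training POIs only, no validation or test check-ins enter the proxy sequence.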

#### 5\.1\.3\.Experimental Settings

The model is trained until convergence with a learning rate of 3e-4 and a batch size of 32. We use the `bert-base-uncased` model with a hidden size of 768 to encode textual information. All experiments involving deep learning frameworks are run on a V100-SXM2 GPU.

### 5\.2\.Performance Analysis

As shown in Table [2](https://arxiv.org/html/2606.05413#S5.T2), the experimental results demonstrate clear and consistent performance differences among the baseline methods, highlighting their distinct capabilities in handling the cold-start POI check-in forecasting task.

Time\-series GNN\-based models exhibit the weakest performance across all regions\. Although pseudo temporal sequences can be constructed from neighboring POIs under cold\-start settings, such approximations introduce substantial noise and fail to capture true behavioral dynamics\. Moreover, these models fundamentally rely on historical activity signals and lack mechanisms to incorporate semantic or functional attributes, making it difficult to distinguish functionally different POIs with similar spatial neighborhoods\. Statistical\-based approaches achieve moderate improvements by incorporating static spatial features\. SVGP models spatial influence through distance\-based kernels, while LCFM captures latent flows among POIs, highlighting the importance of inter\-POI dependencies\. However, both methods rely on coarse spatial representations and do not explicitly encode local graph structures, limiting their expressiveness in fine\-grained spatial contexts\. LLM\-based methods substantially outperform the above baselines by leveraging rich textual descriptions to learn discriminative semantic representations, enabling strong generalization under cold\-start conditions\. Nevertheless, they primarily operate at the semantic level and do not explicitly model spatial proximity or neighborhood interactions\. Among the additional closely related baselines, TrafficStream performs better than time\-series GNN models, suggesting that modeling evolving graph structures is helpful under cold\-start settings\. GCIM and KGDiff further achieve stronger results, indicating the potential benefits of causal spatio\-temporal modeling and generative synthesis for unseen nodes\. However, these methods still underperform CausalPOI because they are not specifically designed for localized POI\-level intervention\-aware forecasting or structure\-aligned treatment\-control comparison\. 
In particular, GCIM mainly focuses on causal representation learning over spatio\-temporal structures, while KGDiff emphasizes generating plausible sequences for unseen nodes\.

In contrast, CausalPOI consistently achieves the best performance across all regions and metrics\. By constructing a local spatial graph centered on the target POI and integrating semantic attributes within a unified causal framework, CausalPOI jointly captures spatial structure and functional semantics, leading to significant performance gains over all competing methods\.

### 5\.3\.Ablation Study

To evaluate the effectiveness of key components in the CausalPOI framework, we conduct ablation experiments by selectively removing the ST-FIG module and the causal module.

Table [2](https://arxiv.org/html/2606.05413#S5.T2) reports the ablation results of the proposed CausalPOI framework across four regions. Overall, removing either the ST-FIG module or the causal inference module leads to consistent performance degradation in terms of both RMSE and MAE, indicating their respective contributions to the model’s overall effectiveness. Removing the ST-FIG module results in noticeable performance drops across all regions. On the Northeast dataset, for example, RMSE rises from 5.36 to 5.89 and MAE from 1.84 to 1.94. Similar degradations are observed in the Midwest, South, and West datasets. This confirms that explicitly modeling functional interactions among neighbouring POIs is crucial. By incorporating semantics-aware interaction strengths into the graph structure, ST-FIG enables the GNN encoder to distinguish functionally relevant neighbours from purely spatially proximate ones, leading to more informative local representations for cold-start forecasting. Removing the causal inference module also consistently degrades performance. This demonstrates that purely predictive modeling is insufficient for estimating the incremental impact of newly introduced POIs under interference. By constructing aligned treatment and control graphs and explicitly modeling counterfactual outcomes, the causal module allows CausalPOI to disentangle genuine treatment effects from correlated neighbourhood dynamics. In addition, we analyze the effect of the propensity loss by setting $\gamma=0$, which corresponds to removing the propensity term from the training objective. As shown in Table [2](https://arxiv.org/html/2606.05413#S5.T2), removing the propensity loss leads to a small but consistent performance degradation across all four regions. Although the magnitude of degradation is smaller compared to removing the ST-FIG or the causal inference module, the results indicate that the propensity loss provides a stabilizing effect during training. 
Specifically, it acts as an auxiliary regularizer that encourages balanced representations between treatment and control branches, leading to slightly improved forecasting performance\.

![Refer to caption](https://arxiv.org/html/2606.05413v1/figures/output.png)

Figure 2. Parameter sensitivity analysis of CausalPOI.

### 5\.4\.Parameters Analysis

We evaluate the sensitivity of CausalPOI to three key hyperparameters: the causal loss weight $\gamma$, the spatial decay bandwidth $\sigma$, and the maximum neighbourhood distance $max_{dist}$, which respectively control the strength of causal regularization, the attenuation rate of spatial influence in graph construction, and the spatial extent of local neighbourhood graphs. As shown in Figure [2](https://arxiv.org/html/2606.05413#S5.F2), the selected parameter configuration achieves the best overall performance across all regions. A too-small value of $\gamma$ weakens the effect of treatment prediction, while an overly large one leads to over-regularization. An overly small $\sigma$ causes spatial influence to decay too rapidly, restricting effective information propagation within local neighbourhoods and reducing the benefit of graph-based modelling, whereas an excessively large $\sigma$ over-smooths spatial influence across distant POIs, diluting locality-aware signals and introducing irrelevant spatial dependencies. Similarly, a too-small $max_{dist}$ restricts the spatial coverage of local graphs, while a too-large value introduces noisy and less relevant neighbours.

![Refer to caption](https://arxiv.org/html/2606.05413v1/figures/box.png)

Figure 3. Sanity check of estimated uplift of CausalPOI.

### 5\.5\.Uplift Sanity Check

Beyond predictive accuracy, we assess whether the estimated uplift exhibits reasonable behavior under different urban contexts. Since ground-truth causal effects are unavailable, we conduct sanity checks to evaluate the plausibility of the estimated counterfactual outcomes. We perform a group-level analysis by comparing the distributions of relative uplift across cold-start POIs under different levels of neighborhood competition. For each POI, we quantify competition by computing the average functional similarity between the POI and its surrounding neighboring POIs, based on semantic embeddings. POIs in the test set of our four datasets are then ranked by this competition score and partitioned into high- and low-competition groups using the top and bottom 20% quantiles, respectively. As shown in Figure [3](https://arxiv.org/html/2606.05413#S5.F3), POIs in highly competitive environments exhibit consistently smaller relative uplift than those in less competitive settings, which is directionally consistent with urban economic intuition.
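The grouping step of this sanity check can be sketched as follows. Using cosine similarity as the "functional similarity" between semantic embeddings is an assumption, as are the function names and the toy competition scores.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def competition_score(target_emb, neighbour_embs):
    """Average similarity between a POI and its neighbours (assumed: cosine)."""
    return sum(cosine(target_emb, n) for n in neighbour_embs) / len(neighbour_embs)

def split_by_competition(scores, fraction=0.2):
    """Return (high, low) index lists: top and bottom `fraction` of the scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    return order[-k:], order[:k]

# Toy competition scores for ten test POIs.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.95]
high, low = split_by_competition(scores)
```

The relative uplift distributions of the POIs indexed by `high` and `low` are then compared, as in Figure 3.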

Table 3. Placebo test results of CausalPOI.

| Region | Real Int. | Placebo Int. | Shifted Int. |
| --- | --- | --- | --- |
| Northeast | 0.228 | 0.013 | 0.171 |
| Midwest | 0.241 | 0.015 | 0.179 |
| South | 0.257 | 0.018 | 0.174 |
| West | 0.249 | 0.016 | 0.173 |

### 5\.6\.Placebo Tests

To further examine whether the estimated uplift is intervention-specific, we conduct placebo tests under pseudo and shifted intervention settings. As shown in Table [3](https://arxiv.org/html/2606.05413#S5.T3), the real intervention setting yields consistently larger uplift values across all regions, ranging from 0.228 to 0.257. In contrast, randomly assigning pseudo-introduction events to non-new POIs leads to near-zero uplift values, ranging from 0.013 to 0.018. This indicates that the model does not assign large treatment effects to arbitrary POIs. When the true introduction week is shifted by approximately four weeks, the estimated uplift also decreases, by 0.057 to 0.083 compared with the real intervention setting. These results suggest that the learned uplift is tied to the actual POI introduction event and its timing.
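The size of the drop under the shifted setting can be recomputed directly from the values in Table 3 as real uplift minus shifted uplift per region. The variable names below are illustrative.

```python
# (real, placebo, shifted) uplift values copied from Table 3.
placebo_table = {
    "Northeast": (0.228, 0.013, 0.171),
    "Midwest": (0.241, 0.015, 0.179),
    "South": (0.257, 0.018, 0.174),
    "West": (0.249, 0.016, 0.173),
}

# Drop in estimated uplift when the introduction week is shifted.
drops = {
    region: round(real - shifted, 3)
    for region, (real, placebo, shifted) in placebo_table.items()
}
smallest_drop = min(drops.values())
largest_drop = max(drops.values())
```

The per-region drops are 0.057 (Northeast), 0.062 (Midwest), 0.083 (South), and 0.076 (West), so the decrease spans 0.057 to 0.083.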

## 6\.Conclusions

This paper presents a novel problem – cold\-start POI check\-in forecasting, which aims to predict the future check\-in pattern of a newly introduced POI, by modeling its temporal evolution and functional interactions with nearby POIs in a structured urban spatial context\. To address this challenge, we propose CausalPOI, a spatio\-temporal causal representation learning framework that constructs structurally aligned treatment and control graphs to simulate factual and counterfactual outcomes\. By introducing the Functional Interaction Graph and leveraging contrastive pretraining, our approach effectively captures both spatial proximity and latent functional semantics among POIs\. This design enables precise individual treatment effect estimation and improves predictive performance in POI\-level forecasting tasks\. Through extensive experiments on real\-world datasets, CausalPOI demonstrates strong performance in modeling fine\-grained POI dynamics and offers actionable insights for urban planning and intervention analysis\.

###### Acknowledgements\.

This research is supported by the National Research Foundation, Singapore, under its Frontier CRP Grant (NRF-F-CRP-2024-0005), and under its AI Singapore Programme (AISG Award No: AISG3-RP-2024-034). This research is also supported by NTU SUG-NAP. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore. This work is also supported by the China Agricultural University "Young Researcher" Start-up Fund No. QNYJY2024144 and the Visiting Scholar Program of the China Scholarship Council (CSC) No. 202506350123.

## References

- L\. Bai, L\. Yao, C\. Li, X\. Wang, and C\. Wang \(2020\)Adaptive graph convolutional recurrent network for traffic forecasting\.Advances in neural information processing systems33,pp\. 17804–17815\.Cited by:[3rd item](https://arxiv.org/html/2606.05413#A2.I1.i3.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- S\. Brody, U\. Alon, and E\. Yahav \(2021\)How attentive are graph attention networks?\.arXiv preprint arXiv:2105\.14491\.Cited by:[§1](https://arxiv.org/html/2606.05413#S1.p3.1),[§4\.2\.3](https://arxiv.org/html/2606.05413#S4.SS2.SSS3.p2.12)\.
- X\. Chen, J\. Wang, and K\. Xie \(2021\)TrafficStream: a streaming traffic flow forecasting framework based on graph neural networks and continual learning\.arXiv preprint arXiv:2106\.06273\.Cited by:[5th item](https://arxiv.org/html/2606.05413#A2.I1.i5.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- J\. Devlin, M\. Chang, K\. Lee, and K\. Toutanova \(2018\)Bert: pre\-training of deep bidirectional transformers for language understanding\.arXiv preprint arXiv:1810\.04805\.Cited by:[§4\.1](https://arxiv.org/html/2606.05413#S4.SS1.p3.3),[§4\.1](https://arxiv.org/html/2606.05413#S4.SS1.p4.8)\.
- Y\. Fu, P\. Wang, J\. Du, L\. Wu, and X\. Li \(2019\)Efficient region embedding with multi\-view spatial networks: a perspective of locality\-constrained spatial autocorrelations\.InProceedings of the AAAI conference on artificial intelligence,Vol\.33,pp\. 906–913\.Cited by:[§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p1.1)\.
- S\. Guo, Y\. Lin, N\. Feng, C\. Song, and H\. Wan \(2019\)Attention based spatial\-temporal graph convolutional networks for traffic flow forecasting\.InProceedings of the AAAI conference on artificial intelligence,Vol\.33,pp\. 922–929\.Cited by:[§1](https://arxiv.org/html/2606.05413#S1.p3.1)\.
- A\. Hajisafi, H\. Lin, S\. Shaham, H\. Hu, M\. D\. Siampou, Y\. Chiang, and C\. Shahabi \(2023\)Learning dynamic graphs from all contextual information for accurate point\-of\-interest visit forecasting\.InProceedings of the 31st ACM International Conference on Advances in Geographic Information Systems,pp\. 1–12\.Cited by:[§1](https://arxiv.org/html/2606.05413#S1.p2.1)\.
- M\. Jin, S\. Wang, L\. Ma, Z\. Chu, J\. Y\. Zhang, X\. Shi, P\. Chen, Y\. Liang, Y\. Li, S\. Pan,et al\.\(2023\)Time\-llm: time series forecasting by reprogramming large language models\.arXiv preprint arXiv:2310\.01728\.Cited by:[8th item](https://arxiv.org/html/2606.05413#A2.I1.i8.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- S\. Li, J\. Zhou, T\. Xu, H\. Liu, X\. Lu, and H\. Xiong \(2020\)Competitive analysis for points of interest\.InProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining,pp\. 1265–1274\.Cited by:[§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p1.1)\.
- Y\. Li, R\. Yu, C\. Shahabi, and Y\. Liu \(2017\)Diffusion convolutional recurrent neural network: data\-driven traffic forecasting\.arXiv preprint arXiv:1707\.01926\.Cited by:[1st item](https://arxiv.org/html/2606.05413#A2.I1.i1.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- Z\. Li, S\. Hsu, and C\. Shahabi \(2024\)Forecasting unseen points of interest visits using context and proximity priors\.In2024 IEEE International Conference on Big Data \(BigData\),pp\. 5812–5818\.Cited by:[§1](https://arxiv.org/html/2606.05413#S1.p2.1)\.
- C\. Liu, Q\. Xu, H\. Miao, S\. Yang, L\. Zhang, C\. Long, Z\. Li, and R\. Zhao \(2025\) Timecma: towards llm\-empowered multivariate time series forecasting via cross\-modality alignment\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 18780–18788\. Cited by: [9th item](https://arxiv.org/html/2606.05413#A2.I1.i9.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- J\. Liu, T\. Li, S\. Ji, P\. Xie, S\. Du, F\. Teng, and J\. Zhang \(2021\) Urban flow pattern mining based on multi\-source heterogeneous data fusion and knowledge graph embedding\. IEEE Transactions on Knowledge and Data Engineering 35 \(2\), pp\. 2133–2146\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p1.1)\.
- R\. Liu, C\. Yin, and P\. Zhang \(2020\) Estimating individual treatment effects with time\-varying confounders\. In 2020 IEEE international conference on data mining \(ICDM\), pp\. 382–391\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p3.1)\.
- J\. Ma, M\. Wan, L\. Yang, J\. Li, B\. Hecht, and J\. Teevan \(2022\) Learning causal effects on hypergraphs\. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 1202–1212\. Cited by: [§2\.2](https://arxiv.org/html/2606.05413#S2.SS2.p2.1)\.
- C\. Naumzik, P\. Zoechbauer, and S\. Feuerriegel \(2020\) Mining points\-of\-interest for explaining urban phenomena: a scalable variational inference approach\. In Proceedings of The Web Conference 2020, pp\. 2342–2353\. Cited by: [6th item](https://arxiv.org/html/2606.05413#A2.I1.i6.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- A\. v\. d\. Oord, Y\. Li, and O\. Vinyals \(2018\) Representation learning with contrastive predictive coding\. arXiv preprint arXiv:1807\.03748\. Cited by: [§4\.1](https://arxiv.org/html/2606.05413#S4.SS1.p4.8)\.
- U\. Shalit, F\. D\. Johansson, and D\. Sontag \(2017\) Estimating individual treatment effect: generalization bounds and algorithms\. In International conference on machine learning, pp\. 3076–3085\. Cited by: [§2\.2](https://arxiv.org/html/2606.05413#S2.SS2.p1.1)\.
- C\. Shi, D\. Blei, and V\. Veitch \(2019\) Adapting neural networks for the estimation of treatment effects\. Advances in neural information processing systems 32\. Cited by: [§2\.2](https://arxiv.org/html/2606.05413#S2.SS2.p1.1)\.
- D\. Tschernutter and S\. Feuerriegel \(2021\) A latent customer flow model for interpretable predictions of check\-in counts\. In 2021 IEEE International Conference on Big Data \(Big Data\), pp\. 529–539\. Cited by: [7th item](https://arxiv.org/html/2606.05413#A2.I1.i7.p1.1), [§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p2.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- L\. Wang, A\. Adiga, J\. Chen, A\. Sadilek, S\. Venkatramanan, and M\. Marathe \(2022\) Causalgnn: causal\-based graph neural networks for spatio\-temporal epidemic forecasting\. In Proceedings of the AAAI conference on artificial intelligence, Vol\. 36, pp\. 12191–12199\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p3.1), [§2\.2](https://arxiv.org/html/2606.05413#S2.SS2.p2.1)\.
- Z\. Wang, L\. Chen, Y\. Jin, P\. Deng, S\. Pang, J\. Liu, and Y\. Zhao \(2026\) Knowledge graph guided heterogeneity\-informed diffusion model for spatio\-temporal generation\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 40, pp\. 15915–15923\. Cited by: [11th item](https://arxiv.org/html/2606.05413#A2.I1.i11.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- S\. Wu, X\. Yan, X\. Fan, S\. Pan, S\. Zhu, C\. Zheng, M\. Cheng, and C\. Wang \(2022\) Multi\-graph fusion networks for urban region embedding\. arXiv preprint arXiv:2201\.09760\. Cited by: [§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p1.1)\.
- Z\. Wu, S\. Pan, G\. Long, J\. Jiang, X\. Chang, and C\. Zhang \(2020\) Connecting the dots: multivariate time series forecasting with graph neural networks\. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp\. 753–763\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p3.1)\.
- Z\. Wu, S\. Pan, G\. Long, J\. Jiang, and C\. Zhang \(2019\) Graph wavenet for deep spatial\-temporal graph modeling\. arXiv preprint arXiv:1906\.00121\. Cited by: [2nd item](https://arxiv.org/html/2606.05413#A2.I1.i2.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- G\. Yang, Y\. Cai, and C\. K\. Reddy \(2018\) Recurrent spatio\-temporal point process for check\-in time prediction\. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp\. 2203–2211\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p2.1)\.
- B\. Yu, H\. Yin, and Z\. Zhu \(2017\) Spatio\-temporal graph convolutional networks: a deep learning framework for traffic forecasting\. arXiv preprint arXiv:1709\.04875\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p3.1)\.
- S\. Yuan, D\. Li, W\. Liu, X\. Zhang, M\. Chen, J\. Zhang, and Y\. Gong \(2024\) Fine\-grained urban flow inference with multi\-scale representation learning\. arXiv preprint arXiv:2406\.09710\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p1.1)\.
- Q\. Zhang, X\. Gao, H\. Wang, S\. M\. Yiu, and H\. Yin \(2025\) Efficient traffic prediction through spatio\-temporal distillation\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 1093–1101\. Cited by: [4th item](https://arxiv.org/html/2606.05413#A2.I1.i4.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.
- Y\. Zhang, Y\. Xu, L\. Cui, and Z\. Yan \(2023\) Multi\-view graph contrastive learning for urban region representation\. In 2023 International Joint Conference on Neural Networks \(IJCNN\), pp\. 1–8\. Cited by: [§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p1.1)\.
- Y\. Zhang, Y\. Fu, P\. Wang, X\. Li, and Y\. Zheng \(2019\) Unifying inter\-region autocorrelation and intra\-region structures for spatial embedding via collective adversarial learning\. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp\. 1700–1708\. Cited by: [§2\.1](https://arxiv.org/html/2606.05413#S2.SS1.p1.1)\.
- Z\. Zhang, P\. Balsebre, S\. Luo, Z\. Hai, and J\. Huang \(2024\) StructAM: enhancing address matching through semantic understanding of structure\-aware information\. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation \(LREC\-COLING 2024\), pp\. 15350–15361\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p1.1)\.
- Z\. Zhang, M\. Xie, P\. Balsebre, W\. Huang, S\. Luo, and G\. Cong \(2026\) UrbanMFM: spatial graph\-based multiscale foundation models for learning generalized urban representation\. IEEE Transactions on Knowledge and Data Engineering\. Cited by: [§1](https://arxiv.org/html/2606.05413#S1.p3.1)\.
- Y\. Zhao, P\. Deng, J\. Liu, X\. Jia, and J\. Zhang \(2023\) Generative causal interpretation model for spatio\-temporal representation learning\. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 3537–3548\. Cited by: [10th item](https://arxiv.org/html/2606.05413#A2.I1.i10.p1.1), [§5\.1\.2](https://arxiv.org/html/2606.05413#S5.SS1.SSS2.p1.1)\.

## Appendix A Data Example

Table [4](https://arxiv.org/html/2606.05413#A1.T4) gives an example POI record and an example POI check\-in sequence\.

Table 4\. Examples of POI and POI check\-in sequence

| Type | Example |
| --- | --- |
| POI | id : sg:002a9\.\.\.bc48e, category : Grocery Stores, lat : 44\.556128, lon : \-123\.066371, tags : \{name : Jacksons Food Stores, street : 33157 Highway 34 SE, city : Albany, state : OR, postcode : 97322\} |
| check\-in seq | id : sg:1e57c\.\.\.2318f, category : Grocery Stores, lat : 18\.465922, lon : \-66\.10359, sequence : \[2018\-09\-08 to 2018\-09\-14: 2, 2018\-09\-15 to 2018\-09\-21: 1, \.\.\., 2020\-01\-25 to 2020\-01\-31: 2\] |

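
The weekly check-in entries in Table 4 can be turned into a numeric series with a few lines of Python. The sketch below is illustrative only: the entry format (`start to end: count`) is read off Table 4, the helper name `parse_weekly_entry` is hypothetical, and only the three entries visible in the table are used, since the weeks in between are elided there.

```python
from datetime import date


def parse_weekly_entry(entry: str) -> tuple[date, date, int]:
    """Parse 'YYYY-MM-DD to YYYY-MM-DD: N' into (start, end, count)."""
    window, count = entry.rsplit(":", 1)
    start_text, end_text = window.split(" to ")
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip())
    return start, end, int(count)


# Only the entries shown in Table 4; the elided weeks are NOT filled in.
visible_entries = [
    "2018-09-08 to 2018-09-14: 2",
    "2018-09-15 to 2018-09-21: 1",
    "2020-01-25 to 2020-01-31: 2",
]

parsed = [parse_weekly_entry(e) for e in visible_entries]

# Each window covers seven calendar days (start and end inclusive).
for start, end, _ in parsed:
    assert (end - start).days == 6

# Number of weekly windows between the first and last entry,
# assuming the sequence is contiguous (an assumption, not stated in the table).
span_weeks = (parsed[-1][0] - parsed[0][0]).days // 7 + 1
```

Under the contiguity assumption, `span_weeks` gives the length of the weekly series that a forecasting model would consume for this POI.
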
## Appendix B Baselines

- DCRNN \(Li et al\., [2017](https://arxiv.org/html/2606.05413#bib.bib6)\) models spatiotemporal dependencies using bidirectional diffusion convolution on directed graphs and GRU\-based sequence\-to\-sequence learning, with scheduled sampling to stabilize multi\-step forecasting\.
- GraphWaveNet \(Wu et al\., [2019](https://arxiv.org/html/2606.05413#bib.bib7)\) combines dilated causal convolutions with adaptive graph convolution, learning a self\-adjusting adjacency matrix to model latent spatial dependencies and capture long\-range temporal patterns\.
- AGCRN \(Bai et al\., [2020](https://arxiv.org/html/2606.05413#bib.bib8)\) introduces node\-specific adaptive adjacency matrices and gated recurrent units, enabling flexible modeling of heterogeneous spatial\-temporal correlations without fixed graph structures\.
- LightST \(Zhang et al\., [2025](https://arxiv.org/html/2606.05413#bib.bib30)\) is a lightweight spatio\-temporal graph neural network that improves forecasting efficiency via spatio\-temporal distillation while maintaining competitive accuracy\.
- TrafficStream \(Chen et al\., [2021](https://arxiv.org/html/2606.05413#bib.bib33)\) focuses on spatio\-temporal prediction under node updates, using graph neural networks and continual learning to handle streaming traffic data with dynamically evolving nodes\.
- SVGP \(Naumzik et al\., [2020](https://arxiv.org/html/2606.05413#bib.bib19)\) uses a sparse Gaussian process to model urban phenomena based on POI distributions, capturing spatial effects through distance\-based kernels with interpretable outputs\.
- LCFM \(Tschernutter and Feuerriegel, [2021](https://arxiv.org/html/2606.05413#bib.bib9)\) explains POI check\-ins through latent customer flows, including direct visits, competition, and transitions, using a fixed\-point formulation and Gaussian processes for interpretable spatial modelling\.
- TIME\-LLM \(Jin et al\., [2023](https://arxiv.org/html/2606.05413#bib.bib20)\) reprograms LLMs for time series forecasting via prompts, achieving strong performance without fine\-tuning model weights\.
- TimeCMA \(Liu et al\., [2025](https://arxiv.org/html/2606.05413#bib.bib31)\) is a recent LLM\-based time series forecasting method that enhances prediction by modeling cross\-variable temporal dependencies through contextual representations\.
- GCIM \(Zhao et al\., [2023](https://arxiv.org/html/2606.05413#bib.bib35)\) belongs to causal spatio\-temporal intervention modeling, aiming to learn interpretable spatio\-temporal representations through generative causal mechanisms\.
- KGDiff \(Wang et al\., [2026](https://arxiv.org/html/2606.05413#bib.bib34)\) is a generative synthesis\-based method that leverages knowledge graphs and diffusion modeling to capture heterogeneity and generate spatio\-temporal sequences\.
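
To make the "self-adjusting adjacency matrix" mentioned for GraphWaveNet more concrete, here is a minimal pure-Python sketch. It assumes the form softmax(ReLU(E1 E2ᵀ)) given in the Graph WaveNet paper, where E1 and E2 are source and target node-embedding tables. The function name `adaptive_adjacency`, the sizes, and the random embeddings are illustrative assumptions; in the actual model the embeddings are learned parameters. This is not CausalPOI code.

```python
import math
import random


def adaptive_adjacency(e1, e2):
    """Return the row-normalised matrix softmax(relu(E1 @ E2^T)).

    e1, e2: lists of N embedding vectors (source and target roles).
    """
    adj = []
    for src in e1:
        # ReLU keeps only positive source-target affinities.
        scores = [max(0.0, sum(a * b for a, b in zip(src, tgt))) for tgt in e2]
        # Numerically stable softmax over the row.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adj.append([x / total for x in exps])
    return adj


random.seed(0)
num_nodes, dim = 4, 3  # illustrative sizes, not taken from the paper

# Random stand-ins for the learned embedding tables (assumption).
E1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_nodes)]
E2 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_nodes)]

A = adaptive_adjacency(E1, E2)
```

Each row of `A` sums to one, so it can serve as a transition matrix for graph convolution without any predefined proximity graph.
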
