Think Before You Act: Intention-Guided Reasoning for LLM-Based Location Prediction


Summary

IntentPOI is a two-stage intention-guided reasoning framework for next POI prediction that first infers user intentions from historical mobility, peer behavior, and temporal context, then selects locations aligned with those intentions, outperforming eleven state-of-the-art baselines.

arXiv:2606.08122v1 Announce Type: new Abstract: Predicting a user's next Point-of-Interest (POI) based on their historical check-in records is a fundamental task in location-based services. While recent methods incorporating large language models have shown strong reasoning capabilities and promising results, they typically formulate the prediction task as a one-step trajectory-to-location mapping problem, making predictions prone to shallow trajectory correlations and historical frequency bias. We argue that users rarely choose locations directly and instead, they usually first form a traveling intention and then accordingly select specific POIs. Motivated by this insight, we propose IntentPOI, a two-stage intention-guided reasoning framework. In the thinking stage, we infer users' intermediate intentions by incorporating historical mobility patterns, similar peer behaviors, and the temporal contexts. In the acting stage, we first construct a compact candidate pool, and then perform intention-guided reasoning to identify locations that best align with the inferred intention. By explicitly decoupling intention inference from location prediction, IntentPOI transforms the next POI prediction from direct trajectory matching into intention-guided reasoning. Extensive experiments on three real-world datasets demonstrate that IntentPOI consistently outperforms eleven state-of-the-art baselines.

Cached at: 06/09/26, 08:54 AM

# Think Before You Act: Intention-Guided Reasoning for LLM-Based Location Prediction
Source: [https://arxiv.org/html/2606.08122](https://arxiv.org/html/2606.08122)
Anqi Liang, Shanghai Jiao Tong University, China; Zhuoyang Jiang, The Hong Kong University of Science and Technology \(Guangzhou\), China; Yutian Jiang, The Hong Kong University of Science and Technology \(Guangzhou\), China; Sisuo Lyu, The Hong Kong University of Science and Technology, Hong Kong; Yu Ji, Fudan University, China; Haomin Wen, Shanghai Innovation Institute, China; and Yuxuan Liang, The Hong Kong University of Science and Technology \(Guangzhou\), China


###### Abstract\.

Predicting a user’s next Point\-of\-Interest \(POI\) based on their historical check\-in records is a fundamental task in location\-based services\. While recent methods incorporating large language models have shown strong reasoning capabilities and promising results, they typically formulate the prediction task as a one\-step trajectory\-to\-location mapping problem, making predictions prone to shallow trajectory correlations and historical frequency bias\. We argue that users rarely choose locations directly and instead, they usually first form a traveling intention and then accordingly select specific POIs\. Motivated by this insight, we propose IntentPOI, a two\-stage intention\-guided reasoning framework\. In the thinking stage, we infer users’ intermediate intentions by incorporating historical mobility patterns, similar peer behaviors, and the temporal contexts\. In the acting stage, we first construct a compact candidate pool, and then perform intention\-guided reasoning to identify locations that best align with the inferred intention\. By explicitly decoupling intention inference from location prediction, IntentPOI transforms the next POI prediction from direct trajectory matching into intention\-guided reasoning\. Extensive experiments on three real\-world datasets demonstrate that IntentPOI consistently outperforms eleven state\-of\-the\-art baselines\.

Next Location Prediction, Large Language Model, Spatio\-Temporal Modeling

## 1\.Introduction

The rapid development of urban computing and location\-based services has promoted Point\-of\-Interest \(POI\)\-aware applications in diverse real\-world scenarios, including route planning, targeted advertising, and trajectory prediction \(Luca et al\., [2021](https://arxiv.org/html/2606.08122#bib.bib54); Chen et al\., [2025a](https://arxiv.org/html/2606.08122#bib.bib55)\)\. As a fundamental task, next POI prediction aims to recommend the location a user is likely to visit at a given time point by analyzing mobility patterns from historical check\-in trajectories \(Zhao et al\., [2020](https://arxiv.org/html/2606.08122#bib.bib56); Lai et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib57)\)\. Deep learning\-based next POI prediction methods predominantly model sequential check\-in records through recurrent networks \(Zhao et al\., [2020](https://arxiv.org/html/2606.08122#bib.bib56); Feng et al\., [2018b](https://arxiv.org/html/2606.08122#bib.bib64); Wu et al\., [2020](https://arxiv.org/html/2606.08122#bib.bib65)\), attention mechanisms \(Luo et al\., [2021](https://arxiv.org/html/2606.08122#bib.bib66); Sun et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib67); Yang et al\., [2022b](https://arxiv.org/html/2606.08122#bib.bib68)\), or graph neural networks \(Lim et al\., [2020a](https://arxiv.org/html/2606.08122#bib.bib69); Sánchez and Bellogín, [2022](https://arxiv.org/html/2606.08122#bib.bib15)\) to capture spatial and temporal transition patterns \(Dang et al\., [2023](https://arxiv.org/html/2606.08122#bib.bib58); Yin et al\., [2023](https://arxiv.org/html/2606.08122#bib.bib59); Zeng et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib60); Wu et al\., [2025b](https://arxiv.org/html/2606.08122#bib.bib61)\)\. While achieving promising performance, these methods rely heavily on implicit pattern matching over historical trajectories and do not analyze the reasons behind the observed movements, which limits their generalization and interpretability in complex urban environments \(Wang and Wang, [2024](https://arxiv.org/html/2606.08122#bib.bib74); Yang et al\., [2024b](https://arxiv.org/html/2606.08122#bib.bib75)\)\.

Inspired by the prominent reasoning capabilities of Large Language Models \(LLMs\), researchers have recently applied LLMs to next POI prediction \(Wu et al\., [2025a](https://arxiv.org/html/2606.08122#bib.bib62); Zhong et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib26); Li et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib23); Feng et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib22); Chen et al\., [2025b](https://arxiv.org/html/2606.08122#bib.bib27); Lv et al\., [2026](https://arxiv.org/html/2606.08122#bib.bib63); Tan et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib77)\), yielding two main paradigms\. Prompt\-based methods reorganize historical trajectories into textual prompts, guiding LLMs to infer user profiles and mobility patterns for evidence\-based prediction \(Zhong et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib26); Li et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib23); Feng et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib22)\)\. For example, CoMaPOI prompts LLMs with historical trajectories to derive long\-term user profiles, short\-term mobility patterns, and candidate POIs for final re\-ranking\. Token\-based methods pretrain semantic tokens for POIs, allowing LLMs to directly learn POI\-level knowledge in textual space \(Chen et al\., [2025b](https://arxiv.org/html/2606.08122#bib.bib27); Lv et al\., [2026](https://arxiv.org/html/2606.08122#bib.bib63)\)\. For instance, QT\-Mob encodes each POI’s overview, geographic location, and spatial context into four discrete tokens and trains the LLM to map these combined tokens to the exact POI index\. These two types of methods demonstrate that LLMs can effectively capture mobility patterns and achieve notable improvements over deep learning\-based approaches\.

Despite their promising performance, existing LLM\-based methods tend to treat next POI prediction as a trajectory\-to\-location mapping problem, in which the LLM directly predicts the target POI or re\-ranks candidates based on the given historical check\-in records, and are thus prone to shallow trajectory correlations and historical frequency bias\. This limitation becomes particularly severe when historical trajectories are sparse or multiple candidates exhibit similar transition patterns\. In such cases, the LLM often defaults to frequently visited locations, even though these predictions are semantically inconsistent with the current context\. In fact, users rarely make mobility decisions directly at the POI level\. Instead, they typically form explicit traveling intentions \(such as dining, shopping, or socializing\) and then choose specific locations that satisfy their intentions under spatial and temporal constraints\. This insight suggests that next POI prediction should not be formulated as a single\-step forward problem but as a two\-stage reasoning process that first infers the user’s intention and then selects intention\-aligned locations\.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x1.png)Figure 1\. Both the prompt\- and token\-based methods perform one\-step prediction for the next POI index or tokens\. Our proposed IntentPOI explicitly infers the user’s intention as an intermediate step, which serves as a reasoning scaffold to guide the downstream LLM toward intention\-aligned reasoning\.

To address this gap, we propose a thinking\-before\-acting principle for LLM\-based next POI prediction\. Instead of directly mapping historical trajectories to locations, the next POI prediction problem is decomposed into two reasoning stages\. The thinking stage focuses on understanding *why* the user is likely to travel by explicitly inferring the latent intention from historical mobility patterns and contextual signals\. The acting stage focuses on determining *where* the user will go by recommending locations that best satisfy the inferred intention\. The explicitly inferred intention serves as a reasoning scaffold that bridges mobility understanding and downstream POI prediction\.

Building upon this principle, we propose IntentPOI, a two\-stage intention\-guided LLM reasoning framework for next POI prediction\. In the thinking stage, IntentPOI integrates multi\-rationale evidence, including user profiles, peer behaviors, and temporal contexts, to infer the user’s latent intention through LLM reasoning\. In the acting stage, IntentPOI first constructs a compact candidate pool by combining historically visited POIs with spatially proximate POIs, and then performs intention\-guided reasoning to identify locations that best align with the inferred intention\. Through this two\-stage process, IntentPOI transforms next POI prediction from frequency\-driven trajectory matching into intention\-grounded reasoning\.

Our contributions are summarized as follows:

- We identify the lack of explicit intention modeling as a fundamental limitation of existing LLM\-based next POI prediction methods and reformulate next POI prediction as a two\-stage reasoning problem consisting of intention inference and location determination\.
- We propose IntentPOI, a thinking\-before\-acting LLM reasoning framework that infers intentions from multi\-rationale evidence and then performs intention\-guided candidate recommendation\.
- Extensive experiments on three real\-world datasets demonstrate the effectiveness and efficiency of IntentPOI compared with state\-of\-the\-art baselines\.

## 2\.Related Work

Deep learning\-based approaches\. Early deep learning\-based next POI prediction methods commonly employed sequential architectures, including RNN\-based and attention\-based models, to model users’ check\-in sequences, thereby capturing their mobility patterns and inferring location preferences \(Feng et al\., [2018a](https://arxiv.org/html/2606.08122#bib.bib16); Gao et al\., [2019](https://arxiv.org/html/2606.08122#bib.bib17); Yang et al\., [2020](https://arxiv.org/html/2606.08122#bib.bib18); Xue et al\., [2021](https://arxiv.org/html/2606.08122#bib.bib19); Jiang et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib20)\)\. For example, DeepMove \(Feng et al\., [2018a](https://arxiv.org/html/2606.08122#bib.bib16)\) adopts an RNN\-based architecture that incorporates multiple mobility\-related factors to model human transition regularities, and introduces a historical attention mechanism to capture periodic patterns from users’ long\-term mobility histories\. Flashback \(Yang et al\., [2020](https://arxiv.org/html/2606.08122#bib.bib18)\) mitigates trajectory sparsity by enabling RNNs to selectively revisit relevant past states\. It uses spatial and temporal signals to weight previous trajectory representations, thereby improving next POI prediction performance\. MobTCast \(Xue et al\., [2021](https://arxiv.org/html/2606.08122#bib.bib19)\) is a context\-aware Transformer\-based model that incorporates spatio\-temporal, semantic, social, and geographic contexts for next POI prediction\. It encodes historical POI sequences and semantic information with a Transformer\-based feature extractor, and further accounts for social influence and geographic constraints\. Another line of research explores graph\-based methods for next POI prediction, which construct graphs from mobility trajectories to capture user\-POI interactions and POI transition patterns\. These methods then apply GNNs to learn relational representations for predicting users’ future POI visits \(Lim et al\., [2020b](https://arxiv.org/html/2606.08122#bib.bib28); Li et al\., [2021](https://arxiv.org/html/2606.08122#bib.bib30); Rao et al\., [2022](https://arxiv.org/html/2606.08122#bib.bib29); Yan et al\., [2023](https://arxiv.org/html/2606.08122#bib.bib31); Rao et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib32)\)\. For instance, STP\-UDGAT \(Lim et al\., [2020b](https://arxiv.org/html/2606.08122#bib.bib28)\) captures POI\-POI and user\-user relationships with graph attention, combining personalized local preferences and global spatial\-temporal\-preference neighborhoods for next POI recommendation\. Graph\-Flashback \(Rao et al\., [2022](https://arxiv.org/html/2606.08122#bib.bib29)\) constructs a heterogeneous spatio\-temporal knowledge graph to learn POI embeddings that capture transition patterns among POIs\. STHGCN \(Yan et al\., [2023](https://arxiv.org/html/2606.08122#bib.bib31)\) leverages a spatio\-temporal hypergraph to model high\-order dependencies and global collaborative relationships across mobility trajectories\. It further integrates inter\-user and intra\-user collaborative signals with spatio\-temporal contexts for next POI prediction\. Despite their effectiveness in modeling sequential patterns and relational structures, these methods largely rely on observed mobility correlations and predefined contextual features, making it difficult to capture deeper behavioral intentions and semantic dependencies in check\-in trajectories\. This limits their generalization to diverse user behaviors and complex urban environments\.

LLM\-based approaches\. Recent studies have explored LLMs for next POI prediction due to their strong reasoning and generation capabilities\. For example, LLM\-Mob \(Wang et al\., [2023](https://arxiv.org/html/2606.08122#bib.bib21)\) leverages the reasoning capabilities of LLMs through context\-inclusive prompts that encode users’ historical and recent mobility records, thereby capturing both long\-term and short\-term mobility dependencies\. LLM\-Move \(Feng et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib22)\) formulates next\-POI prediction as a candidate ranking problem and introduces prompting strategies that incorporate geographic preferences, spatial distances, and sequential transition patterns\. LLM4POI \(Li et al\., [2024](https://arxiv.org/html/2606.08122#bib.bib23)\) adapts pretrained LLMs by prompt\-based question answering, allowing the model to leverage contextual information and commonsense knowledge through fine\-tuning\. Mobility\-LLM \(Gong et al\., [2024b](https://arxiv.org/html/2606.08122#bib.bib33)\) extracts semantic information from check\-in sequences to help LLMs better understand users’ visiting intentions and travel preferences, and further fine\-tunes the model on multiple mobility analysis tasks\. POI\-Enhancer \(Cheng et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib24)\) improves POI representations by leveraging LLM\-derived semantic knowledge, while GNPR\-SID \(Wang et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib25)\) constructs semantic POI identifiers from semantic and collaborative features for generative next POI recommendation\. QT\-Mob \(Chen et al\., [2025b](https://arxiv.org/html/2606.08122#bib.bib27)\) adapts LLMs for mobility modeling by representing mobility records in textual space and encoding location semantics as discrete tokens, thereby capturing rich contextual information while remaining compatible with LLM architectures\. SILO \(Sun et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib34)\) constructs a hybrid semantic space that integrates ID\-based embeddings, context\-based semantics, and auxiliary contextual information, enabling the joint modeling of sequential mobility patterns and rich contextual semantics\. CoMaPOI \(Zhong et al\., [2025](https://arxiv.org/html/2606.08122#bib.bib26)\) introduces a collaborative multi\-agent framework to address LLMs’ limited understanding of numerical spatio\-temporal data and alleviate irrelevant predictions caused by large candidate POI spaces\. However, existing LLM\-based approaches often rely on prompt engineering or textualized trajectory representations\. As a result, they may still struggle to jointly capture complex behavioral patterns, collaborative user relationships, and fine\-grained travel intentions, thereby limiting their robustness and generalization\.

## 3\.Preliminaries

In this section, we introduce the preliminaries of next POI prediction\. Let $\mathcal{U}=\{u_n \mid 1\leq n\leq N\}$ denote the set of $N$ users and $\mathcal{P}=\{p_m \mid 1\leq m\leq M\}$ denote the set of $M$ POIs\. Each POI $p_m$ has a name, a category $c$, and a geographic location $(lat, lon)$, with $lat$ and $lon$ denoting the latitude and longitude, respectively\.

We are given $H$ historical trajectories of the user $u_n$, denoted as $\mathcal{X}_n=\{X_n^1,X_n^2,\cdots,X_n^H\}$, where $X_n^i$ $(1\leq i\leq H)$ denotes the $i$\-th historical trajectory\. $X_n^i$ consists of several chronological check\-in records, $X_n^i=(x_{n,1}^i,x_{n,2}^i,\cdots)$, where $x_{n,j}^i$ represents the $j$\-th check\-in record in the $i$\-th historical trajectory of $u_n$\. Each $x_{n,j}^i$ is a tuple $(n,t_{i,j},p)$, whose elements are the user index, the visiting time, and the specific POI, respectively\.

Given that the last check\-in record in $X_n^i$ $(1\leq i\leq H-1)$ is always earlier than the first one in $X_n^{i+1}$, we can reorganize the $H$ historical trajectories into one long trajectory for simplicity:

$$\begin{array}{c}X_n=(x_{n,1}^1,x_{n,2}^1,\cdots,x_{n,1}^2,x_{n,2}^2,\cdots,x_{n,1}^H,x_{n,2}^H,\cdots)\\ \Leftarrow\mathcal{X}_n=\{X_n^1,X_n^2,\cdots,X_n^H\}\end{array} \tag{1}$$

Given a query trajectory $X_n^{\rm q}=(x_{n,1}^{\rm q},x_{n,2}^{\rm q},\cdots)$ of the user $u_n$, we aim to predict the specific POI at the target time point $t_n^{\rm q}$, where $t_n^{\rm q}$ is later than the visiting time of the last known record in $X_n^{\rm q}$\. Here, a query trajectory is a new sequence of check\-in records that ends at the target time point and serves as the immediate context for the next POI prediction\.
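The notation above can be mirrored in a small data model. The following is a minimal sketch, not the authors' code: the `CheckIn` class and the `flatten` helper are hypothetical names, and the sketch assumes that visiting times are available as `datetime` objects.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class CheckIn:
    """One check-in record (n, t, p): user index, visiting time, POI index."""
    user: int
    time: datetime
    poi: int


def flatten(trajectories: List[List[CheckIn]]) -> List[CheckIn]:
    """Concatenate the H historical trajectories into one long trajectory.

    Relies on the ordering assumption stated in the text: every trajectory
    ends before the next one starts.
    """
    merged = [record for trajectory in trajectories for record in trajectory]
    for earlier, later in zip(merged, merged[1:]):
        if earlier.time > later.time:
            raise ValueError("trajectories are not in chronological order")
    return merged
```

Raising on out-of-order input is a design choice of this sketch; it makes a violation of the chronological assumption visible instead of silently reordering records.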

![Refer to caption](https://arxiv.org/html/2606.08122v1/x2.png)Figure 2\. The workflow of IntentPOI, comprising the Thinking Stage \(Multi\-Rationale Intention Inference\) and the Acting Stage \(Intention\-Guided Reasoning for POI Prediction\)\.
## 4\.Methodology

The workflow of our proposed IntentPOI is presented in Fig\. [2](https://arxiv.org/html/2606.08122#S3.F2)\. The thinking stage infers latent intentions from multi\-source rationale evidence, i\.e\., the user profile, peer behaviors, and temporal context\. The acting stage first constructs a compact candidate pool and then feeds it to a reasoning LLM together with the inferred intention and supporting signals, where the intention serves as the reasoning scaffold for candidate recommendation\.

### 4\.1\.Thinking: Multi\-Rationale Intention Inference

Existing LLM\-based methods collapse reasoning into a single implicit process in which the LLM directly predicts the target POIs or re\-ranks candidates, without explicitly reasoning about the user’s latent intention\. This design relies heavily on trajectory correlations and generates frequency\-biased predictions\. Therefore, in this section, we aim to infer the intermediate intention to guide the downstream reasoning\. Specifically, we ground the intention on multi\-rationale evidence, including the user profile, peer behaviors, and the temporal context from the query trajectory\.

User Profile\. The user profile can reflect the mobility patterns from the temporal and spatial perspective, providing a foundation for both intention inference and candidate recommendation\. We first extract a statistical summary from the historical trajectory $X_n$, capturing the frequency distributions of visiting hours, POIs, and POI categories\. Let $X_n^{\rm s}$ denote the statistical summary\. We then incorporate both $X_n^{\rm s}$ and $X_n$ into the prompt of a pretrained LLM $\mathcal{M}_{\rm F}$ to generate the profile of $u_n$:

$$P_n^{\rm F}\leftarrow\mathcal{M}_{\rm F}(X_n,X_n^{\rm s}), \tag{2}$$

where $P_n^{\rm F}$ includes the inferred mobility pattern of the user, e\.g\., *a strong late\-afternoon to evening rhythm* and *an evening\-oriented urban explorer*\. To avoid hallucination, $\mathcal{M}_{\rm F}$ is prompted to provide specific evidence to support the inference, based on the temporal and spatial analysis of the historical trajectory, as shown in Fig\. [3](https://arxiv.org/html/2606.08122#S4.F3) \(left\)\. The detailed prompt of $\mathcal{M}_{\rm F}$ is presented in Appendix [B](https://arxiv.org/html/2606.08122#A2)\.
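The statistical summary $X_n^{\rm s}$ is described only as frequency distributions over visiting hours, POIs, and POI categories. One way such a summary could be computed is sketched below; the function name, the tuple layout of a check-in, and the `top` cutoff are assumptions made for illustration, and the paper does not specify the exact format passed to the LLM.

```python
from collections import Counter
from datetime import datetime


def statistical_summary(checkins, top=3):
    """Frequency distributions over visiting hours, POIs, and POI categories.

    `checkins` is assumed to be a list of (time, poi_id, category) tuples,
    where `time` is a datetime. Only the `top` most frequent entries of each
    distribution are kept so the summary stays short enough for a prompt.
    """
    hours = Counter(time.hour for time, _, _ in checkins)
    pois = Counter(poi for _, poi, _ in checkins)
    categories = Counter(category for _, _, category in checkins)
    return {
        "top_hours": hours.most_common(top),
        "top_pois": pois.most_common(top),
        "top_categories": categories.most_common(top),
    }
```

The returned dictionary can be serialized into text and placed next to the raw trajectory in the profiling prompt.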

Peer Behaviors\. Given the insufficient contexts of sparse individual trajectories, peer behaviors can incorporate mobility evidence from similar users, thereby expanding the LLM’s reasoning horizons from a single trajectory to multi\-user evidence\. Users with similar mobility patterns tend to visit POIs of similar categories or even the same locations\. On the one hand, users active during similar temporal periods may have consistent latent intentions, such as dining or socializing, which may be invariant across geographic regions\. On the other hand, users with similar historical trajectories tend to share underlying mobility patterns, which increases the likelihood of visiting the same locations in the future\. Hence, we compute both semantic and geographic similarity to evaluate the social connection among users\.

We first obtain the semantic embedding of the user’s profile $P_n^{\rm F}$ with a text embedding model $\Phi$:

$$E_n=\Phi(P_n^{\rm F}). \tag{3}$$

We then evaluate the semantic similarity between users:

$$s_{m,n}^{\rm E}=\mathbf{cos}(E_n,E_m)=\frac{E_n\cdot E_m}{\lVert E_n\rVert\,\lVert E_m\rVert}, \tag{4}$$

where $s_{m,n}^{\rm E}$ denotes the semantic similarity, i\.e\., how similar the mobility patterns of $u_n$ and $u_m$ are\.

Each POI location $(lat,lon)$ is encoded into a geohash string at precision 5 using the Geohash algorithm \(Niemeyer, [2008](https://arxiv.org/html/2606.08122#bib.bib14)\), yielding cells of approximately $4.9\times 4.9$ km\. Each user’s mobility footprint is then represented as the set of visited geohash cells:

$$\mathcal{G}_n=\{\text{Geohash}(x_{n,i}(lat),x_{n,i}(lon))\mid\forall x_{n,i}\in X_n\} \tag{5}$$

The geographic similarity between two users is computed as the Jaccard index over their cell sets:

$$s_{m,n}^{\rm G}=\frac{|\mathcal{G}_n\cap\mathcal{G}_m|}{|\mathcal{G}_n\cup\mathcal{G}_m|}. \tag{6}$$
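The geographic similarity can be sketched in a few lines of plain Python. The encoder below follows the standard Geohash bit-interleaving scheme rather than any specific library, and the function names are illustrative. Returning 0 for two empty footprints is an assumption of this sketch, since the Jaccard index is undefined in that case.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, longitude first.
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def footprint(coords, precision=5):
    """Set of geohash cells visited by one user; coords are (lat, lon) pairs."""
    return {geohash_encode(lat, lon, precision) for lat, lon in coords}


def jaccard(cells_a, cells_b):
    """Jaccard index between two cell sets (0.0 when both are empty)."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0
```

For instance, `geohash_encode(57.64911, 10.40744)` yields `"u4pru"`, the 5-character prefix of the commonly cited reference geohash `u4pruydqqvj` for that coordinate.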
We combine the semantic and geographic similarities with a balancing ratio $\alpha$ to obtain the final cross\-user similarity:

$$s_{m,n}=\alpha\cdot s_{m,n}^{\rm E}+(1-\alpha)\cdot s_{m,n}^{\rm G}. \tag{7}$$

This yields the similarity matrix $S=\{s_{m,n}\}_{1\leq m,n\leq N}$\. We then construct $\mathcal{U}_n$ by selecting the top\-$k$ users with the highest similarity to $u_n$:

$$\mathcal{U}_n=\{u_m\mid s_{m,n}\in\operatorname{arg\,top-k}_{1\leq m\leq N,\,m\neq n}\,s_{m,n}\}. \tag{8}$$

We organize the locations visited by these $k$ users within the time window $[t_n^{\rm q}-\tau,t_n^{\rm q}+\tau]$ as peer behaviors, denoted as $P_n^{\rm S}$\.
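The semantic similarity, the weighted combination, and the top-k peer selection can be sketched as follows. This is an illustrative sketch under assumptions: embeddings are plain lists of floats, similarities to the target user are held in a dictionary, and all function names are hypothetical. The value of the balancing ratio is left as a required argument because no default is given in this section, and the time-window filtering of peer visits is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def combined_similarity(s_semantic, s_geographic, alpha):
    """Weighted sum of semantic and geographic similarity."""
    return alpha * s_semantic + (1.0 - alpha) * s_geographic


def top_k_peers(user, similarity_to_user, k):
    """Return the k users most similar to `user`, excluding `user` itself.

    `similarity_to_user` maps each user id to its combined similarity
    with `user`.
    """
    others = [m for m in similarity_to_user if m != user]
    others.sort(key=lambda m: similarity_to_user[m], reverse=True)
    return others[:k]
```

With `alpha` close to 1 the peer set is driven by profile semantics, and with `alpha` close to 0 it is driven by shared geohash cells.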

![Refer to caption](https://arxiv.org/html/2606.08122v1/x3.png)Figure 3\. The user profile \(left\) summarizes the mobility patterns with specific temporal and spatial habits\. The inferred intention \(right\) includes the likely categories at $t_n^{\rm q}$ with multi\-rationale, evidence\-grounded reasons from $P_n^{\rm F}$, $P_n^{\rm S}$, and $X_n^{\rm q}$\.

Intention Inference\. In existing LLM\-based approaches, the reasoning LLM must simultaneously infer user intentions and rank candidate POIs, which often reduces to selecting the most frequent category rather than performing evidence\-grounded reasoning\. We decouple these two tasks by pre\-inferring a reasoned intention and using it as an explicit reasoning scaffold, thereby guiding the downstream LLM toward intention\-aligned reasoning\.

We prompt an LLM $\mathcal{M}_{\rm I}$ with rich contexts, including the user profile, peer behaviors, and temporal context, to generate reasonable intentions\. Given that users tend to revisit historical locations, the user profile $P_n^{\rm F}$ provides $\mathcal{M}_{\rm I}$ with both coarse\-grained mobility patterns and a fine\-grained summary of the visiting frequency around $t_n^{\rm q}$\. The peer behaviors $P_n^{\rm S}$ broaden the spatial horizon and provide $\mathcal{M}_{\rm I}$ with the locations of similar users\. The query trajectory $X_n^{\rm q}$ provides the temporal context and helps infer the travel purpose from the preceding check\-in records\. We formulate the process of intention inference as:

$$P_n^{\rm I}=\mathcal{M}_{\rm I}(X_n^{\rm q},P_n^{\rm F},P_n^{\rm S}). \tag{9}$$

$P_n^{\rm I}$ provides a structured description of the user’s inferred traveling intention at the target time $t_n^{\rm q}$\. As shown in Fig\. [3](https://arxiv.org/html/2606.08122#S4.F3) \(right\), it captures the target POI category \(e\.g\., *Restaurant* or *Wine Bar*\) that best aligns with the user’s temporal routine and profile patterns, together with a natural\-language reasoning trace that grounds this intention in explicit evidence drawn from $P_n^{\rm F}$, $X_n^{\rm q}$, and $P_n^{\rm S}$\. We provide the detailed prompt of $\mathcal{M}_{\rm I}$ in Appendix [B](https://arxiv.org/html/2606.08122#A2)\.

### 4\.2\.Acting: Intention\-Guided Reasoning for POI Prediction

With the user intention built in the thinking stage, we perform intention\-guided reasoning for candidate recommendation in this subsection\. Given that there are thousands of POIs in a city, directly prompting the LLM with all candidates is infeasible\. We therefore design a dual candidate selection strategy that first constructs a compact candidate pool and then applies hybrid filtering to retain the most promising candidates\. After filtering, the reasoning LLM evaluates candidates against the intention and all supporting signals, producing the final ranked recommendations\.

Candidate Pool Construction\. We construct the candidate pool from historical and spatial perspectives\. First, given that users tend to revisit familiar locations, we construct $\mathcal{H}_n^{\rm h}$ by including all historically visited POIs\. From the spatial perspective, we construct $\mathcal{H}_n^{\rm s}$, consisting of the $Z$ nearest POIs to the last known location in $X_n^{\rm q}$\. The candidate pool $\mathcal{H}_n$ is therefore

$$\mathcal{H}_n=\mathcal{H}_n^{\rm h}\cup\mathcal{H}_n^{\rm s}. \tag{10}$$

$\mathcal{H}_n$ provides a robust candidate pool that considers both visiting repetition and spatial proximity\.
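A minimal sketch of the candidate pool construction is given below. The text does not state which distance measure is used, so the haversine great-circle distance here is an assumption, as are the function names and the dictionary layout of POI coordinates. The last visited POI is ranked like any other POI, so it enters the spatial set at distance zero.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_pool(visited, poi_coords, last_poi, z):
    """Union of historically visited POIs and the z POIs nearest to `last_poi`.

    `visited` is an iterable of POI ids from the user's history, and
    `poi_coords` maps every POI id to a (lat, lon) pair.
    """
    lat0, lon0 = poi_coords[last_poi]
    by_distance = sorted(
        poi_coords,
        key=lambda p: haversine_km(lat0, lon0, poi_coords[p][0], poi_coords[p][1]),
    )
    return set(visited) | set(by_distance[:z])
```

Sorting all POIs is the simplest option for a sketch; a spatial index would be the natural replacement for cities with many POIs.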

Hybrid Filtering\. To further narrow the candidate pool for efficient recommendation, we apply hybrid filtering from both historical and spatial perspectives to obtain the final candidate pool $\bar{\mathcal{H}}_n$ with $B$ candidates\. For each historical candidate $p\in\mathcal{H}_n^{\rm h}$, we calculate the visiting score as:

$$L(p)=c_{\rm v}+c_{\rm d}+c_{\rm h}, \tag{11}$$

where $c_{\rm v}$ denotes the number of historical visits to $p$ by $u_n$; $c_{\rm d}$ denotes the number of historical visits on the same day of the week as $t_n^{\rm q}$; and $c_{\rm h}$ denotes the number of historical visits in the same hour bucket as $t_n^{\rm q}$\. We then select the $\rho\times B$ candidates with the highest visiting scores to construct $\bar{\mathcal{H}}_n^{\rm h}$\.

From the spatial perspective, we select the $(1-\rho)\times B$ candidates with the smallest distance to construct $\bar{\mathcal{H}}_n^{\rm s}$\. The filtered candidate pool is therefore $\bar{\mathcal{H}}_n=\bar{\mathcal{H}}_n^{\rm h}\cup\bar{\mathcal{H}}_n^{\rm s}$\.
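The hybrid filtering step can be sketched as follows. Several details are assumptions made for illustration: the width of an hour bucket (one hour by default here), the rounding of the historical share of the budget, and the representation of spatial candidates as a dictionary of precomputed distances. The function names are hypothetical.

```python
from datetime import datetime


def visiting_score(visits, poi, t_query, bucket_hours=1):
    """Visiting score: total visits + same-weekday visits + same-hour-bucket visits.

    `visits` is a list of (time, poi_id) pairs from the user's history.
    """
    c_v = c_d = c_h = 0
    for time, visited_poi in visits:
        if visited_poi != poi:
            continue
        c_v += 1
        c_d += int(time.weekday() == t_query.weekday())
        c_h += int(time.hour // bucket_hours == t_query.hour // bucket_hours)
    return c_v + c_d + c_h


def hybrid_filter(visits, distances, t_query, budget, rho):
    """Keep about rho * budget historical and the remaining spatial candidates.

    `distances` maps each spatial candidate to its distance from the last
    known location. Returns (historical, spatial) lists; their union is the
    filtered candidate pool.
    """
    n_hist = int(round(rho * budget))
    historical = sorted(
        {poi for _, poi in visits},
        key=lambda p: visiting_score(visits, p, t_query),
        reverse=True,
    )[:n_hist]
    spatial = sorted(distances, key=distances.get)[: budget - n_hist]
    return historical, spatial
```

Because the two lists are merged by set union, a POI that is both frequently visited and nearby occupies a single slot, so the final pool can hold fewer than the full budget of candidates.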

Intention\-Grounded Reasoning\. In contrast to prior methods that feed raw trajectories or the intermediate user profiles into the LLM, we augment the LLM $\mathcal{M}_{\rm R}$ with the intention $P_n^{\rm I}$, which serves as the reasoning scaffold toward intention\-aligned POI prediction\. We formulate the reasoning process as:

$$\hat{\mathcal{Y}}_n=\mathcal{M}_{\rm R}\big(X_n^{\rm q},P_n^{\rm F},\,P_n^{\rm I},\,\bar{\mathcal{H}}_n\big), \tag{12}$$

where $\hat{\mathcal{Y}}_n=\{\hat{y}_1,\hat{y}_2,\ldots,\hat{y}_T\}$ denotes the ordered set of recommended POIs for user $u_n$ at the target time $t_n^{\rm q}$, given the query trajectory $X_n^{\rm q}$\. This grounded reasoning process ensures that each recommendation is supported by explicit and interpretable evidence rather than implicit trajectory patterns alone\.

We present the overall process of IntentPOI in Algorithm[1](https://arxiv.org/html/2606.08122#algorithm1)from the perspective of the operations in the historical trajectories \(i\.e, the training and validation sets in baselines\) and query trajectories \(i\.e\., the test sets in baselines\)\. We first build user profiles and evaluate users’ similarity from historical trajectories, and then infer the intentions and the next POI results for the query trajectories\. Therefore, we have a fair comparison with the baselines, without test data leakage\. More details are provided in Subsection[5\.1](https://arxiv.org/html/2606.08122#S5.SS1)\.

Table 1. Comparison of different LLM-based methods for next POI prediction.

| Dimensions | Prompt-based Methods | Token-based Methods | IntentPOI (Ours) |
| --- | --- | --- | --- |
| Reasoning Paradigm | One-step re-ranking | One-step index mapping | Two-stage, Thinking → Acting |
| Intention Modeling | Implicit | Implicit | Explicit, intention as reasoning scaffold |
| Evidence Sources | Historical-oriented | Token semantics | Multi-source, including peer behaviors |
| Interpretability | Limited | Limited | High, evidence-grounded reasoning traces |

Table 2. Statistics of the datasets.

| Datasets | #Users | #POIs | #Check-ins | #Trajectories |
| --- | --- | --- | --- | --- |
| NYC | 1075 | 5,099 | 104,074 | 14,160 |
| TKY | 2,281 | 7,844 | 361,430 | 44,692 |
| CA | 4,318 | 9,923 | 250,780 | 32,920 |

### 4.3. Comparative Analysis

To further clarify the design rationale of IntentPOI, we provide an explicit comparison with two main LLM-based paradigms, i.e., prompt-based and token-based methods, as presented in Table [1](https://arxiv.org/html/2606.08122#S4.T1).

Prompt-based methods employ LLMs as one-step re-rankers (Wang et al., [2023](https://arxiv.org/html/2606.08122#bib.bib21); Zhong et al., [2025](https://arxiv.org/html/2606.08122#bib.bib26); Li et al., [2024](https://arxiv.org/html/2606.08122#bib.bib23)), and token-based methods similarly collapse reasoning into a single step, mapping token-encoded query trajectories directly to POI indices (Chen et al., [2025b](https://arxiv.org/html/2606.08122#bib.bib27); Cheng et al., [2025](https://arxiv.org/html/2606.08122#bib.bib24); Wang et al., [2025](https://arxiv.org/html/2606.08122#bib.bib25)). ➊ Both paradigms ask the LLM to simultaneously infer what the user wants and which POI matches that need, without an explicit intermediate reasoning step. In contrast, IntentPOI explicitly infers the intermediate intention, which serves as the reasoning scaffold for the LLM $\mathcal{M}_{\rm R}$ to yield intention-aligned recommendation results. ➋ Whereas both prompt-based and token-based methods derive evidence exclusively from a single user's trajectory patterns, IntentPOI expands the evidence base to cross-user mobility patterns. ➌ Moreover, in IntentPOI, each recommendation is supported by a full reasoning trace from $P_n^{\rm F}$, $P_n^{\rm I}$, and $X_n^{\rm q}$, making the decision process auditable. In summary, our proposed IntentPOI shifts the LLM from a one-step trajectory-conditioned predictor to a two-stage intention-guided reasoner, where explicit intention inference bridges the gap between raw mobility signals and grounded location recommendations.

Table 3. Performance comparison on NYC, TKY, and CA datasets. Bold: the best. Underline: the second best. "Improve" denotes the relative improvement of IntentPOI over the best baseline.

| Methods | NYC HR@5 | NYC HR@10 | NYC N@5 | NYC N@10 | NYC MRR | TKY HR@5 | TKY HR@10 | TKY N@5 | TKY N@10 | TKY MRR | CA HR@5 | CA HR@10 | CA N@5 | CA N@10 | CA MRR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepMove | 0.3169 | 0.3651 | 0.2456 | 0.2614 | 0.2285 | 0.3855 | 0.4585 | 0.2965 | 0.3201 | 0.2767 | 0.2861 | 0.3377 | 0.2091 | 0.2257 | 0.1903 |
| GETNext | 0.2973 | 0.3684 | 0.2111 | 0.2341 | 0.1921 | 0.1654 | 0.1934 | 0.1328 | 0.1419 | 0.1257 | 0.2153 | 0.2373 | 0.1637 | 0.1709 | 0.1497 |
| SASRec | 0.2098 | 0.2365 | 0.1613 | 0.1701 | 0.1489 | 0.3338 | 0.3925 | 0.2494 | 0.2685 | 0.2294 | 0.1829 | 0.2146 | 0.1367 | 0.1470 | 0.1256 |
| BERT4Rec | 0.2083 | 0.2461 | 0.1584 | 0.1708 | 0.1470 | 0.3272 | 0.4005 | 0.2436 | 0.2673 | 0.2256 | 0.1884 | 0.2256 | 0.1402 | 0.1520 | 0.1290 |
| FPMC | 0.3010 | 0.3340 | 0.2296 | 0.2403 | 0.2102 | 0.4425 | 0.5261 | 0.3429 | 0.3700 | 0.3210 | 0.2503 | 0.3061 | 0.1935 | 0.2119 | 0.1823 |
| POI-GDE | 0.2265 | 0.2624 | 0.1716 | 0.1832 | 0.1582 | 0.3388 | 0.4063 | 0.2584 | 0.2802 | 0.2406 | 0.1878 | 0.2283 | 0.1411 | 0.1542 | 0.1310 |
| AgentMove | 0.4073 | 0.4893 | 0.3008 | 0.3272 | 0.2762 | 0.4240 | 0.5238 | 0.3075 | 0.3400 | 0.2823 | 0.3631 | 0.4305 | 0.2878 | 0.3095 | 0.2716 |
| CoMaPOI | 0.3643 | 0.4411 | 0.2670 | 0.2916 | 0.2448 | 0.3892 | 0.4881 | 0.2840 | 0.3160 | 0.2624 | 0.3645 | 0.4580 | 0.2653 | 0.2957 | 0.2451 |
| LLM4POI | 0.2620 | 0.3010 | 0.2023 | 0.2148 | 0.1876 | 0.3121 | 0.3602 | 0.2411 | 0.2567 | 0.2239 | 0.2442 | 0.2765 | 0.1905 | 0.2011 | 0.1770 |
| MobilityLLM | 0.4075 | 0.4920 | 0.3112 | 0.3388 | 0.2978 | 0.4519 | 0.5391 | 0.3431 | 0.3714 | 0.3276 | 0.3087 | 0.3799 | 0.2357 | 0.2586 | 0.2296 |
| QT-Mob | 0.3062 | 0.3358 | 0.2553 | 0.2650 | 0.2423 | 0.4005 | 0.4466 | 0.3273 | 0.3423 | 0.3092 | 0.2641 | 0.3095 | 0.2178 | 0.2328 | 0.2087 |
| IntentPOI | **0.4274** | **0.5063** | **0.3332** | **0.3586** | **0.3123** | **0.5272** | **0.6265** | **0.3726** | **0.4045** | **0.3344** | **0.4072** | **0.4851** | **0.3152** | **0.3372** | **0.2937** |
| Improve (%) | 4.88 | 2.91 | 7.07 | 5.84 | 4.87 | 16.66 | 16.21 | 8.60 | 8.91 | 2.08 | 11.71 | 5.92 | 9.52 | 8.95 | 8.14 |

## 5. Experiments

### 5.1. Experimental Settings

Datasets. We conduct experiments on three widely used real-world POI check-in datasets: NYC, TKY, and CA. The NYC dataset includes check-in records collected in New York City from April 2012 to February 2013. The TKY dataset covers Tokyo during the same period (Yang et al., [2014](https://arxiv.org/html/2606.08122#bib.bib35)). The CA dataset includes check-ins across California and Nevada, from February 2009 to October 2010 (Yuan et al., [2013](https://arxiv.org/html/2606.08122#bib.bib36)). Following prior works (Zhong et al., [2025](https://arxiv.org/html/2606.08122#bib.bib26); Yang et al., [2022a](https://arxiv.org/html/2606.08122#bib.bib40)), we filter out users and POIs with fewer than 10 check-ins. The statistical details of the processed datasets are presented in Table [2](https://arxiv.org/html/2606.08122#S4.T2). For the baselines, these datasets are divided into training, validation, and test sets in chronological order with a ratio of 8:1:1. For fair comparison, in our proposed IntentPOI, we use the training and validation sets (i.e., the first 90% of the trajectories) to build the user profiles and similarity matrix, and report the prediction performance on the test set.
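The chronological 8:1:1 split described above can be illustrated with a short sketch. It assumes each trajectory is a dictionary with a sortable `start` field and that the split is applied to one global, time-ordered list; both are assumptions for the example, since the text does not state whether the split is global or per user.

```python
def chronological_split(trajectories, parts=(8, 1, 1)):
    """Split trajectories into train/validation/test sets in time order.

    trajectories: list of dicts with a sortable "start" key (assumed layout).
    parts: integer ratio of the three splits; integers avoid float rounding.
    """
    ordered = sorted(trajectories, key=lambda t: t["start"])
    total = sum(parts)
    train_end = len(ordered) * parts[0] // total
    val_end = len(ordered) * (parts[0] + parts[1]) // total
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def history_and_queries(trajectories):
    """History = train + validation (first 90%); queries = test (last 10%)."""
    train, val, test = chronological_split(trajectories)
    return train + val, test
```

Under this layout, every trajectory used to build profiles and the similarity matrix starts no later than any query trajectory, which is the property that rules out test data leakage.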

Baselines. We compare IntentPOI with a comprehensive collection of baselines, including deep learning-based methods: DeepMove (Feng et al., [2018a](https://arxiv.org/html/2606.08122#bib.bib16)), GETNext (Yang et al., [2022a](https://arxiv.org/html/2606.08122#bib.bib40)), SASRec (Kang and McAuley, [2018](https://arxiv.org/html/2606.08122#bib.bib37)), BERT4Rec (Sun et al., [2019](https://arxiv.org/html/2606.08122#bib.bib38)), FPMC (Rendle et al., [2010](https://arxiv.org/html/2606.08122#bib.bib78)), and POI-GDE (Yang et al., [2024a](https://arxiv.org/html/2606.08122#bib.bib44)); and LLM-based methods: AgentMove (Feng et al., [2025](https://arxiv.org/html/2606.08122#bib.bib79)), CoMaPOI (Zhong et al., [2025](https://arxiv.org/html/2606.08122#bib.bib26)), LLM4POI (Li et al., [2024](https://arxiv.org/html/2606.08122#bib.bib23)), MobilityLLM (Gong et al., [2024a](https://arxiv.org/html/2606.08122#bib.bib80)), and QT-Mob (Chen et al., [2025b](https://arxiv.org/html/2606.08122#bib.bib27)).

Evaluation Metrics. Following prior works (Zhong et al., [2025](https://arxiv.org/html/2606.08122#bib.bib26); Yang et al., [2022a](https://arxiv.org/html/2606.08122#bib.bib40); Chen et al., [2025b](https://arxiv.org/html/2606.08122#bib.bib27)), we adopt standard ranking-based metrics to evaluate the recommendation performance. Hit Rate at $n$ (HR@$n$) measures the proportion of test cases where the ground-truth POI appears among the top-$n$ predictions. Normalized Discounted Cumulative Gain at $n$ (N@$n$) further accounts for the ranking quality by assigning higher weights to correct predictions at top positions. Mean Reciprocal Rank (MRR) evaluates the average reciprocal rank of the first correct answer.
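For concreteness, the three metrics can be written out for the usual next POI setting, where each query has exactly one ground-truth POI. The sketch below assumes that setting, in which the ideal DCG equals 1 and N@$n$ reduces to $1/\log_2(\text{rank}+1)$; the function names are illustrative.

```python
import math


def hit_rate_at_n(ranked, truth, n):
    """1 if the ground-truth POI is among the top-n predictions, else 0."""
    return 1.0 if truth in ranked[:n] else 0.0


def ndcg_at_n(ranked, truth, n):
    """NDCG with a single relevant item: 1 / log2(rank + 1), or 0 on a miss."""
    if truth in ranked[:n]:
        rank = ranked.index(truth) + 1  # 1-based position of the hit
        return 1.0 / math.log2(rank + 1)
    return 0.0


def reciprocal_rank(ranked, truth):
    """1 / rank of the ground truth, or 0 if it is not in the list."""
    return 1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0


def evaluate(rankings, truths, n=10):
    """Average HR@n, N@n, and MRR over all test queries."""
    count = len(truths)
    pairs = list(zip(rankings, truths))
    return {
        "HR": sum(hit_rate_at_n(r, y, n) for r, y in pairs) / count,
        "N": sum(ndcg_at_n(r, y, n) for r, y in pairs) / count,
        "MRR": sum(reciprocal_rank(r, y) for r, y in pairs) / count,
    }
```

With a single ground-truth item, HR@1 and N@1 coincide by construction, which is consistent with the identical HR@1 and N@1 columns reported in the ablation tables.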

Implementation Details. We employ GPT-5.4 (https://openai.com/index/introducing-gpt-5-4/) as $\mathcal{M}_{\rm F}$ and $\mathcal{M}_{\rm I}$ to generate user profiles and intentions, respectively. We employ GPT-5.4-mini (https://openai.com/index/introducing-gpt-5-4-mini-and-nano/) as $\mathcal{M}_{\rm R}$ to perform intention-guided reasoning. We adopt text-embedding-3-large (https://openai.com/index/new-embedding-models-and-api-updates/) as the text embedding model $\Phi$. The hyperparameters are set as follows: $k=5$ for peer selection, $\alpha=0.5$ for similarity fusion, $\tau=30$ min for the temporal window, $Z=50$ for spatial candidate construction, $B=30$ for the candidate pool size, $T=10$ for candidate recommendation, and $\rho=0.9$ for the hybrid filtering ratio. All experiments are conducted on a server with 4 NVIDIA A6000 GPUs. The source code can be accessed online (https://github.com/yuppielqx/Next-POI).

### 5.2. Main Results

We present the performance comparison on the three datasets in Table [3](https://arxiv.org/html/2606.08122#S4.T3). IntentPOI consistently outperforms all baselines across all metrics and datasets, particularly on the TKY and CA datasets, with HR@5 improved by **16.66%** and **11.71%**, respectively. The performance gains indicate that explicit intention reasoning provides a robust inductive bias that adapts to cities with different mobility patterns and POI densities. LLM-based methods generally outperform deep learning-based approaches, indicating that the semantic understanding capabilities of LLMs are beneficial for modeling complex human mobility patterns. Among the LLM-based baselines, MobilityLLM and AgentMove achieve competitive results. However, both methods still operate as one-step predictors without explicit intention modeling, and their performance gap relative to IntentPOI increases on datasets with more diverse user behaviors, such as TKY and CA.

In conclusion, these results demonstrate the effectiveness of the proposed thinking\-then\-acting paradigm\. By explicitly inferring user intentions as an intermediate reasoning step, IntentPOI transforms next POI prediction from frequency\-dominated pattern matching into intention\-guided reasoning, leading to consistent and substantial improvements over state\-of\-the\-art methods across diverse real\-world scenarios\.
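As a quick arithmetic check, the "Improve (%)" figures quoted for HR@5 can be recomputed from the Table 3 entries as the relative gain over the best baseline. The helper name below is illustrative; the numeric inputs are taken directly from Table 3.

```python
def relative_improvement(ours, best_baseline):
    """Relative improvement of our score over the best baseline, in percent."""
    return 100.0 * (ours - best_baseline) / best_baseline


# HR@5 on TKY: IntentPOI 0.5272 vs. best baseline MobilityLLM 0.4519.
tky_hr5 = relative_improvement(0.5272, 0.4519)
# HR@5 on CA: IntentPOI 0.4072 vs. best baseline CoMaPOI 0.3645.
ca_hr5 = relative_improvement(0.4072, 0.3645)

print(f"TKY HR@5 improvement: {tky_hr5:.2f}%")
print(f"CA HR@5 improvement: {ca_hr5:.2f}%")
```

Note that the best baseline differs by dataset and metric, so each column of the "Improve (%)" row is computed against its own strongest competitor.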

### 5.3. Model Analysis

Ablation in IntentPOI. We present five ablation variants of IntentPOI in Table [4](https://arxiv.org/html/2606.08122#S5.T4), where each variant selectively removes one component from $\mathcal{M}_{\rm I}$ or $\mathcal{M}_{\rm R}$. Moreover, we also ablate the candidate pool construction with three variants: (i) adopting a random filtering strategy, (ii) including only spatial candidates, and (iii) including only historical candidates.

We report the ablation results in Table [5](https://arxiv.org/html/2606.08122#S5.T5). IntentPOI consistently outperforms all eight variants. ➊ The impact of removing a signal depends critically on which LLM it feeds. Ablating the user profile from $\mathcal{M}_{\rm I}$ results in a remarkable performance drop, with HR@1 decreasing by 49.8% (from 0.211 to 0.106), whereas removing the same profile signal from $\mathcal{M}_{\rm R}$ produces only a marginal decline, with HR@1 decreasing by 3.4%. This asymmetry directly validates that the user profile is the foundational evidence for the thinking stage, but the intention can compensate for its absence in the acting stage. ➋ Removing the intention signal from $\mathcal{M}_{\rm R}$ results in an 11.2% HR@1 drop, indicating that the intention serves as the primary reasoning scaffold in the acting stage. ➌ Removing peer behaviors from $\mathcal{M}_{\rm I}$ produces a 9.5% HR@1 degradation, indicating that cross-user evidence plays a supplementary role in intention inference.

➍ Random filtering causes a 20.4% HR@1 drop, confirming that the hybrid filtering strategy effectively identifies relevant candidates for the final recommendation. ➎ Removing historical candidates causes a catastrophic collapse, with HR@1 decreasing from 0.211 to 0.010, confirming that re-visiting patterns dominate the POI prediction. ➏ In contrast, removing spatial candidates leads to moderate degradation, with HR@1 decreasing by 2.4%, indicating that spatial proximity serves as a useful but secondary signal for the final recommendation.

Table 4. Ablation variants of IntentPOI.

| Variants | $P_n^{\rm F}$ in $\mathcal{M}_{\rm I}$ | $P_n^{\rm S}$ in $\mathcal{M}_{\rm I}$ | $P_n^{\rm F}$ in $\mathcal{M}_{\rm R}$ | $P_n^{\rm I}$ in $\mathcal{M}_{\rm R}$ |
| --- | --- | --- | --- | --- |
| IntentPOI | ✔ | ✔ | ✔ | ✔ |
| A.1 (w/o $P_n^{\rm F}$ in $\mathcal{M}_{\rm I}$) | ✘ | ✔ | ✔ | ✔ |
| A.2 (w/o $P_n^{\rm S}$ in $\mathcal{M}_{\rm I}$) | ✔ | ✘ | ✔ | ✔ |
| A.3 (w/o $P_n^{\rm F}$ in $\mathcal{M}_{\rm R}$) | ✔ | ✔ | ✘ | ✔ |
| A.4 (w/o $P_n^{\rm I}$ in $\mathcal{M}_{\rm R}$) | ✔ | ✔ | ✔ | ✘ |
| A.5 (w/o All) | ✘ | ✘ | ✘ | ✘ |

Table 5. Ablation study in IntentPOI on CA dataset. Bold: the best. Underline: the second best.

| Metrics | HR@1 | HR@5 | HR@10 | N@1 | N@5 | N@10 | MRR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IntentPOI | **0.211** | **0.407** | **0.485** | **0.211** | **0.315** | **0.337** | **0.294** |
| A.1 | 0.106 | 0.292 | 0.404 | 0.106 | 0.205 | 0.241 | 0.191 |
| A.2 | 0.191 | 0.397 | 0.475 | 0.191 | 0.302 | 0.331 | 0.282 |
| A.3 | 0.204 | 0.399 | 0.473 | 0.204 | 0.307 | 0.331 | 0.286 |
| A.4 | 0.186 | 0.361 | 0.455 | 0.186 | 0.277 | 0.306 | 0.261 |
| A.5 | 0.101 | 0.282 | 0.378 | 0.101 | 0.196 | 0.227 | 0.180 |
| random | 0.168 | 0.289 | 0.336 | 0.168 | 0.233 | 0.248 | 0.220 |
| w/o $\mathcal{H}_n^{\rm s}$ | 0.206 | 0.400 | 0.471 | 0.206 | 0.309 | 0.333 | 0.289 |
| w/o $\mathcal{H}_n^{\rm h}$ | 0.010 | 0.032 | 0.043 | 0.010 | 0.022 | 0.025 | 0.019 |

We define a *successful hit* as the case where the ground-truth POI is included in the candidate pool. We can therefore calculate the average hit rate of the candidate pools across all query trajectories. To further evaluate the efficiency of the hybrid filtering strategy, we introduce *coverage efficiency*, defined as the average hit rate under a given filtering strategy divided by the pool size. As shown in Fig. [4](https://arxiv.org/html/2606.08122#S5.F4), our proposed hybrid filtering achieves the highest coverage efficiency in each candidate pool across the two datasets. Specifically, it exceeds the raw full candidate pool (i.e., $\mathcal{H}_n$) by $3.0\times$ on NYC and $3.3\times$ on CA, and outperforms random filtering by 18% and 47% respectively on the full candidate pool.
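The coverage efficiency measure can be sketched as follows. The text divides the average hit rate by "the pool size"; the sketch assumes this means the average number of candidates per pool, which coincides with $B$ when every pool is full. Function names and input layout are illustrative.

```python
def average_hit_rate(pools, truths):
    """Fraction of queries whose ground-truth POI is inside the candidate pool."""
    hits = sum(1 for pool, truth in zip(pools, truths) if truth in pool)
    return hits / len(truths)


def coverage_efficiency(pools, truths):
    """Average hit rate divided by the average pool size (assumed definition)."""
    avg_size = sum(len(pool) for pool in pools) / len(pools)
    return average_hit_rate(pools, truths) / avg_size
```

Under this definition, a filtering strategy scores well only if it keeps the ground truth while also keeping the pool small, so an unfiltered pool with a high hit rate can still have low coverage efficiency.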

![Refer to caption](https://arxiv.org/html/2606.08122v1/x4.png)

Figure 4. Coverage efficiency of different candidate filtering strategies on (a) NYC and (b) CA datasets.

Table [6](https://arxiv.org/html/2606.08122#S5.T6) further reports the results on sparse users (those with at most 5 historical trajectories, i.e., $H \leq 5$). IntentPOI consistently outperforms the five variants. Under sparse settings, the stage-dependent asymmetry persists but with a notable shift. ➊ A.1 remains the most damaging single-signal ablation, indicating that the user profile is the foundational evidence for intention inference. Its importance decreases, however, when the historical trajectories offer sparse temporal contexts: HR@1 drops by 42.72% (from 0.206 to 0.118), compared with 49.8% in the full-test setting. ➋ Comparing A.2 in the full-test and sparse settings, we observe that peer behaviors play a more critical role for sparse users, with HR@1 decreasing by 14.6% under sparse settings compared to 9.5% in the full-test setting. ➌ Moreover, the reasoning scaffold provided by the intention becomes more critical for $\mathcal{M}_{\rm R}$ in sparse settings: removing it reduces HR@1 by 18.93% in Table [6](https://arxiv.org/html/2606.08122#S5.T6), compared with 11.2% in Table [5](https://arxiv.org/html/2606.08122#S5.T5).

Table 6. Ablation results on query trajectories of sparse users on CA dataset. Bold: the best. Underline: the second best.

| Metrics | HR@1 | HR@5 | HR@10 | N@1 | N@5 | N@10 | MRR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IntentPOI | **0.206** | **0.398** | **0.475** | **0.206** | **0.309** | **0.334** | **0.290** |
| A.1 | 0.118 | 0.255 | 0.343 | 0.118 | 0.188 | 0.216 | 0.177 |
| A.2 | 0.176 | 0.333 | 0.412 | 0.176 | 0.261 | 0.286 | 0.247 |
| A.3 | 0.176 | 0.333 | 0.382 | 0.176 | 0.263 | 0.277 | 0.244 |
| A.4 | 0.167 | 0.255 | 0.382 | 0.167 | 0.213 | 0.253 | 0.215 |
| A.5 | 0.098 | 0.235 | 0.294 | 0.098 | 0.166 | 0.185 | 0.152 |

Ablation in Pretrained LLMs. Fig. [5](https://arxiv.org/html/2606.08122#S5.F5) reports MRR under varying selections of the generation LLMs ($\mathcal{M}_{\rm F}$ and $\mathcal{M}_{\rm I}$) in the thinking stage and the reasoning LLM ($\mathcal{M}_{\rm R}$) in the acting stage, in both full and sparse test settings.

IntentPOI is substantially more sensitive to the LLM choice for generation than for reasoning. ➊ In Fig. [5](https://arxiv.org/html/2606.08122#S5.F5)(a), MRR spans a wide range across different implementations, which indicates that stronger signal generators produce more accurate user profiles and intentions, thus improving the final prediction performance. ➋ In Fig. [5](https://arxiv.org/html/2606.08122#S5.F5)(b), by contrast, the performance difference is relatively small. This asymmetry indicates that the thinking stage does the difficult reasoning to establish a high-quality intention, and the acting stage becomes a light evaluation task that even a moderate LLM can perform effectively.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x5.png)

Figure 5. Full and sparse test performance in different implementations of (a) $\mathcal{M}_{\rm F}$, $\mathcal{M}_{\rm I}$, and (b) $\mathcal{M}_{\rm R}$.

Efficiency Analysis. To assess the inference cost of each ablation variant in Table [4](https://arxiv.org/html/2606.08122#S5.T4), we measure the input token count and latency of $\mathcal{M}_{\rm I}$ and $\mathcal{M}_{\rm R}$ on the CA dataset, averaged across all test queries. Note that in Fig. [6](https://arxiv.org/html/2606.08122#S5.F6), we only report the measurements of LLMs which have ablation variants.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x6.png)

Figure 6. Inference cost of $\mathcal{M}_{\rm I}$ and $\mathcal{M}_{\rm R}$ across ablation variants in terms of input token count and latency. $\spadesuit[\heartsuit]$ denotes the inference cost of $\heartsuit$ in variant $\spadesuit$.

$\mathcal{M}_{\rm I}$ and $\mathcal{M}_{\rm R}$ exhibit a consistent cost asymmetry across all variants. ➊ The input of $\mathcal{M}_{\rm I}$ contains relatively few tokens, comprising the user profile, temporal contexts, and peer behaviors, yet it incurs higher latency due to the complicated intention generation. ➋ $\mathcal{M}_{\rm R}$ consumes more input tokens for the candidate pool $\bar{\mathcal{H}}_n$ and all reasoning signals, but runs faster since it performs discriminative ranking over structured candidates rather than open-ended generation. This indicates that the thinking stage performs a high-quality but concentrated reasoning step, while the acting stage performs a broader but mechanically lighter evaluation. ➌ Removing individual signals yields only marginal token savings for both LLMs, as the dominant token cost comes from the query trajectory and candidate pool, which remain constant across variants. ➍ While A.5 eliminates the $\mathcal{M}_{\rm I}$ call and reduces $\mathcal{M}_{\rm R}$'s input tokens, this saving results in a catastrophic accuracy collapse, as shown in Table [5](https://arxiv.org/html/2606.08122#S5.T5) and Table [6](https://arxiv.org/html/2606.08122#S5.T6). In conclusion, the efficiency analysis validates that the moderate inference cost of IntentPOI is necessary for the performance gains.

Hyperparameter Investigation. We evaluate the performance variation of IntentPOI under different hyperparameter settings. Fig. [7](https://arxiv.org/html/2606.08122#S5.F7) reports the numerical results of four metrics on the CA dataset w.r.t. four key hyperparameters, i.e., $\rho$, $k$, $\tau$, and $\alpha$. Since the $Z$ spatial candidates are generated solely based on geographic distance, and only the top $(1-\rho) \times B$ of them are retained after hybrid filtering, we do not investigate the impact of $Z$ on the final performance.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x7.png)

Figure 7. Performance under different settings of (a) historical candidate ratio $\rho$, (b) number of similar users $k$, (c) temporal window $\tau$ (minutes), and (d) semantic-geographic balancing ratio $\alpha$.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x8.png)

Figure 8. Effect of candidate pool size $B$ on MRR and average hit rate of the filtered candidate pool $\bar{\mathcal{H}}_n$.

➊ In Fig. [7](https://arxiv.org/html/2606.08122#S5.F7)(a), the performance improves as $\rho$ increases, confirming that historical candidates are more informative than spatial candidates for next POI prediction, as they directly capture the user's re-visiting patterns. However, the performance gain disappears beyond $\rho=0.9$, indicating that spatial candidates compensate for the long-tail POIs that are not captured by historical candidates.

➋ The candidate pool size $B$ affects the trade-off between accuracy and efficiency. As shown in Fig. [8](https://arxiv.org/html/2606.08122#S5.F8), a larger $B$ retains more candidates in the filtered pool $\bar{\mathcal{H}}_n$, increasing the hit rate of the ground truth, but also increases $\mathcal{M}_{\rm R}$'s input length and inference cost. The default setting $B=30$ achieves a good balance between the converged MRR and moderate token consumption.

➌ We vary the number of similar users $k$ from 1 to 10, which affects the richness of the peer behaviors for intention inference. As shown in Fig. [7](https://arxiv.org/html/2606.08122#S5.F7)(b), values of $k$ that are too low or too high may introduce noise or bias. Too few peers may not provide representative samples of behavior patterns, while too many peers may include dissimilar users whose check-in patterns are less relevant to the target user.

➍ Different values of $\tau$ affect the temporal relevance of the peer check-ins used for intention inference. As shown in Fig. [7](https://arxiv.org/html/2606.08122#S5.F7)(c), a narrow temporal window ($\tau=15$ min) may miss relevant check-ins from users whose mobility patterns are slightly offset from the target user's schedule, while an overly wide window ($\tau=60$ min) dilutes the temporal specificity of the peer evidence.

➎ In Fig. [7](https://arxiv.org/html/2606.08122#S5.F7)(d), we vary the balancing ratio $\alpha$ from 0 (i.e., only geographic similarity) to 1 (i.e., only semantic similarity). All metrics peak at $\alpha=0.5$ and degrade with lower or higher values, confirming that the two similarity signals are complementary, with semantic similarity for mobility pattern alignment and geographic similarity for shared spatial footprints.

➏ Across all four parameters, the default configuration is $\rho=0.9$, $k=5$, $\tau=30$, and $\alpha=0.5$. All metrics vary smoothly within a reasonable range around each optimum rather than exhibiting sharp cliffs, which indicates that IntentPOI is not overly sensitive to hyperparameter selection, reducing the need for dataset-specific tuning in practical applications.
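To make the roles of $\alpha$, $k$, and $\tau$ concrete, the following sketch shows one way peer evidence could be gathered. It is written under explicit assumptions that are not confirmed by this section: the fused similarity is taken to be the convex combination $\alpha \cdot \text{semantic} + (1-\alpha) \cdot \text{geographic}$ (consistent with $\alpha=0$ meaning only geographic and $\alpha=1$ meaning only semantic similarity), and the temporal window is applied to the time of day with wrap-around at midnight. All names and data layouts are hypothetical.

```python
from datetime import datetime


def fused_similarity(semantic, geographic, alpha=0.5):
    """Assumed fusion: alpha weights semantic, (1 - alpha) weights geographic."""
    return alpha * semantic + (1.0 - alpha) * geographic


def top_k_peers(sims, k=5, alpha=0.5):
    """sims: {user_id: (semantic_sim, geographic_sim)} (assumed layout)."""
    ranked = sorted(sims, key=lambda u: (-fused_similarity(*sims[u], alpha), u))
    return ranked[:k]


def within_window(ts, query_time, tau=30):
    """True if ts is within tau minutes of the query's time of day (assumption)."""
    minutes = ts.hour * 60 + ts.minute
    query_minutes = query_time.hour * 60 + query_time.minute
    diff = abs(minutes - query_minutes)
    return min(diff, 1440 - diff) <= tau  # wrap around midnight


def peer_evidence(peer_checkins, peers, query_time, tau=30):
    """Collect (user, poi, time) check-ins of selected peers inside the window."""
    return [
        (user, poi, ts)
        for user in peers
        for poi, ts in peer_checkins.get(user, [])
        if within_window(ts, query_time, tau)
    ]
```

In this sketch, a small `tau` returns few peer check-ins and a large `tau` admits check-ins from unrelated parts of the day, which mirrors the trade-off discussed for Fig. 7(c).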

### 5.4. Case Study

In this subsection, we showcase two samples from the CA dataset to illustrate a success of IntentPOI enabled by the explicit intention and a failure on an out-of-distribution trajectory.

Successful Case. We select the query trajectory $\mathtt{29}$ from User $\mathtt{9}$, which is a sparse case with 5 historical trips, and the query trajectory has only 2 check-ins. The user visits diverse categories and the top category (Disneyland Resort) accounts for only 17.4% of visits, making it a challenging case where no single signal can dominate. We compare the prediction results of IntentPOI and three ablation variants in Fig. [9](https://arxiv.org/html/2606.08122#S5.F9).

![Refer to caption](https://arxiv.org/html/2606.08122v1/x9.png)

Figure 9. A successful case study on query trajectory $\mathtt{29}$. The historical summary, inferred intention and reasoning details on four variants are provided.

![Refer to caption](https://arxiv.org/html/2606.08122v1/x10.png)

Figure 10. A failure case on query trajectory $\mathtt{1052}$. The historical summary, inferred intention and reasoning details of IntentPOI are provided.

The inferred intention in IntentPOI correctly identifies the user's likely activity as an *evening leisure outing* instead of *staying around stadium*, and constrains the candidate ranking to leisure-relevant categories, which is critical for correctly nominating the *Beach* category. This is based on the profile evidence of the user's leisure pattern in Orange County, the temporal context that Friday evening is a peak entertainment window, and the peer behavior evidence that similar users visit diverse non-Disneyland categories on Friday evenings. On this basis, IntentPOI correctly ranks the ground-truth Beach POI (POI $\mathtt{688}$) at position 1.

The three ablation variants fail for distinct reasons. ➊ A.1 (w/o $P_n^{\rm F}$ in $\mathcal{M}_{\rm I}$) fails to learn about the user's long-term mobility patterns, and the generated intention lacks spatial grounding. Consequently, $\mathcal{M}_{\rm R}$ defaults to pure spatial distance ranking, where all top-5 candidates are POIs near the Stadium, and the distant Beach POI falls to rank 9. ➋ In A.2 (w/o $P_n^{\rm S}$ in $\mathcal{M}_{\rm I}$), the top-5 candidates all become the most-visited Disneyland Resort. However, the user's own behavior is not Disneyland-dominated. Without cross-user diversity evidence to counterbalance the raw frequency signal, the intention is inferred purely from the user's own history, and $\mathcal{M}_{\rm R}$ cannot distinguish between overall prevalence and time-specific relevance. ➌ A.4 (w/o $P_n^{\rm I}$ in $\mathcal{M}_{\rm R}$) produces a superficially similar Disneyland collapse, but the underlying cause is distinct. Without the intention scaffold, the temporal constraint "09-24 20:18 → leisure" is absent from $\mathcal{M}_{\rm R}$'s reasoning, and the LLM defaults to unconditional frequency ranking without category-level guidance. ➍ Overall, the successful case indicates that multi-rationale signals in IntentPOI provide complementary rather than redundant reasoning priors, and that signal diversity is the key mechanism driving correct predictions.

Failure Case. However, when the profile and peer behavior signals collectively fail to support the ground truth, the inferred intentions can hardly serve as an effective reasoning scaffold. We present a representative failure case in Fig. [10](https://arxiv.org/html/2606.08122#S5.F10) to illustrate this boundary. We select query trajectory $\mathtt{1052}$ (User $\mathtt{363}$) from the CA dataset. This user has 10 historical trajectories and 72 check-ins, with behavior overwhelmingly dominated by Starbucks (61.1%). The query trajectory places the user at two Starbucks locations in repeated alternation over six of the seven check-ins. The ground-truth POI is a *Mall* (ID: $\mathtt{6423}$) located 13 km north of the user's Starbucks cluster. This instance exhibits three simultaneous deviations from the user's historical pattern. (i) Category: Mall accounts for only a single historical visit vs. 61.1% Starbucks; (ii) Temporal: the user has only 10 check-ins in the time scope of 00:00–02:00, scattered across 10 different POIs with no concentrated pattern; (iii) Spatial: the ground-truth POI lies at the northern extreme of the user's activity range.

The user profile captures the user’s Starbucks\-dominated identity, but the Mall signal \(only a single historical visit on a Monday midnight\) is too weak\. The peer behavior retrieves similar users who at this hour predominantly visit Pubs, Bars, and Nightlife venues, instead of Malls\. Therefore, the intention can correctly identify the late\-night context and predict likely categories as Starbucks, coffee shop, bar, and entertainment, but excludes Mall\. The final explanation of the Mall candidate is “a mall visit is less likely at this hour but fits the user’s occasional retail pattern”\. The LLM simultaneously acknowledges the partial fit and the temporal implausibility, and with no signal providing positive support, this weak signal is overwhelmed by the Starbucks and nightlife evidence\.

## 6. Conclusion and Future Works

Given the insight that users typically form a traveling intention before selecting a specific destination, we argue that intention inference should be a critical intermediate reasoning step in the next POI prediction task. Therefore, we propose IntentPOI, a two-stage intention-guided reasoning framework that first infers user intentions from historical mobility patterns, peer behaviors, and query contexts, and then performs intention-guided POI recommendation. By explicitly incorporating intention as a reasoning scaffold, IntentPOI transforms next POI prediction into a more interpretable reasoning process. Extensive experiments on three real-world datasets demonstrate that IntentPOI consistently outperforms state-of-the-art baselines, while ablation studies verify the effectiveness of explicit intention reasoning.

Limitations and Future Works\.Despite its promising performance, IntentPOI still relies on LLM\-generated intentions, whose quality may affect downstream recommendations\. In addition, the current framework focuses on short\-term intentions and does not explicitly model their evolution over time\. Future work will explore intention\-aware mobility modeling with more efficient models and investigate dynamic intention evolution for broader mobility prediction tasks\.

###### Acknowledgements\.

We used ChatGPT to polish the sentences and improve the overall readability of the text. Additionally, as a core component of the proposed IntentPOI, we utilized the GPT-5.4 API as $\mathcal{M}_{\rm F}$ and $\mathcal{M}_{\rm I}$ to generate the user profiles and intentions, and the GPT-5.4-mini API as $\mathcal{M}_{\rm R}$ to infer the specific POIs.

## References

- Y. Chen, W. Huang, K. Zhao, Y. Jiang, and G. Cong (2025a). Self-supervised representation learning for geospatial objects: a survey. Information Fusion 123, pp. 103265. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- Y. Chen, Y. Tao, Y. Jiang, S. Liu, H. Yu, and G. Cong (2025b). Enhancing large language models for mobility analytics with semantic location tokenization. In SIGKDD, pp. 262–273. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1), [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p3.5).
- J. Cheng, J. Wang, Y. Zhang, J. Ji, Y. Zhu, Z. Zhang, and X. Zhao (2025). Poi-enhancer: an llm-based semantic enhancement framework for poi representation learning. In AAAI, Vol. 39, pp. 11509–11517. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4).
- Y. Dang, E. Yang, G. Guo, L. Jiang, X. Wang, X. Xu, Q. Sun, and H. Liu (2023). Uniform sequence better: time interval aware data augmentation for sequential recommendation. In Proceedings of the AAAI conference on artificial intelligence, Vol. 37, pp. 4225–4232. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- J. Feng, Y. Du, J. Zhao, and Y. Li (2025). Agentmove: a large language model based agentic framework for zero-shot next location prediction. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 1322–1338. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, and D. Jin (2018a). Deepmove: predicting human mobility with attentional recurrent networks. In The world wide web conference, pp. 1459–1468. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, and D. Jin (2018b). Deepmove: predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 world wide web conference, pp. 1459–1468. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- S. Feng, H. Lyu, F. Li, Z. Sun, and C. Chen (2024). Where to move next: zero-shot generalization of llms for next poi recommendation. In CAI, pp. 1530–1535. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1), [§2](https://arxiv.org/html/2606.08122#S2.p2.1).
- Q. Gao, F. Zhou, G. Trajcevski, K. Zhang, T. Zhong, and F. Zhang (2019). Predicting human mobility via variational attention. In The world wide web conference, pp. 2750–2756. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- L. Gong, Y. Lin, X. Zhang, Y. Lu, X. Han, Y. Liu, S. Guo, Y. Lin, and H. Wan (2024a). Mobility-llm: learning visiting intentions and travel preference from human mobility data with large language models. Advances in Neural Information Processing Systems 37, pp. 36185–36217. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- L. Gong, Y. Lin, X. Zhang, Y. Lu, X. Han, Y. Liu, S. Guo, Y. Lin, and H. Wan (2024b). Mobility-llm: learning visiting intentions and travel preference from human mobility data with large language models. Advances in Neural Information Processing Systems 37, pp. 36185–36217. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p2.1).
- N. Jiang, H. Yuan, J. Si, M. Chen, and S. Wang (2024). Towards effective next poi prediction: spatial and semantic augmentation with remote sensing data. In ICDE, pp. 5061–5074. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- W. Kang and J. McAuley (2018). Self-attentive sequential recommendation. In ICDM, pp. 197–206. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- Y. Lai, Y. Su, L. Wei, T. He, H. Wang, G. Chen, D. Zha, Q. Liu, and X. Wang (2024). Disentangled contrastive hypergraph learning for next poi recommendation. In Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, pp. 1452–1462. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- P. Li, M. de Rijke, H. Xue, S. Ao, Y. Song, and F. D. Salim (2024). Large language models for next point-of-interest recommendation. In SIGIR, pp. 1463–1472. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1), [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- Y. Li, T. Chen, Y. Luo, H. Yin, and Z. Huang (2021). Discovering collaborative signals for next poi recommendation with iterative seq2graph augmentation. arXiv preprint arXiv:2106.15814. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- N. Lim, B. Hooi, S. Ng, X. Wang, Y. L. Goh, R. Weng, and J. Varadarajan (2020a). STP-udgat: spatial-temporal-preference user dimensional graph attention network for next poi recommendation. In Proceedings of the 29th ACM International conference on information & knowledge management, pp. 845–854. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- N. Lim, B. Hooi, S. Ng, X. Wang, Y. L. Goh, R. Weng, and J. Varadarajan (2020b). STP-udgat: spatial-temporal-preference user dimensional graph attention network for next poi recommendation. In Proceedings of the 29th ACM International conference on information & knowledge management, pp. 845–854. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- M. Luca, G. Barlacchi, B. Lepri, and L. Pappalardo (2021). A survey on deep learning for human mobility. ACM Computing Surveys (CSUR) 55(1), pp. 1–44. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- Y. Luo, Q. Liu, and Z. Liu (2021). Stan: spatio-temporal attention network for next location recommendation. In Proceedings of the web conference 2021, pp. 2177–2185. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- D. Lv, Q. Ding, H. Xu, Z. Sun, Z. Wang, F. Xiong, and M. Xu (2026). Reasoning over space: enabling geographic reasoning for llm-based generative next poi recommendation. arXiv preprint arXiv:2601.04562. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1).
- G. Niemeyer (2008). Geohash. External Links: [Link](http://geohash.org/). Cited by: [§4.1](https://arxiv.org/html/2606.08122#S4.SS1.p5.2).
- X. Rao, L. Chen, Y. Liu, S. Shang, B. Yao, and P. Han (2022). Graph-flashback network for next location recommendation. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pp. 1463–1471. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- X. Rao, R. Jiang, S. Shang, L. Chen, P. Han, B. Yao, and P. Kalnis (2024). Next point-of-interest recommendation with adaptive graph contrastive learning. TKDE 37(3), pp. 1366–1379. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme (2010). Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web, pp. 811–820. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- P. Sánchez and A. Bellogín (2022). Point-of-interest recommender systems based on location-based social networks: a survey from an experimental perspective. ACM Comput. Surv. 54(11s). External Links: ISSN 0360-0300, [Link](https://doi.org/10.1145/3510409), [Document](https://dx.doi.org/10.1145/3510409). Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, and P. Jiang (2019). BERT4Rec: sequential recommendation with bidirectional encoder representations from transformer. In CIKM, pp. 1441–1450. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- T. Sun, M. Chen, B. Zhang, G. Dai, W. Huang, and K. Zhao (2025). SILO: semantic integration for location prediction with large language models. In SIGKDD, pp. 2756–2767. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p2.1).
- T. Sun, K. Fu, W. Huang, K. Zhao, Y. Gong, and M. Chen (2024). Going where, by whom, and at what time: next location prediction considering user preference and temporal regularity. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2784–2793. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- J. Tan, S. Xu, W. Hua, Y. Ge, Z. Li, and Y. Zhang (2024). Idgenrec: llm-recsys alignment with textual id learning. In Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, pp. 355–364. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1).
- D. Wang, Y. Huang, S. Gao, Y. Wang, C. Huang, and S. Shang (2025). Generative next poi recommendation with semantic id. In SIGKDD, pp. 2904–2914. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4).
- T. Wang and C. Wang (2024). Embracing llms for point-of-interest recommendations. IEEE Intelligent Systems 39(1), pp. 56–59. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- X. Wang, M. Fang, Z. Zeng, and T. Cheng (2023). Where would i go next? large language models as human mobility predictors. arXiv preprint arXiv:2308.15197. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4).
- Y. Wu, Y. Peng, J. Yu, and R. Lee (2025a). Mas4poi: a multi-agents collaboration system for next poi recommendation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 356–367. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1).
- Y. Wu, Y. Peng, J. Yu, X. Liu, Z. Yan, K. Lin, W. Su, B. Qu, R. Lee, and D. Yang (2025b). Beyond regularity: modeling chaotic mobility patterns for next location prediction. arXiv preprint arXiv:2509.11713. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- Y. Wu, K. Li, G. Zhao, and X. Qian (2020). Personalized long-and short-term preference learning for next poi recommendation. IEEE Transactions on Knowledge and Data Engineering 34(4), pp. 1944–1957. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- H. Xue, F. Salim, Y. Ren, and N. Oliver (2021). MobTCast: leveraging auxiliary trajectory forecasting for human mobility prediction. NeurIPS 34, pp. 30380–30391. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- X. Yan, T. Song, Y. Jiao, J. He, J. Wang, R. Li, and W. Chu (2023). Spatio-temporal hypergraph learning for next poi recommendation. In SIGIR, pp. 403–412. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- D. Yang, B. Fankhauser, P. Rosso, and P. Cudre-Mauroux (2020). Location prediction over sparse user mobility traces using rnns. In IJCAI, pp. 2184–2190. Cited by: [§2](https://arxiv.org/html/2606.08122#S2.p1.1).
- D. Yang, D. Zhang, V. W. Zheng, and Z. Yu (2014). Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45(1), pp. 129–142. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p1.1).
- S. Yang, J. Liu, and K. Zhao (2022a). GETNext: trajectory flow map enhanced transformer for next poi recommendation. In SIGIR, pp. 1144–1153. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p1.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p3.5).
- S. Yang, J. Liu, and K. Zhao (2022b). Getnext: trajectory flow map enhanced transformer for next poi recommendation. In Proceedings of the 45th International ACM SIGIR Conference on research and development in information retrieval, pp. 1144–1153. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- Y. Yang, S. Zhou, H. Weng, D. Wang, X. Zhang, D. Yu, and S. Deng (2024a). Siamese learning based on graph differential equation for next-poi recommendation. Applied Soft Computing 150, pp. 111086. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1).
- Y. Yang, S. Zhou, H. Weng, D. Wang, X. Zhang, D. Yu, and S. Deng (2024b). Siamese learning based on graph differential equation for next-poi recommendation. Applied Soft Computing 150, pp. 111086. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- F. Yin, Y. Liu, Z. Shen, L. Chen, S. Shang, and P. Han (2023). Next poi recommendation with dynamic graph and explicit dependency. In Proceedings of the AAAI conference on artificial intelligence, Vol. 37, pp. 4827–4834. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. Magnenat-Thalmann (2013). Time-aware point-of-interest recommendation. In SIGIR, pp. 363–372. Cited by: [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p1.1).
- J. Zeng, H. Tao, H. Tang, J. Wen, and M. Gao (2025). Global and local hypergraph learning method with semantic enhancement for poi recommendation. Information Processing & Management 62(1), pp. 103868. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- P. Zhao, A. Luo, Y. Liu, J. Xu, Z. Li, F. Zhuang, V. S. Sheng, and X. Zhou (2020). Where to go next: a spatio-temporal gated network for next poi recommendation. IEEE Transactions on Knowledge and Data Engineering 34(5), pp. 2512–2524. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p1.1).
- L. Zhong, L. Wang, X. Yang, and Q. Liao (2025). Comapoi: a collaborative multi-agent framework for next poi prediction bridging the gap between trajectory and language. In SIGIR, pp. 1768–1778. Cited by: [§1](https://arxiv.org/html/2606.08122#S1.p2.1), [§2](https://arxiv.org/html/2606.08122#S2.p2.1), [§4.3](https://arxiv.org/html/2606.08122#S4.SS3.p2.4), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p1.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p2.1), [§5.1](https://arxiv.org/html/2606.08122#S5.SS1.p3.5).

## Appendix A Overall Process of IntentPOI

The overall process of IntentPOI is presented in Algorithm [1](https://arxiv.org/html/2606.08122#algorithm1). It consists of two phases: generating the user profiles and the similarity matrix from historical trajectories (highlighted in blue), and the inference phase for the query trajectory.

For each user $u_n$, we generate the user profile $P_n^{\rm F}$ (Lines 2–3) and the pairwise user similarity matrix $S=\{s_{m,n}\}$ by fusing semantic and geographic similarities (Lines 4–6). The results are stored and reused across all queries. For each query trajectory, we construct the peer behavior $P_n^{\rm S}$ by selecting the top-$k$ similar users and retrieving their check-in records within the temporal window $t^{q}_{n} \pm \tau$ (Lines 7–8). The intention $P_n^{\rm I}$ is then generated by $\mathcal{M}_{\rm I}$ (Line 9). For candidate selection, we construct a pool $\mathcal{H}_n$ from historical and spatial perspectives and apply hybrid filtering (Lines 11–14). Finally, the reasoning LLM $\mathcal{M}_{\rm R}$ performs intention-grounded reasoning to produce the top-$T$ recommendations $\hat{\mathcal{Y}}_n$ (Line 15).
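
The query-time steps described above (peer retrieval and candidate pool construction) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and data layouts are hypothetical, the temporal window is assumed to compare time of day rather than absolute timestamps, spatial proximity is assumed to be great-circle (haversine) distance, and the similarity fusion and hybrid filtering steps are left out because their exact form is defined elsewhere in the paper.

```python
import math
from datetime import datetime

def top_k_peers(sim_row, n, k):
    """Indices of the k users most similar to user n (user n itself excluded)."""
    others = [m for m in range(len(sim_row)) if m != n]
    return sorted(others, key=lambda m: sim_row[m], reverse=True)[:k]

def minutes_apart(a, b):
    """Circular time-of-day distance in minutes.

    Assumption: the window t_q +/- tau is applied to the time of day only.
    """
    diff = abs((a.hour * 60 + a.minute) - (b.hour * 60 + b.minute))
    return min(diff, 1440 - diff)

def peer_behavior(checkins, peers, t_query, tau_min):
    """Check-ins of each peer that fall within t_query +/- tau_min minutes."""
    return {
        m: [(t, poi) for t, poi in checkins.get(m, [])
            if minutes_apart(t, t_query) <= tau_min]
        for m in peers
    }

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def candidate_pool(history_pois, last_location, poi_coords, z):
    """Union of historically visited POIs and the z POIs nearest to the last location."""
    historical = set(history_pois)
    nearest = sorted(
        poi_coords, key=lambda p: haversine_km(last_location, poi_coords[p])
    )[:z]
    return historical | set(nearest)

# Toy usage with made-up similarity scores and coordinates.
sim_row_0 = [1.0, 0.86, 0.20, 0.84]
peers = top_k_peers(sim_row_0, n=0, k=2)
checkins = {
    1: [(datetime(2012, 7, 1, 21, 10), "Medi Winebar"),
        (datetime(2012, 7, 1, 9, 0), "Cafe")],
    3: [(datetime(2012, 6, 30, 20, 45), "Ditch Plains")],
}
behavior = peer_behavior(checkins, peers, datetime(2012, 7, 4, 21, 0), tau_min=30)
coords = {"A": (40.0, -74.0), "B": (40.001, -74.0),
          "C": (40.5, -74.0), "D": (41.0, -74.0)}
pool = candidate_pool(["D", "A"], (40.0, -74.0), coords, z=2)
```

In this sketch the pool is returned as a set, so the downstream filtering step (not shown) would be responsible for any ordering or size limit.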

## Appendix B Detailed Prompts

We provide the detailed prompts of $\mathcal{M}_{\rm F}$ and $\mathcal{M}_{\rm I}$ in the two prompt figures (Prompt 1 and Prompt 2).

**Algorithm 1** IntentPOI: Training and Inference

**Input:** $\{X_n\}_{n=1}^{N}$: historical trajectories of $N$ users; Query: $(u_n, t^{q}_{n}, X^{q}_{n})$; Pretrained LLMs: $\mathcal{M}_{\rm F}, \mathcal{M}_{\rm I}, \mathcal{M}_{\rm R}$; Hyperparameters: $k, \alpha, \tau, Z, B, \rho, T$.

**Output:** Recommended POIs $\hat{\mathcal{Y}}_n$

- **for** $n \in [1, N]$ **do**
  - $X_n^{\rm s} \leftarrow$ statistical summary of $X_n$;
  - $P_n^{\rm F} \leftarrow \mathcal{M}_{\rm F}(X_n, X_n^{\rm s})$; // user profile
  - $\mathcal{G}_n = \{\text{Geohash}(x_{n,i}(\mathit{lat}), x_{n,i}(\mathit{lon})) \mid \forall x_{n,i} \in X_n\}$;
  - $E_n \leftarrow \Phi(P_n^{\rm F})$;
- $S \leftarrow \{s_{m,n}\}_{N \times N}$, with $s_{m,n}$ computed via Eq. ([4](https://arxiv.org/html/2606.08122#S4.E4)), ([6](https://arxiv.org/html/2606.08122#S4.E6)), ([7](https://arxiv.org/html/2606.08122#S4.E7));
- $\mathcal{U}_n = \{u_m \mid s_{m,n} \in \operatorname{arg\,top\text{-}k}_{1 \leq m \leq N,\, m \neq n}\, s_{m,n}\}$;
- $P_n^{\rm S} \leftarrow$ check-ins of $u_m \in \mathcal{U}_n$ in $t^{q}_{n} \pm \tau$; // peer behaviors
- $P_n^{\rm I} \leftarrow \mathcal{M}_{\rm I}(X^{q}_{n}, P_n^{\rm F}, P_n^{\rm S})$; // intention
- $\mathcal{H}_n^{\rm h} \leftarrow \{p \mid (n, t, p) \in X_n\}$;
- $\mathcal{H}_n^{\rm s} \leftarrow Z$ nearest POIs to last location in $X^{q}_{n}$;
- $\mathcal{H}_n \leftarrow \mathcal{H}_n^{\rm h} \cup \mathcal{H}_n^{\rm s}$; // candidate pool
- $\bar{\mathcal{H}}_n \leftarrow$ filter $\mathcal{H}_n$; // hybrid filtering
- $\hat{\mathcal{Y}}_n \leftarrow \mathcal{M}_{\rm R}(X_n^{q}, P_n^{\rm F}, P_n^{\rm I}, \bar{\mathcal{H}}_n)$; // intention-grounded reasoning
- **return** $\hat{\mathcal{Y}}_n$

**1. Prompt in $\mathcal{M}_{\rm F}$ for user profiles**

**System Prompt:** You are a mobility analyst. Based on the following data about a Foursquare user in New York City, write a comprehensive user profile (2-3 paragraphs, approximately 200 words).

**User Prompt:**

\#\# Statistical Summary

The user with ID 1 has taken 6 trips in total.

- The top hours and frequencies for this user are: 18:00-18:59 for 4 times; 16:00-16:59 for 3 times…
- The top locations and frequencies are: ‘P.J. Clarke’s’ for 2 times; ‘La Colombe Torrefaction’ for 2 times…
- The top categories and frequencies are: ‘bar’ for 2 times; ‘gastropub’ for 2 times…

\#\# Visit History

- 2012-04-08 16:02 \| Hi-Life Bar & Grill \| bar
- 2012-04-09 12:20 \| Bubby’s \| popular American restaurant
- ⋯

Write in third person, present tense. Be specific — reference actual venue names and neighborhoods where possible. Do NOT include bullet points or headers; write flowing paragraphs.

**2. Prompt in $\mathcal{M}_{\rm I}$ for intention inference**

**System Prompt:** You predict where a New York City user is likely to be at a known future time. Return only JSON.

**User Prompt:**

\#\# User Profile

This user presents as an evening-oriented urban explorer whose activity concentrates after work and into the night …

\#\# Temporal Contexts

- 17:12 on 2012-07-03: BLT Fish Shack (American restaurant)
- 18:12 on 2012-07-03: Rye House (Whisky Bar)
- 15:47 on 2012-07-04: AMC Loews Lincoln Square 13 (cinema) ← LAST KNOWN LOCATION

The next visit to predict happens on 2012-07-04 (Wednesday) at 21:00 during the night. Use this as a prior, but do not assume the destination.

\#\# Similar Users Did Around The Target Time (±30 min)

- Similar user (similarity 0.86): around 21:00, was at Medi Winebar, then visited Thalia.
- Similar user (similarity 0.84): around 21:00, was at Ditch Plains, then visited Westside Market.
- ⋯

Based on this trajectory and the known target time, infer where the user is most likely to be at that moment. Return JSON with exactly these keys: “summary”, “likely\_categories”, and “rationale”.
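
As an illustration of how the two prompts could be wired into code, the sketch below assembles the "Statistical Summary" lines of the profile prompt from raw check-ins and validates the JSON returned for the intention prompt. It is a hedged example: `statistical_summary` and `parse_intention` are hypothetical helper names, the trip count is passed in separately because the prompt's "trips" do not appear to equal the number of check-ins, and the output wording only approximates the sample prompt.

```python
import json
from collections import Counter
from datetime import datetime

def _top(counter, top, fmt):
    """Format the `top` most frequent keys as 'key for c times; ...'."""
    return "; ".join(
        f"{fmt(key)} for {count} times" for key, count in counter.most_common(top)
    )

def statistical_summary(user_id, n_trips, visits, top=3):
    """Build summary lines from visits given as (timestamp, venue, category)."""
    hours = Counter(t.hour for t, _, _ in visits)
    venues = Counter(v for _, v, _ in visits)
    cats = Counter(c for _, _, c in visits)
    return "\n".join([
        f"The user with ID {user_id} has taken {n_trips} trips in total.",
        "The top hours and frequencies for this user are: "
        + _top(hours, top, lambda h: f"{h:02d}:00-{h:02d}:59"),
        "The top locations and frequencies are: "
        + _top(venues, top, lambda v: f"'{v}'"),
        "The top categories and frequencies are: "
        + _top(cats, top, lambda c: f"'{c}'"),
    ])

# Keys requested by the intention prompt.
REQUIRED_KEYS = {"summary", "likely_categories", "rationale"}

def parse_intention(raw):
    """Parse the model reply and require exactly the three requested keys."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected keys {sorted(REQUIRED_KEYS)}")
    return data

# Toy usage with two visits taken from the sample prompt plus one made-up repeat.
visits = [
    (datetime(2012, 4, 8, 16, 2), "Hi-Life Bar & Grill", "bar"),
    (datetime(2012, 4, 9, 12, 20), "Bubby's", "popular American restaurant"),
    (datetime(2012, 4, 9, 16, 40), "Hi-Life Bar & Grill", "bar"),
]
summary_text = statistical_summary(user_id=1, n_trips=2, visits=visits, top=1)
intention = parse_intention(
    '{"summary": "s", "likely_categories": ["bar"], "rationale": "r"}'
)
```

Rejecting replies with missing or extra keys keeps a malformed intention from silently reaching the reasoning stage; whether the original pipeline retries or falls back in that case is not stated in the paper.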
