A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction


Summary

This paper compares 19 graph neural network layer types for modelling interactions in driving trajectory prediction, finding ARMA, Chebyshev, and topology-aware layers most effective and offering design principles for better prediction models.

arXiv:2606.14956v1 Announce Type: new

Cached at: 06/16/26, 11:36 AM

# A Comparative Study of Graph Neural Network Layer Selection for Interaction Modelling in Driving Trajectory Prediction
Source: [https://arxiv.org/html/2606.14956](https://arxiv.org/html/2606.14956)
George Daoud \(1, 2\), Mohamed El\-Darieby \(1\)

\(1\) Ontario Tech University, Oshawa, ON, Canada; George\.Daoud@OntarioTechU\.ca, Mohamed\.El\-Darieby@OntarioTechU\.ca

\(2\) Assiut University, Assiut, Egypt

###### Abstract

Autonomous driving systems rely on precise trajectory prediction to plan safe and efficient movement\. Graph Neural Networks \(GNNs\) have become a promising approach for modelling spatiotemporal interactions among road agents\. However, designing GNN architectures for trajectory prediction remains non\-standardized, with little guidance on which graph layers effectively capture spatial interactions and temporal dynamics\. This paper offers a detailed comparative study of 19 graph layer types, focusing on their spatial and temporal processing capabilities to discover the most effective architectures for trajectory prediction\. Within the explored hyperparameter setting, we highlight five standout layer combinations, with ARMA, Chebyshev, and topology\-aware layers consistently performing better than others\. Beyond performance metrics, our findings yield practical design principles: sum\-based aggregation is more effective than mean\-based methods, multi\-head attention mechanisms enable richer interactions, and assigning different weights to different hop distances significantly improves prediction accuracy\. These findings offer useful guidance for designing more interpretable and effective trajectory prediction models\.

## I Introduction

Predicting the future paths of road agents, such as vehicles, pedestrians, and cyclists, is now a crucial part of autonomous driving systems\. This process works between the perception and planning modules\. By modelling interactions among nearby agents and the ego vehicle, trajectory prediction allows for safe and efficient short\-term planning\[[13](https://arxiv.org/html/2606.14956#bib.bib13)\]\. This prediction is beneficial not only for autonomous driving but also for traffic safety assessment and adaptive control in Intelligent Transportation Systems \(ITS\), especially in complex environments such as highways and roundabouts\[[40](https://arxiv.org/html/2606.14956#bib.bib14)\]\.

Current trajectory prediction methods can be grouped into physics\-based and machine\-learning\-based approaches\. Physics\-based methods use physical and probabilistic models, while learning\-based methods derive motion patterns directly from data\. Although learning\-based approaches usually offer higher accuracy, they often struggle to capture the semi\-structured, dynamic nature of driving scenes and the complex interactions between agents, which involve changes over time, varying road geometries, and varying numbers of agents\[[6](https://arxiv.org/html/2606.14956#bib.bib15)\]\.

A common solution is to represent driving scenes as sequences of semantic bird’s\-eye\-view \(BEV\) images, where interactions are reflected in pixel locations and values\. While BEV representations provide fixed\-size inputs, they need discretization, increase input dimensionality, and come with higher computational costs\[[20](https://arxiv.org/html/2606.14956#bib.bib16)\]\.

Graph\-based representations provide a more structured alternative by modelling agents as nodes and their interactions as edges, either across individual time steps or within a single spatiotemporal graph\. Road network information can be incorporated through node features\[[5](https://arxiv.org/html/2606.14956#bib.bib17)\] or by explicitly modelling roads and lanes as graphs\[[11](https://arxiv.org/html/2606.14956#bib.bib12)\]\.

Graph\-based trajectory prediction models utilize graph neural networks \(GNNs\) to capture spatial and temporal interactions while keeping input dimensionality low and consistent\. Despite promising results, the effectiveness of different GNN layer types for modelling interactions remains unclear\. To fill this gap, this paper presents a detailed comparative study of graph neural layers for spatiotemporal trajectory prediction and expands on the architecture of Daoud et al\.\[[5](https://arxiv.org/html/2606.14956#bib.bib17)\] to enable longer prediction horizons with shorter observation windows\.

This work makes three contributions: \(1\) an evaluation of 19 graph convolutional layer types for spatiotemporal trajectory prediction, filling a critical gap in GNN architecture design; \(2\) identification of five superior layer combinations that outperform prior work on roundabout scenarios; and \(3\) design principles that practitioners can apply to GNN\-based trajectory prediction systems, such as the superiority of sum\-based aggregation and the importance of hop\-specific weight matrices\.

## II Related Works

Existing trajectory prediction methods can be divided into physics\-based and machine\-learning\-based approaches\. Physics\-based models apply physical laws and probabilistic techniques to understand interactions and predict future movement\. For instance, Kalman Filters have been used with kinematic models like constant turn rate and acceleration \(CTRA\) to address uncertainty\[[34](https://arxiv.org/html/2606.14956#bib.bib19)\]\. Additionally, comfort constraints can be added to cost functions to create smooth trajectories\[[28](https://arxiv.org/html/2606.14956#bib.bib22)\]\.

Machine\-learning approaches learn driving patterns from data\. They can be further categorized by input representation into bird’s\-eye\-view \(BEV\) and graph\-based models\. BEV\-based methods rasterize scenes into semantic images and employ deep neural networks such as CNNs\[[21](https://arxiv.org/html/2606.14956#bib.bib25)\], VAEs\[[35](https://arxiv.org/html/2606.14956#bib.bib26)\] and conditional VAEs\[[41](https://arxiv.org/html/2606.14956#bib.bib27)\], or LSTM\-based encoder\-decoder models with attention\[[22](https://arxiv.org/html/2606.14956#bib.bib29)\] or social pooling\[[23](https://arxiv.org/html/2606.14956#bib.bib28)\] to predict future trajectories, mainly for the ego vehicle\. Attention mechanisms are frequently used to model interactions between cars and road elements\[[39](https://arxiv.org/html/2606.14956#bib.bib60)\]\.

Graph\-based models represent driving scenes as graphs\. They can be structured as sequences of spatial graphs with temporal propagation, as a single spatiotemporal graph, or as heterogeneous graphs that feature different types of spatial and temporal edges\. Sequence\-based methods handle spatial interactions using graph convolutional layers \(GCLs\) and capture temporal dynamics with sequence\-to\-sequence architectures like Transformers\[[37](https://arxiv.org/html/2606.14956#bib.bib24)\] or GRUs\[[38](https://arxiv.org/html/2606.14956#bib.bib30)\]\. Single\-graph formulations embed temporal data into node features and typically use GAT layers, which have been shown to perform better than GCNs\[[8](https://arxiv.org/html/2606.14956#bib.bib31)\]\. Heterogeneous graph models explicitly differentiate spatial and temporal interactions by using different GCLs, such as GAT\[veličković2018graphattentionnetworks\] for spatial relationships and GCN\[[16](https://arxiv.org/html/2606.14956#bib.bib32)\] for temporal dynamics\[[5](https://arxiv.org/html/2606.14956#bib.bib17)\]\.

Hybrid methods mix physics\-based and learning\-based models\. For example, they may integrate shock\-wave physics\[[36](https://arxiv.org/html/2606.14956#bib.bib18)\], combine kinematic models with learned predictors\[[15](https://arxiv.org/html/2606.14956#bib.bib20)\], or merge physical model outcomes with recurrent networks\[[18](https://arxiv.org/html/2606.14956#bib.bib21)\]\. Additional interaction modelling techniques, such as game\-theoretic approaches, inverse reinforcement learning, and drift\-diffusion models, are reviewed by Wang et al\.\[[32](https://arxiv.org/html/2606.14956#bib.bib56)\]\.

This paper focuses on heterogeneous graph\-based models, which blend the precision of learning\-based methods with the effectiveness of graph representations\. Building on Daoud et al\.\[[5](https://arxiv.org/html/2606.14956#bib.bib17)\], we present a modified architecture and conduct a comparative study to identify the most effective graph convolutional layers for predicting vehicle trajectories\.

## III Architecture Design and Layer Selection

The proposed architecture builds on that of Daoud et al\.\[[5](https://arxiv.org/html/2606.14956#bib.bib17)\], aiming to improve prediction accuracy and extend the prediction horizon\. We also evaluate 19 graph convolutional layers \(GCLs\) to find the most effective configurations\.

Unlike the original approach, which rotates the map to match the target vehicle’s heading, we keep the map centered without rotation\. Since map segments are already globally aligned, this simplification reduces computation while maintaining route feasibility\. As a result, the model uses a shorter 1\-second observation window and a longer 5\-second prediction horizon\.

Figure[1](https://arxiv.org/html/2606.14956#S3.F1) shows the overall architecture\. Map data is processed independently using a ResNet\-18 to generate a compact embedding\. This embedding is then combined with numerical agent features to create the initial node representation\. Node embeddings are updated over $h$ iterations, where $h$ corresponds to the number of historical frames, using paired spatial and temporal GCLs to model interactions\. Skip connections are included to reduce over\-smoothing and improve expressiveness\. A final MLP produces the predicted trajectory\. The map embedding is 200\-dimensional, while spatial and temporal embeddings are 100\-dimensional\. The driving scenes are sampled at 5 Hz, using five historical frames to predict 25 future steps\. Four GCLs are used, and the MLP outputs a 50\-dimensional vector per node\. When specific GCLs need it, an optional fully connected layer projects inputs to a compatible dimension \(100\)\.
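
The frame counts and output size quoted above follow from the sampling rate. The sketch below reproduces that arithmetic. It assumes that the 50-dimensional output packs one (x, y) displacement pair per future step, which the text does not state explicitly, and all names are illustrative rather than taken from the authors' code.

```python
# Illustrative arithmetic only; names are hypothetical, not from the paper's code.
SAMPLING_RATE_HZ = 5      # scenes are sampled at 5 Hz
OBSERVATION_S = 1         # 1-second observation window
PREDICTION_S = 5          # 5-second prediction horizon
COORDS_PER_STEP = 2       # assumption: one (x, y) pair per future step


def frame_budget(rate_hz, observation_s, prediction_s, coords=COORDS_PER_STEP):
    """Return (history frames, future steps, MLP output size per node)."""
    history_frames = rate_hz * observation_s
    future_steps = rate_hz * prediction_s
    return history_frames, future_steps, future_steps * coords


history, future, out_dim = frame_budget(SAMPLING_RATE_HZ, OBSERVATION_S, PREDICTION_S)
print(history, future, out_dim)  # 5 25 50
```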

![Refer to caption](https://arxiv.org/html/2606.14956v1/arch2.jpg)

Figure 1: The general view of the proposed architecture

To choose suitable GCLs, we assess 18 variants across spatial \($GCL_s$\) and temporal \($GCL_t$\) components\. Starting from a GCN\[[16](https://arxiv.org/html/2606.14956#bib.bib32)\] baseline, layers are replaced one at a time and in combination\. Table[I](https://arxiv.org/html/2606.14956#S3.T1) summarizes the evaluated GCLs and their parameters\. The GCLs are divided into six categories:

TABLE I: The list of graph convolutional layers and their parameters

| Graph convolutional layer | Parameters\* |
| --- | --- |
| Graph Convolutional Network \(GCN\) \[[16](https://arxiv.org/html/2606.14956#bib.bib32)\] | |
| SAGE \[[12](https://arxiv.org/html/2606.14956#bib.bib35)\] | |
| Higher\-order Graph Networks \(HoGraph\) \[[24](https://arxiv.org/html/2606.14956#bib.bib40)\] | |
| Graph attentional layer \(AGNN\) \[[31](https://arxiv.org/html/2606.14956#bib.bib42)\] | needs an extra FC |
| Frequency Adaptive Convolution \(FA\) \[[3](https://arxiv.org/html/2606.14956#bib.bib49)\] | $\epsilon = 0.1$ |
| Graph Attention Network \(GAT\) \[veličković2018graphattentionnetworks\] | |
| Local Extremum Network \(LEConv\) \[[26](https://arxiv.org/html/2606.14956#bib.bib48)\] | |
| Efficient Graph Convolution \(EGC\) \[[30](https://arxiv.org/html/2606.14956#bib.bib50)\] | $H = 4$, $B = 4$ |
| Transformer \[[27](https://arxiv.org/html/2606.14956#bib.bib36)\] | $H = 4$ |
| SuperGAT \[[14](https://arxiv.org/html/2606.14956#bib.bib34)\] | $H = 4$ |
| Simplifying Graph Convolutional \(SGC\) \[[33](https://arxiv.org/html/2606.14956#bib.bib39)\] | $K = 3$ |
| Simple Spectral Convolutional \(S2GC\) \[[42](https://arxiv.org/html/2606.14956#bib.bib44)\] | $\alpha = 0.5$, $K = 3$ |
| MixHop \[[1](https://arxiv.org/html/2606.14956#bib.bib47)\] | $K = 3$ |
| Topology adaptive convolutional \(TAGCN\) \[[9](https://arxiv.org/html/2606.14956#bib.bib37)\] | $K = 3$ |
| Molecular Fingerprints \(MF\) \[[10](https://arxiv.org/html/2606.14956#bib.bib46)\] | |
| Gated Graph Convolution \(GRU\) \[[19](https://arxiv.org/html/2606.14956#bib.bib43)\] | needs an extra FC |
| Residual Gated Convolutional \(ResGRU\) \[[4](https://arxiv.org/html/2606.14956#bib.bib45)\] | |
| ARMA \[[2](https://arxiv.org/html/2606.14956#bib.bib38)\] | $T_L = 1$, $K_s = 1$ |
| Chebyshev Spectral Graph Convolutional \[[7](https://arxiv.org/html/2606.14956#bib.bib41)\] | $K_c = 3$ |

\* $H$: the number of heads, $K$: the number of hops, $K_s$ and $T_L$: the number of stacks and layers for the ARMA filter, $K_c$: Chebyshev filter length, $B$: the number of bases

1. Traditional graph convolutions, which include GCN, GraphSAGE\[[12](https://arxiv.org/html/2606.14956#bib.bib35)\], and Higher\-order graph \(HoGraph\)\[[24](https://arxiv.org/html/2606.14956#bib.bib40)\]\. These methods aggregate neighbour information by summation or averaging\. They use self\-loops to combine features from nodes and their neighbours\.
2. Single\-head attention\-based layers, which assign attention weights to neighbours based on how relevant they are\. Examples include AGNN\[[31](https://arxiv.org/html/2606.14956#bib.bib42)\], Frequency Adaptation Graph \(FA\)\[[3](https://arxiv.org/html/2606.14956#bib.bib49)\], GAT\[veličković2018graphattentionnetworks\], and LEConv\[[26](https://arxiv.org/html/2606.14956#bib.bib48)\]\. They use cosine similarity or learned projections\.
3. Multi\-head attention\-based layers, which extend attention mechanisms with multiple heads to capture different interactions\. This group includes EGC\[[30](https://arxiv.org/html/2606.14956#bib.bib50)\], Transformer\-based layers\[[27](https://arxiv.org/html/2606.14956#bib.bib36)\], and SuperGAT\[[14](https://arxiv.org/html/2606.14956#bib.bib34)\]\. All of them use four attention heads\.
4. Topology\-based layers, which take advantage of the multi\-hop structure of graphs\. For example, SGC\[[33](https://arxiv.org/html/2606.14956#bib.bib39)\] and S2GC\[[42](https://arxiv.org/html/2606.14956#bib.bib44)\] share weights across hops\. MixHop\[[1](https://arxiv.org/html/2606.14956#bib.bib47)\] and TAGCN\[[9](https://arxiv.org/html/2606.14956#bib.bib37)\] use hop\-specific weights\. Molecular Fingerprints \(MF\)\[[10](https://arxiv.org/html/2606.14956#bib.bib46)\] adjusts the weights based on the node’s degree\.
5. Recurrent\-based layers, which use recurrent units like GRUs\. They update node embeddings by treating neighbour information as sequential input\.
6. Specialized graph filters, which include ARMA\[[2](https://arxiv.org/html/2606.14956#bib.bib38)\] and Chebyshev\[[7](https://arxiv.org/html/2606.14956#bib.bib41)\] convolutions\. They use spectral filtering to capture long\-range dependencies\.
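
To make the distinction in the topology-based category concrete, the sketch below contrasts shared and hop-specific weights on a toy path graph with scalar node features. It is a simplified illustration, not the SGC or TAGCN implementation: adjacency normalisation, biases, nonlinearities, and vector-valued features are all omitted, and `propagate`, `hop_features`, `shared_weight_layer`, and `hop_specific_layer` are hypothetical names.

```python
def propagate(adj, x):
    """One hop of sum aggregation: y[i] = sum_j adj[i][j] * x[j]."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def hop_features(adj, x, num_hops):
    """Return [x, A x, A^2 x, ..., A^K x] for K = num_hops."""
    feats = [list(x)]
    for _ in range(num_hops):
        feats.append(propagate(adj, feats[-1]))
    return feats


def shared_weight_layer(adj, x, num_hops, weight):
    """SGC-like: a single weight applied to the K-hop propagated features."""
    return [weight * v for v in hop_features(adj, x, num_hops)[-1]]


def hop_specific_layer(adj, x, weights):
    """TAGCN-like: a separate weight per hop distance, combined by summation."""
    feats = hop_features(adj, x, len(weights) - 1)
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(len(x))]


# Toy path graph 0 - 1 - 2 with a unit feature on node 0 only.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [1.0, 0.0, 0.0]

print(shared_weight_layer(adj, x, num_hops=2, weight=0.5))   # [0.5, 0.0, 0.5]
print(hop_specific_layer(adj, x, weights=[1.0, 0.5, 0.25]))  # [1.25, 0.5, 0.25]
```

With hop-specific weights, each node's output records how far it sits from the source node; the shared-weight variant only sees the final propagation step.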

## IV Experiments

The proposed model is evaluated using the RounD dataset\[[17](https://arxiv.org/html/2606.14956#bib.bib53)\], which was collected in Germany utilizing a drone hovering over multiple roundabouts\. It contains detailed trajectory recordings of various road users, including bicycles, motorcycles, cars, trailers, trucks, vans, and buses\.

In this study, experiments are conducted on the third scenario of the dataset, which captures traffic activity at a roundabout\. Graph construction follows three rules: \(1\) spatial edges connect agents in the same frame within a 30 m Euclidean distance; \(2\) temporal edges link consecutive agent instances across frames; \(3\) nodes lacking a complete 5\-frame history or 25\-frame prediction horizon are masked from the prediction\. A summary of the resulting graph structure is provided in Table[II](https://arxiv.org/html/2606.14956#S4.T2)\.
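
The three rules can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' preprocessing code: the `(frame, agent_id, x, y)` record layout and the name `build_graph` are hypothetical, the distance threshold is treated as inclusive, and the 5-frame history is assumed to include the current frame.

```python
import math

SPATIAL_RADIUS_M = 30.0   # rule 1: same-frame distance threshold (assumed inclusive)
HISTORY_FRAMES = 5        # rule 3: history length, assumed to include the current frame
FUTURE_FRAMES = 25        # rule 3: required prediction horizon


def build_graph(records):
    """records: iterable of (frame, agent_id, x, y) tuples (hypothetical layout).

    Returns (spatial_edges, temporal_edges, valid_nodes); nodes are
    identified by (frame, agent_id).
    """
    pos = {(f, a): (x, y) for f, a, x, y in records}
    nodes = sorted(pos)

    # Rule 1: connect distinct agents in the same frame within the radius.
    spatial = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if u[0] == v[0] and math.dist(pos[u], pos[v]) <= SPATIAL_RADIUS_M
    ]

    # Rule 2: link each agent instance to its instance in the next frame.
    temporal = [((f, a), (f + 1, a)) for f, a in nodes if (f + 1, a) in pos]

    # Rule 3: keep only nodes with a full history and a full future.
    def complete(f, a):
        past = all((f - k, a) in pos for k in range(HISTORY_FRAMES))
        future = all((f + k, a) in pos for k in range(1, FUTURE_FRAMES + 1))
        return past and future

    valid = [n for n in nodes if complete(*n)]
    return spatial, temporal, valid


# Two agents observed for 30 frames; agent 2 stays 10 m from agent 1.
records = [(f, 1, float(f), 0.0) for f in range(30)]
records += [(f, 2, float(f), 10.0) for f in range(30)]
spatial, temporal, valid = build_graph(records)
print(len(spatial), len(temporal), len(valid))  # 30 58 2
```

With only 30 frames available, frame 4 is the single frame that has both four preceding frames and 25 following frames, so only two nodes remain unmasked.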

TABLE II: Overview of graph structure and preprocessing rules

During training, the model parameters are optimized using the Adam optimizer\. Training is performed for 60 epochs, with the learning rate reduced by a factor of 10 after 30 epochs and again after an additional 20 epochs\. The initial learning rate is set to $10^{-3}$ for all models except EGC, which uses $10^{-4}$\. Mean Squared Error \(MSE\) is used as the training loss function\. All experiments are conducted on a workstation equipped with 64 GB of RAM and an NVIDIA GeForce RTX 4090 GPU\. On average, training a single model requires approximately 10 hours\.

At the end of each epoch, the model is evaluated on the validation set using the Average Displacement Error \(ADE\), defined in Equation \([1](https://arxiv.org/html/2606.14956#S4.E1)\)\. The best\-performing model is saved and restored before each learning rate reduction to ensure optimal convergence\. After training is complete, a final evaluation is performed on the test set using both the ADE and the Final Displacement Error \(FDE\), defined in Equation \([2](https://arxiv.org/html/2606.14956#S4.E2)\)\.
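
The step schedule described above can be written as a small function. This is a sketch, not the authors' training code. It assumes epochs are counted from 0 and that the reductions take effect at epochs 30 and 50. `learning_rate` is a hypothetical helper, and the checkpoint restore before each reduction is not modelled.

```python
def learning_rate(epoch, initial_lr=1e-3, milestones=(30, 50), factor=10.0):
    """Step schedule: divide the learning rate by `factor` at each milestone.

    Assumes 0-based epochs, so epochs 0-29 use initial_lr, epochs 30-49 use
    initial_lr / 10, and epochs 50-59 use initial_lr / 100.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return initial_lr / (factor ** drops)


for epoch in (0, 29, 30, 49, 50, 59):
    print(epoch, learning_rate(epoch))

# EGC starts from a smaller initial rate, per the text.
print(learning_rate(0, initial_lr=1e-4))
```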

$$ADE(\Delta T_f) = \frac{1}{\Delta T_f} \sum_{\Delta t_f = 1}^{\Delta T_f} FDE(\Delta t_f) \tag{1}$$

$$FDE(\Delta t_f) = \frac{1}{|\mathcal{N}|} \sum_{n \in \mathcal{N}} \left\| \begin{bmatrix} \Delta\hat{x}_n(\Delta t_f) - \Delta x_n(\Delta t_f) \\ \Delta\hat{y}_n(\Delta t_f) - \Delta y_n(\Delta t_f) \end{bmatrix} \right\|_2 \tag{2}$$

Here, $\Delta t_f$ denotes the prediction time step, and $\Delta T_f$ represents the full prediction horizon\. The set $\mathcal{N}$ includes all nodes with complete history and future\. $\Delta x_n$ and $\Delta y_n$ denote the ground\-truth future displacement of node $n$, while $\Delta\hat{x}_n$ and $\Delta\hat{y}_n$ represent the corresponding predicted displacements\.
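
A minimal pure-Python sketch of the two metrics, following Equations (1) and (2). The nested-list data layout and the function names are assumptions made for illustration, and the inputs are taken to contain only the unmasked nodes in the set N.

```python
import math


def fde(pred, truth, step):
    """Mean Euclidean error over nodes at one future step (Eq. 2).

    pred, truth: lists over nodes; each node is a list of (dx, dy)
    displacements, one per future step. `step` is 1-based, as in the text.
    """
    errors = [
        math.hypot(p[step - 1][0] - t[step - 1][0], p[step - 1][1] - t[step - 1][1])
        for p, t in zip(pred, truth)
    ]
    return sum(errors) / len(errors)


def ade(pred, truth, horizon):
    """Average of FDE over steps 1..horizon (Eq. 1)."""
    return sum(fde(pred, truth, s) for s in range(1, horizon + 1)) / horizon


# Two nodes, two future steps; node 0 is off by (3, 4) at step 2.
truth = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
pred = [[(1.0, 0.0), (5.0, 4.0)], [(0.0, 1.0), (0.0, 2.0)]]
print(fde(pred, truth, 2), ade(pred, truth, 2))  # 2.5 1.25
```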

## V Results and Discussion

For consistency with prior work, Graph Convolutional Networks \(GCNs\) were used as the baseline for both spatial and temporal layers \($\text{GCL}_s$ and $\text{GCL}_t$\)\. Following standard ablation practices, each layer was first replaced independently and then jointly with alternative graph layers\. Performance was evaluated using Average Displacement Error \(ADE\) and Final Displacement Error \(FDE\) over prediction horizons at 3 and 5 seconds\. For comparison, we also report results from three studies using the same dataset: a BEV\-based CNN model\[[25](https://arxiv.org/html/2606.14956#bib.bib51)\] \(as reported by\[[29](https://arxiv.org/html/2606.14956#bib.bib52)\]\), a deep GNN approach\[[6](https://arxiv.org/html/2606.14956#bib.bib15)\], and a hybrid GNN–CNN–Transformer model\[[29](https://arxiv.org/html/2606.14956#bib.bib52)\]\. The latter reports $minADE$ and $minFDE$ due to its multimodal predictions, making comparison more challenging since our model produces a single trajectory\. Nevertheless, their results are included in Table[III](https://arxiv.org/html/2606.14956#S5.T3)\. For clarity, superior values in the following tables will be highlighted\.

TABLE III: Trajectory Prediction results from literature

- \* uses the $minADE$ and $minFDE$ instead of $ADE$ and $FDE$\.

Table[IV](https://arxiv.org/html/2606.14956#S5.T4) summarizes results for traditional GCLs\. Both GraphSAGE and HoGraph outperform the GCN baseline, with HoGraph achieving the best results across spatial and temporal layers\. In particular, it surpasses prior work in terms of $FDE$ at 5 seconds\. These findings suggest that using separate weight matrices for nodes and their neighbours improves performance\. In addition, sum\-based aggregation consistently outperforms mean aggregation in this setting\.
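
One possible intuition for the aggregation result, offered as an interpretation rather than something the experiments establish: mean aggregation normalises away the number of neighbours, whereas sum aggregation preserves it, so a node surrounded by many agents is distinguishable from one with a single neighbour. A toy sketch with scalar features and hypothetical names:

```python
def aggregate(neighbour_features, mode):
    """Combine neighbour features by 'sum' or 'mean' (scalar features)."""
    if not neighbour_features:
        return 0.0
    total = sum(neighbour_features)
    return total if mode == "sum" else total / len(neighbour_features)


sparse_scene = [1.0]             # one nearby agent
dense_scene = [1.0, 1.0, 1.0]    # three nearby agents with identical features

for mode in ("sum", "mean"):
    print(mode, aggregate(sparse_scene, mode), aggregate(dense_scene, mode))
# sum 1.0 3.0
# mean 1.0 1.0
```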

TABLE IV: Trajectory Prediction results for traditional GCLs

Table[V](https://arxiv.org/html/2606.14956#S5.T5) reports the performance of single\-head attention models\. AGNN and FA generally underperform the baseline, whereas GAT consistently improves prediction accuracy\. LEConv achieves the best performance in this group and outperforms prior work in terms of $FDE$ at 5 seconds\. These results indicate that transforming node embeddings before computing attention \(as in GAT\) is more effective than using cosine similarity or raw features\. Avoiding projection altogether, as in LEConv, yields further gains\.

TABLE V: Trajectory Prediction results for single\-head attention\-based GCLs

Results for multi\-head attention models are shown in Table[VI](https://arxiv.org/html/2606.14956#S5.T6)\. Transformer and SuperGAT significantly outperform the baseline, whereas EGC fails to improve performance due to its reliance on basis decomposition rather than head\-specific transformations\. SuperGAT achieves the strongest results in this category, thanks to its expressive attention formulation, which better captures inter\-agent interactions\.

TABLE VI: Trajectory Prediction results for multi\-head attention\-based GCLs

Table[VII](https://arxiv.org/html/2606.14956#S5.T7) presents results for topology\-aware GCLs\. SGC and S2GC perform poorly due to shared weights across hops\. In contrast, MixHop and TAGCN perform better by assigning distinct weights to different hop distances\. TAGCN achieves the strongest performance, especially for long\-horizon prediction, by using summation rather than concatenation\. MF also performs competitively by adapting weights based on node degree, though it slightly underperforms TAGCN at 3 seconds\.

TABLE VII: Trajectory Prediction results for topology\-based GCLs

Table[VIII](https://arxiv.org/html/2606.14956#S5.T8) summarizes the performance of GRU\-based models\. Both GRU and ResGRU outperform prior work when applied to the spatial layer\. ResGRU consistently achieves better results due to its combination of recurrent modelling and attention mechanisms\.

TABLE VIII: Trajectory Prediction results for GRU\-based GCLs

Results for spectral and filter\-based layers are shown in Table[IX](https://arxiv.org/html/2606.14956#S5.T9)\. ARMA and Chebyshev convolutions achieve strong performance and outperform prior methods across most metrics\. When applied to both spatial and temporal layers, these models yield the best overall results, particularly for longer prediction horizons\.

TABLE IX: Trajectory Prediction results for special types of GCLs

Table[X](https://arxiv.org/html/2606.14956#S5.T10) summarizes the best\-performing configurations\. The strongest results are obtained using topology\-aware and specialized filter layers\. Although the proposed model is unimodal, it outperforms all unimodal baselines and even surpasses some multimodal methods\. Performance gains are not additive\. For example, replacing GCN with LEConv in the temporal layer improves $ADE@5s$ by 17\.4%, while replacing it in the spatial layer improves it by 29\.6%\. However, applying LEConv to both layers yields only a 23\.3% improvement, highlighting the interdependence of spatial and temporal modelling\. Overall, spatial modelling has a greater impact on performance, as improvements were observed in 13 of the 18 tested configurations when the spatial GCL was modified\.

TABLE X: Best graph layers for spatiotemporal trajectory prediction

## VI Conclusion and Future Work

In this work, we present a graph\-based framework for modelling interactions among road agents and predicting road\-agent trajectories over a 5s horizon using only 1s of history\. The model is evaluated on a roundabout scenario, and an extensive analysis is conducted to study the effect of different graph convolutional layer designs\. By evaluating combinations of 18 graph layers, the following conclusions are drawn:

- Five layer combinations achieve the best performance\.
- Sum\-based aggregation is more effective than averaging for traditional GCLs\.
- In single\-head attention models, applying a linear transformation before attention improves accuracy\.
- Multi\-head attention benefits from using distinct transformations per head\.
- Topology\-aware layers perform better when using hop\- or degree\-specific weights and sum\-based aggregation\.
- GRU\-based layers require attention mechanisms to achieve competitive results\.
- Spectral filters, particularly Chebyshev convolutions, consistently outperform other layer types\.
- Overall, graph\-based models significantly outperform traditional machine learning approaches for trajectory prediction\.

One limitation of this work is that only 55 graph\-layer combinations were evaluated out of a possible $18^{2}$\. While this ablation\-style selection highlights the most influential layers, it does not fully explore the design space\. Future studies can examine additional combinations, particularly those that performed well when applied exclusively to either the spatial or temporal components\. Another limitation is the use of a single driving scenario type\. Although roundabouts present complex interactions and rich traffic dynamics, they do not capture all driving conditions\. Evaluating the model on other scenarios is necessary to assess its generalization capability\.

This study also assumes that ADE and FDE sufficiently capture prediction quality and that unimodal predictions offer a reasonable trade\-off between accuracy and efficiency\. In addition, it assumes that performance trends generalize across different types of road users, although further validation is required\. Other metrics need to be investigated as well\. Computational cost, such as model parameter counts and FLOP estimates, also needs to be reported in future work\.

Finally, a promising direction for future work is the development of hybrid graph layers that combine the most effective properties of the evaluated methods\. Integrating strengths from multiple graph convolutional strategies could yield more expressive and robust spatiotemporal models\.

## References

- \[1\]S\. Abu\-El\-Haija, B\. Perozzi, A\. Kapoor, N\. Alipourfard, K\. Lerman, H\. Harutyunyan, G\. V\. Steeg, and A\. Galstyan\(2019\)MixHop: higher\-order graph convolutional architectures via sparsified neighborhood mixing\.Note:ICML 2021External Links:1905\.00067,[Link](https://arxiv.org/abs/1905.00067)Cited by:[item 4](https://arxiv.org/html/2606.14956#S3.I2.i4.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.19.8.1)\.
- \[2\]\(2021\)Graph neural networks with convolutional arma filters\.IEEE Transactions on Pattern Analysis and Machine Intelligence,pp\. 1–1\.External Links:ISSN 1939\-3539,[Link](http://dx.doi.org/10.1109/TPAMI.2021.3054830),[Document](https://dx.doi.org/10.1109/tpami.2021.3054830)Cited by:[item 6](https://arxiv.org/html/2606.14956#S3.I2.i6.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.10.10.2)\.
- \[3\]D\. Bo, X\. Wang, C\. Shi, and H\. Shen\(2021\)Beyond low\-frequency information in graph convolutional networks\.Note:AAAI 2021External Links:2101\.00797,[Link](https://arxiv.org/abs/2101.00797)Cited by:[item 2](https://arxiv.org/html/2606.14956#S3.I2.i2.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.2.2.2)\.
- \[4\]X\. Bresson and T\. Laurent\(2018\)Residual gated graph convnets\.External Links:1711\.07553,[Link](https://arxiv.org/abs/1711.07553)Cited by:[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.23.12.1)\.
- \[5\]G\. Daoud, M\. El\-Darieby, and K\. Elgazzar\(2023\-08\)Prediction of autonomous vehicle trajectories in turnaround scenarios\.In2023 10th International Conference on Dependable Systems and Their Applications \(DSA\),pp\. 606–613\.External Links:[Link](http://dx.doi.org/10.1109/DSA59317.2023.00089),[Document](https://dx.doi.org/10.1109/dsa59317.2023.00089)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p4.1),[§I](https://arxiv.org/html/2606.14956#S1.p5.1),[§II](https://arxiv.org/html/2606.14956#S2.p3.1),[§II](https://arxiv.org/html/2606.14956#S2.p5.1),[§III](https://arxiv.org/html/2606.14956#S3.p1.1)\.
- \[6\]G\. Daoud and M\. El\-Darieby\(2023\-08\)Towards a benchmark for trajectory prediction of autonomous vehicles\.In2023 10th International Conference on Dependable Systems and Their Applications \(DSA\),pp\. 614–622\.External Links:[Link](http://dx.doi.org/10.1109/DSA59317.2023.00090),[Document](https://dx.doi.org/10.1109/dsa59317.2023.00090)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p2.1),[TABLE III](https://arxiv.org/html/2606.14956#S5.T3.3.6.2.1),[§V](https://arxiv.org/html/2606.14956#S5.p1.4)\.
- \[7\]M\. Defferrard, X\. Bresson, and P\. Vandergheynst\(2017\)Convolutional neural networks on graphs with fast localized spectral filtering\.Note:NeurIPS 2016External Links:1606\.09375,[Link](https://arxiv.org/abs/1606.09375)Cited by:[item 6](https://arxiv.org/html/2606.14956#S3.I2.i6.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.11.2)\.
- \[8\]F\. Diehl, T\. Brunner, M\. T\. Le, and A\. Knoll\(2019\-06\)Graph neural networks for modelling traffic participant interaction\.In2019 IEEE Intelligent Vehicles Symposium \(IV\),External Links:[Link](http://dx.doi.org/10.1109/IVS.2019.8814066),[Document](https://dx.doi.org/10.1109/ivs.2019.8814066)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p3.1)\.
- \[9\]J\. Du, S\. Zhang, G\. Wu, J\. M\. F\. Moura, and S\. Kar\(2018\)Topology adaptive graph convolutional networks\.External Links:1710\.10370,[Link](https://arxiv.org/abs/1710.10370)Cited by:[item 4](https://arxiv.org/html/2606.14956#S3.I2.i4.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.20.9.1)\.
- \[10\]D\. Duvenaud, D\. Maclaurin, J\. Aguilera\-Iparraguirre, R\. Gómez\-Bombarelli, T\. Hirzel, A\. Aspuru\-Guzik, and R\. P\. Adams\(2015\)Convolutional networks on graphs for learning molecular fingerprints\.Note:NeurIPS 2015External Links:1509\.09292,[Link](https://arxiv.org/abs/1509.09292)Cited by:[item 4](https://arxiv.org/html/2606.14956#S3.I2.i4.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.21.10.1)\.
- \[11\]J\. Gao, C\. Sun, H\. Zhao, Y\. Shen, D\. Anguelov, C\. Li, and C\. Schmid\(2020\)VectorNet: encoding hd maps and agent dynamics from vectorized representation\.External Links:2005\.04259,[Link](https://arxiv.org/abs/2005.04259)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p4.1)\.
- \[12\]W\. L\. Hamilton, R\. Ying, and J\. Leskovec\(2018\)Inductive representation learning on large graphs\.Note:NeurIPS 2017External Links:1706\.02216,[Link](https://arxiv.org/abs/1706.02216)Cited by:[item 1](https://arxiv.org/html/2606.14956#S3.I2.i1.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.13.2.1)\.
- \[13\]Y\. Huang, J\. Du, Z\. Yang, Z\. Zhou, L\. Zhang, and H\. Chen\(2022\-09\)A survey on trajectory\-prediction methods for autonomous driving\.IEEE Transactions on Intelligent Vehicles7\(3\),pp\. 652–674\.External Links:ISSN 2379\-8858,[Link](http://dx.doi.org/10.1109/TIV.2022.3167103),[Document](https://dx.doi.org/10.1109/tiv.2022.3167103)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p1.1)\.
- \[14\]D\. Kim and A\. Oh\(2022\)How to find your friendly neighborhood: graph attention design with self\-supervision\.Note:ICLR 2021External Links:2204\.04879,[Link](https://arxiv.org/abs/2204.04879)Cited by:[item 3](https://arxiv.org/html/2606.14956#S3.I2.i3.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.6.6.2)\.
- \[15\]G\. Kim, D\. Kim, Y\. Ahn, and K\. Huh\(2021\)Hybrid approach for vehicle trajectory prediction using weighted integration of multiple models\.IEEE Access9,pp\. 78715–78723\.External Links:ISSN 2169\-3536,[Link](http://dx.doi.org/10.1109/ACCESS.2021.3083918),[Document](https://dx.doi.org/10.1109/access.2021.3083918)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p4.1)\.
- \[16\]T\. N\. Kipf and M\. Welling\(2017\)Semi\-supervised classification with graph convolutional networks\.External Links:1609\.02907,[Link](https://arxiv.org/abs/1609.02907)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p3.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.12.1.1),[§III](https://arxiv.org/html/2606.14956#S3.p4.2)\.
- \[17\]R\. Krajewski, T\. Moers, J\. Bock, L\. Vater, and L\. Eckstein\(2020\)The round dataset: a drone dataset of road user trajectories at roundabouts in germany\.In2020 IEEE 23rd International Conference on Intelligent Transportation Systems \(ITSC\),pp\. 1–6\.External Links:[Document](https://dx.doi.org/10.1109/ITSC45102.2020.9294728)Cited by:[§IV](https://arxiv.org/html/2606.14956#S4.p1.1)\.
- \[18\]H\. Li, Z\. Liao, Y\. Rui, L\. Li, and B\. Ran\(2023\-12\)A physical law constrained deep learning model for vehicle trajectory prediction\.IEEE Internet of Things Journal10\(24\),pp\. 22775–22790\.External Links:ISSN 2372\-2541,[Link](http://dx.doi.org/10.1109/JIOT.2023.3305395),[Document](https://dx.doi.org/10.1109/jiot.2023.3305395)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p4.1)\.
- \[19\]Y\. Li, D\. Tarlow, M\. Brockschmidt, and R\. Zemel\(2017\)Gated graph sequence neural networks\.Note:ICLR 2016External Links:1511\.05493,[Link](https://arxiv.org/abs/1511.05493)Cited by:[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.22.11.1)\.
- \[20\]N\. A\. Madjid, A\. Ahmad, M\. Mebrahtu, Y\. Babaa, A\. Nasser, S\. Malik, B\. Hassan, N\. Werghi, J\. Dias, and M\. Khonji\(2025\)Trajectory prediction for autonomous driving: progress, limitations, and future directions\.External Links:2503\.03262,[Link](https://arxiv.org/abs/2503.03262)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p3.1)\.
- \[21\]S\. Mandal, S\. Biswas, V\. E\. Balas, R\. N\. Shaw, and A\. Ghosh\(2020\-10\)Motion prediction for autonomous vehicles from lyft dataset using deep learning\.In2020 IEEE 5th International Conference on Computing Communication and Automation \(ICCCA\),pp\. 768–773\.External Links:[Link](http://dx.doi.org/10.1109/ICCCA49541.2020.9250790),[Document](https://dx.doi.org/10.1109/iccca49541.2020.9250790)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[22\]K\. Messaoud, I\. Yahiaoui, A\. Verroust\-Blondet, and F\. Nashashibi\(2021\-03\)Attention based vehicle trajectory prediction\.IEEE Transactions on Intelligent Vehicles6\(1\),pp\. 175–185\.External Links:ISSN 2379\-8858,[Link](http://dx.doi.org/10.1109/TIV.2020.2991952),[Document](https://dx.doi.org/10.1109/tiv.2020.2991952)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[23\]K\. Messaoud, I\. Yahiaoui, A\. Verroust\-Blondet, and F\. Nashashibi\(2019\-06\)Non\-local social pooling for vehicle trajectory prediction\.In2019 IEEE Intelligent Vehicles Symposium \(IV\),External Links:[Link](https://doi.org/10.1109/IVS.2019.8813829),[Document](https://dx.doi.org/10.1109/IVS.2019.8813829)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[24\]C\. Morris, M\. Ritzert, M\. Fey, W\. L\. Hamilton, J\. E\. Lenssen, G\. Rattan, and M\. Grohe\(2021\)Weisfeiler and leman go neural: higher\-order graph neural networks\.Note:AAAI 2019External Links:1810\.02244,[Link](https://arxiv.org/abs/1810.02244)Cited by:[item 1](https://arxiv.org/html/2606.14956#S3.I2.i1.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.14.3.1)\.
- \[25\]N\. Nikhil and B\. T\. Morris\(2018\)Convolutional neural network for trajectory prediction\.InProceedings of the European Conference on Computer Vision \(ECCV\) Workshops,pp\. 186–196\.External Links:[Document](https://dx.doi.org/10.1007/978-3-030-11015-4%5F16)Cited by:[TABLE III](https://arxiv.org/html/2606.14956#S5.T3.3.5.1.1),[§V](https://arxiv.org/html/2606.14956#S5.p1.4)\.
- \[26\]E\. Ranjan, S\. Sanyal, and P\. P\. Talukdar\(2020\)ASAP: adaptive structure aware pooling for learning hierarchical graph representations\.Note:AAAI 2020External Links:1911\.07979,[Link](https://arxiv.org/abs/1911.07979)Cited by:[item 2](https://arxiv.org/html/2606.14956#S3.I2.i2.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.17.6.1)\.
- \[27\]Y\. Shi, Z\. Huang, S\. Feng, H\. Zhong, W\. Wang, and Y\. Sun\(2021\)Masked label prediction: unified message passing model for semi\-supervised classification\.Note:IJCAI 2021External Links:2009\.03509,[Link](https://arxiv.org/abs/2009.03509)Cited by:[item 3](https://arxiv.org/html/2606.14956#S3.I2.i3.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.5.5.2)\.
- \[28\]J\. Sorstedt, L\. Svensson, F\. Sandblom, and L\. Hammarstrand\(2011\-12\)A new vehicle motion model for improved predictions and situation assessment\.IEEE Transactions on Intelligent Transportation Systems12\(4\),pp\. 1209–1219\.External Links:ISSN 1524\-9050,[Link](http://dx.doi.org/10.1109/TITS.2011.2160342),[Document](https://dx.doi.org/10.1109/tits.2011.2160342)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p1.1)\.
- \[29\]M\. Steiner, M\. Klemp, and C\. Stiller\(2024\-06\)MAP\-former: multi\-agent\-pair gaussian joint prediction\.In2024 IEEE Intelligent Vehicles Symposium \(IV\),pp\. 758–764\.External Links:[Link](http://dx.doi.org/10.1109/IV55156.2024.10588702),[Document](https://dx.doi.org/10.1109/iv55156.2024.10588702)Cited by:[TABLE III](https://arxiv.org/html/2606.14956#S5.T3.3.3.1),[§V](https://arxiv.org/html/2606.14956#S5.p1.4)\.
- \[30\]S\. A\. Tailor, F\. L\. Opolka, P\. Liò, and N\. D\. Lane\(2022\)Do we need anisotropic graph neural networks?\.Note:ICLR 2022External Links:2104\.01481,[Link](https://arxiv.org/abs/2104.01481)Cited by:[item 3](https://arxiv.org/html/2606.14956#S3.I2.i3.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.4.4.3)\.
- \[31\]K\. K\. Thekumparampil, C\. Wang, S\. Oh, and L\. Li\(2018\)Attention\-based graph neural network for semi\-supervised learning\.External Links:1803\.03735,[Link](https://arxiv.org/abs/1803.03735)Cited by:[item 2](https://arxiv.org/html/2606.14956#S3.I2.i2.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.15.4.1)\.
- \[32\]W\. Wang, L\. Wang, C\. Zhang, C\. Liu, and L\. Sun\(2022\)Social interactions for autonomous driving: a review and perspectives\.Foundations and Trends in Robotics\.External Links:arXiv:2208\.07541,[Document](https://dx.doi.org/10.1561/2300000078)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p4.1)\.
- \[33\]F\. Wu, T\. Zhang, A\. H\. de Souza Jr\., C\. Fifty, T\. Yu, and K\. Q\. Weinberger\(2019\)Simplifying graph convolutional networks\.Note:ICML 2019External Links:1902\.07153,[Link](https://arxiv.org/abs/1902.07153)Cited by:[item 4](https://arxiv.org/html/2606.14956#S3.I2.i4.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.11.18.7.1)\.
- \[34\]G\. Xie, H\. Gao, L\. Qian, B\. Huang, K\. Li, and J\. Wang\(2018\-07\)Vehicle trajectory prediction by integrating physics\- and maneuver\-based approaches using interactive multiple models\.IEEE Transactions on Industrial Electronics65\(7\),pp\. 5999–6008\.External Links:ISSN 1557\-9948,[Link](http://dx.doi.org/10.1109/TIE.2017.2782236),[Document](https://dx.doi.org/10.1109/tie.2017.2782236)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p1.1)\.
- \[35\]P\. Xu, J\. Hayet, and I\. Karamouzas\(2023\-09\)Context\-aware timewise vaes for real\-time vehicle trajectory prediction\.IEEE Robotics and Automation Letters8\(9\),pp\. 5440–5447\.External Links:ISSN 2377\-3774,[Link](http://dx.doi.org/10.1109/LRA.2023.3295990),[Document](https://dx.doi.org/10.1109/lra.2023.3295990)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[36\]H\. Yao, X\. Li, and X\. Yang\(2023\-01\)Physics\-aware learning\-based vehicle trajectory prediction of congested traffic in a connected vehicle environment\.IEEE Transactions on Vehicular Technology72\(1\),pp\. 102–112\.External Links:ISSN 1939\-9359,[Link](http://dx.doi.org/10.1109/TVT.2022.3203906),[Document](https://dx.doi.org/10.1109/tvt.2022.3203906)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p4.1)\.
- \[37\]K\. Zhang, X\. Feng, L\. Wu, and Z\. He\(2022\-11\)Trajectory prediction for autonomous driving using spatial\-temporal graph attention transformer\.IEEE Transactions on Intelligent Transportation Systems23\(11\),pp\. 22343–22353\.External Links:ISSN 1558\-0016,[Link](http://dx.doi.org/10.1109/TITS.2022.3164450),[Document](https://dx.doi.org/10.1109/tits.2022.3164450)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p3.1)\.
- \[38\]K\. Zhang, L\. Zhao, C\. Dong, L\. Wu, and L\. Zheng\(2023\-01\)AI\-tp: attention\-based interaction\-aware trajectory prediction for autonomous driving\.IEEE Transactions on Intelligent Vehicles8\(1\),pp\. 73–83\.External Links:ISSN 2379\-8858,[Link](http://dx.doi.org/10.1109/TIV.2022.3155236),[Document](https://dx.doi.org/10.1109/tiv.2022.3155236)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p3.1)\.
- \[39\]Q\. Zhang, Y\. Xing, J\. Wang, Z\. Fang, Y\. Liu, and G\. Yin\(2025\-07\)Interaction\-aware and driving style\-aware trajectory prediction for heterogeneous vehicles in mixed traffic environment\.IEEE Transactions on Intelligent Transportation Systems26\(7\),pp\. 10710–10724\.External Links:ISSN 1558\-0016,[Link](http://dx.doi.org/10.1109/TITS.2025.3553697),[Document](https://dx.doi.org/10.1109/tits.2025.3553697)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[40\]Z\. Zhao, M\. Karimzadeh, L\. Pacheco, H\. Santos, D\. Rosario, T\. Braun, and E\. Cerqueira\(2020\-12\)Mobility management with transferable reinforcement learning trajectory prediction\.IEEE Transactions on Network and Service Management17\(4\),pp\. 2102–2116\.External Links:ISSN 2373\-7379,[Link](http://dx.doi.org/10.1109/TNSM.2020.3034482),[Document](https://dx.doi.org/10.1109/tnsm.2020.3034482)Cited by:[§I](https://arxiv.org/html/2606.14956#S1.p1.1)\.
- \[41\]Z\. Zhong, Y\. Luo, and W\. Liang\(2022\-10\)STGM: vehicle trajectory prediction based on generative model for spatial\-temporal features\.IEEE Transactions on Intelligent Transportation Systems23\(10\),pp\. 18785–18793\.External Links:ISSN 1558\-0016,[Link](http://dx.doi.org/10.1109/TITS.2022.3160648),[Document](https://dx.doi.org/10.1109/tits.2022.3160648)Cited by:[§II](https://arxiv.org/html/2606.14956#S2.p2.1)\.
- \[42\]H\. Zhu and P\. Koniusz\(2021\)Simple spectral graph convolution\.InInternational Conference on Learning Representations \(ICLR\),External Links:[Link](https://openreview.net/forum?id=CYO5T-YjWZV)Cited by:[item 4](https://arxiv.org/html/2606.14956#S3.I2.i4.p1.1),[TABLE I](https://arxiv.org/html/2606.14956#S3.T1.7.7.1)\.
