Transformer-Based Wildlife Species Classification from Daily Movement Trajectories
Summary
This paper presents a Transformer-based model for classifying wildlife species using only daily GPS movement trajectories, demonstrating superior accuracy over LSTM and CNN baselines across different studies and regions.
Source: [https://arxiv.org/html/2605.06726](https://arxiv.org/html/2605.06726)
###### Abstract
Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale GPS trajectories of seven species from the Movebank platform. Models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Network (TCN) baselines and find that Transformers consistently achieve higher balanced accuracy, with gains of approximately 8 to 22 percentage points depending on the species and experimental setting. In an elephant binary classification task at 1-hour resolution, the Transformer achieves a balanced accuracy of 0.83 and an AUC of 0.92, substantially outperforming all baseline models. Under data-limited conditions, we compare two feature representations: a basic displacement-based encoding and an expanded set of movement descriptors that includes speed, direction, and turning behavior. Feature augmentation yields clear performance gains, especially for underrepresented species such as large carnivores (e.g., lions) and zebras. Finally, experiments comparing 1-hour and 30-minute temporal resolutions show that, although finer sampling can capture short-term movement patterns for some species, a unified 1-hour resolution yields better performance across studies by reducing missing data and ensuring consistent temporal coverage.
## I Introduction
Advances in animal-borne telemetry and open data portals such as Movebank have facilitated the accumulation of wildlife movement trajectories across species, locations, and studies [[11](https://arxiv.org/html/2605.06726#bib.bib2)]. We address whether the identity of a wildlife species can be inferred solely from its movement trajectories, without relying on absolute geographic location, habitat descriptors, or environmental covariates. Here, a study is defined as an independent telemetry effort conducted in a specific area during a specific period, with its own sampling method, collar technology, and environmental context [[9](https://arxiv.org/html/2605.06726#bib.bib5)]. Models trained on data from a single study or region can perform well locally but often fail when applied to other areas, because movement trajectories are shaped not only by species behavior but also by region-specific factors such as fences, roads, land-use patterns, and human activity [[26](https://arxiv.org/html/2605.06726#bib.bib7)]. Developing a separate model for each study or region is often impractical, because labeled data are not available everywhere and new monitoring locations continue to be added. Combining data across regions allows us to assess whether the same species exhibits consistent movement patterns across different parks and countries despite local differences [[18](https://arxiv.org/html/2605.06726#bib.bib8)]. We seek models that generalize across studies and regions, so that species classification relies on intrinsic movement behavior rather than site-specific routines and idiosyncrasies and remains usable in previously unseen or data-poor areas [[22](https://arxiv.org/html/2605.06726#bib.bib9)][[31](https://arxiv.org/html/2605.06726#bib.bib10)].
Our objective is to build a model that predicts species identity from movement trajectories alone, using a dataset of seven species. Rather than using contextual or environmental variables, we explicitly examine whether species can be distinguished from one another based on movement alone. Beyond prediction accuracy, we study which movement behaviors enable successful classification, in order to describe daily movement patterns that differ between species. These behaviors can include the spatial structure of movement, movement speed, movement direction, turning angles, and recurring movement patterns. Using telemetry data collected under varying regional conditions, we analyze species-specific movement patterns and how they can be used to identify species.
Several factors make this problem particularly challenging. First, animal telemetry data exhibit strong spatial, temporal, and individual-level autocorrelation. Because consecutive observations from the same animal or region are highly similar, random train–test splits can inflate model performance by allowing models to learn where an animal is rather than how it moves [[22](https://arxiv.org/html/2605.06726#bib.bib9)]. Second, telemetry datasets differ widely in temporal sampling resolution across studies. Resampling trajectories to coarse temporal intervals can suppress short-term movement dynamics, such as pauses, directional changes, or bursts of activity, that are often ecologically informative and may carry species-distinguishing patterns [[8](https://arxiv.org/html/2605.06726#bib.bib12)]. Balancing temporal consistency across studies against the preservation of informative short-term movement patterns therefore presents a fundamental challenge for trajectory-based species classification.
Most prior work on animal movement modeling has focused on behavioral state classification\[[28](https://arxiv.org/html/2605.06726#bib.bib13)\], habitat selection\[[17](https://arxiv.org/html/2605.06726#bib.bib14)\], or within\-study prediction, often using Hidden Markov Models, step\-selection functions, or recurrent neural networks evaluated under individual\-level or random cross\-validation schemes\[[20](https://arxiv.org/html/2605.06726#bib.bib15)\]\. Although recent studies emphasize spatially structured validation and the risks of location bias\[[22](https://arxiv.org/html/2605.06726#bib.bib9)\], few explicitly address cross\-study species classification from movement alone, and fewer still examine how feature representation and temporal resolution interact with modern attention\-based sequence models\. Consequently, it remains unclear whether the performance gains reported reflect true species\-level movement signatures or artifacts of study\-specific sampling and geography\[[26](https://arxiv.org/html/2605.06726#bib.bib7),[22](https://arxiv.org/html/2605.06726#bib.bib9)\]\.
In this work, we study the classification of wildlife species from movement trajectories by learning species\-specific movement patterns\. We represent each animal’s movement as an ordered sequence of GPS observations collected on a single UTC calendar day\. For each animal\-day, this yields sequences of up to 24 positions at 1\-hour temporal resolution or 48 positions at 30\-minute resolution, which are used as input to sequence models trained to predict species identity\.
We focus on African wildlife species that are well represented in open-access telemetry datasets and exhibit diverse movement behaviors within a shared continental context. This allows us to train models on data collected in some regions and test them on data from different regions, evaluating whether species can be distinguished based on movement patterns rather than location-specific characteristics. Using Movebank data [[12](https://arxiv.org/html/2605.06726#bib.bib17)], we systematically select seven African species with open-access, high-resolution trajectories available for download and analysis: baboon, buffalo, caracal, zebra, elephant, lion, and wildebeest. Individual Movebank telemetry studies may include multiple species, and species are represented by varying numbers of tracked animals. We compare a Transformer-based sequence model [[27](https://arxiv.org/html/2605.06726#bib.bib20)] with LSTM, CNN, and Temporal Convolutional Network (TCN) [[10](https://arxiv.org/html/2605.06726#bib.bib21)] baselines. On held-out test studies, the Transformer achieves a balanced accuracy of 0.81 and an AUC of 0.92, outperforming baselines whose balanced accuracy ranges from 0.68 to 0.77 and whose AUC ranges from 0.78 to 0.87. Moreover, augmenting a minimal displacement-based encoding with additional movement descriptors derived from the same trajectories (speed, bearing, and turning angle) improves balanced accuracy by 43.10%. Finally, we evaluate the effect of temporal resolution and show that, although finer 30-minute resampling can capture short-term movement dynamics for certain species, a common 1-hour resolution is better suited for cross-study modeling owing to reduced missingness and more consistent temporal coverage [[19](https://arxiv.org/html/2605.06726#bib.bib22)].
## II Related Work
### II-A Traditional Movement Modeling and Species Identification
The analysis of animal movement trajectories is grounded in the movement ecology paradigm\[[18](https://arxiv.org/html/2605.06726#bib.bib8)\]\. Traditional approaches rely on correlated random walks, step\-selection functions, and Hidden Markov Models \(HMMs\) to infer latent behavioral states \(e\.g\., foraging, resting\) from GPS telemetry\[[19](https://arxiv.org/html/2605.06726#bib.bib22),[1](https://arxiv.org/html/2605.06726#bib.bib23)\]\. While highly effective for behavioral segmentation, these generative models are typically species\- or region\-specific and are not designed for species\-level classification\.
Species identification from telemetry data remains comparatively underexplored. Existing studies predominantly employ classical machine learning models, such as Random Forests, that rely heavily on hand-crafted features [[28](https://arxiv.org/html/2605.06726#bib.bib13)]. Crucially, prior evaluations often use random train-test splits. As highlighted in [[22](https://arxiv.org/html/2605.06726#bib.bib9)] and [[26](https://arxiv.org/html/2605.06726#bib.bib7)], this practice introduces geographic leakage, allowing models to memorize site-specific environmental routines rather than intrinsic, generalizable species behaviors. Consequently, such evaluations can yield overly optimistic performance estimates and mask limited generalization to unseen regions or studies [[15](https://arxiv.org/html/2605.06726#bib.bib27)].
### II-B Deep Learning for Trajectory Classification
Deep learning (DL) facilitates the end-to-end modeling of movement trajectories as multivariate time series [[6](https://arxiv.org/html/2605.06726#bib.bib24)]. Sequential architectures like Recurrent Neural Networks (RNNs), LSTMs, and Temporal Convolutional Networks (TCNs) have shown promise for behavioral state inference and short-term trajectory prediction [[21](https://arxiv.org/html/2605.06726#bib.bib26)]. Recently, Transformers [[27](https://arxiv.org/html/2605.06726#bib.bib20)] have emerged as powerful sequence encoders, adept at modeling long-range temporal dependencies in daily movement sequences.
However, the application of DL to species identification across distinct geographical regions remains nascent. Most existing DL studies evaluate performance within a single dataset, leaving their capacity for multi-study generalization untested. Our study addresses this gap directly. By modeling daily trajectories with Transformers and enforcing study-level holdouts during evaluation, our methodology aligns with recent calls for rigorous validation in AI-driven conservation [[25](https://arxiv.org/html/2605.06726#bib.bib28)] and reduces the risk that reported performance reflects localized spatial artifacts rather than species-specific kinematic signatures.
## III Data and Preprocessing
This section describes the telemetry datasets used in this study and the preprocessing steps applied to construct comparable daily movement trajectories from data collected under different sampling protocols.
### III-A Data Source and Species Selection
We use GPS telemetry data sourced from Movebank, a worldwide archive of animal movement data [[11](https://arxiv.org/html/2605.06726#bib.bib2)][[12](https://arxiv.org/html/2605.06726#bib.bib17)]. The dataset comprises tracking data from seven African wildlife species (*baboon*, *buffalo*, *caracal*, *zebra*, *elephant*, *lion*, and *wildebeest*) and spans diverse regions of Africa and separate telemetry studies.
A study is defined here as an independent telemetry campaign conducted in a specific region and time period, typically characterized by its own sampling protocol, collar technology, and tracking duration. Individual studies may include multiple species, and species are represented by varying numbers of tracked animals. Because data originate from different studies with distinct collection protocols, models trained and evaluated using random splits risk capturing study- or location-specific patterns rather than species-level movement characteristics. To mitigate this effect, we evaluate models by holding out one entire study per species during testing, so that test data come from regions and data collection settings not observed during training and validation. In total, this research uses 16 distinct Movebank datasets purposefully selected to cover diverse geographical regions; these telemetry campaigns were conducted at various times between 1998 and 2023. The specific Movebank studies and dataset splits used in the experiments are summarized in Table [I](https://arxiv.org/html/2605.06726#S3.T1), which also indicates the cases in which one study contributes to several splits (e.g., caracal and zebra, each available from a single study).
TABLE I: Movebank studies used and dataset splits.

| Species | Movebank Study ID | Split Usage |
|---|---|---|
| Baboon | 2131q5 (DOI) | Test |
| Baboon | 1723547 | Train/Val |
| Lion | 220229 | Train/Val |
| Lion | 220229 | Test |
| Lion | 150531 | Train/Val |
| Wildebeest | 132915 | Test |
| Wildebeest | 225301 | Train/Val |
| Wildebeest | 1310113 | Train/Val |
| Elephant | 736029750 | Train/Val |
| Elephant | 1818825 | Test |
| Elephant | 3nj3qj45 (DOI) | Train/Val |
| Elephant | 1630/2970/5990 | Train/Val |
| Buffalo | 2138 | Train/Val |
| Buffalo | 1803741 | Train/Val |
| Caracal | 1.317 (DOI) | Train/Val/Test |
| Buffalo/Zebra | 259966228 | Buffalo: Test; Zebra: All splits |
### III-B Temporal Standardization and Resampling
Sampling intervals vary across telemetry studies, from sub-hourly to multi-hour gaps between consecutive observations. To ensure temporal comparability across studies, all trajectories were resampled onto fixed temporal grids at two resolutions: 1-hour and 30-minute intervals. A uniform temporal resolution is a common prerequisite for comparative movement analysis and sequence modeling [[8](https://arxiv.org/html/2605.06726#bib.bib12)].
Let $\{(t_i, \mathbf{p}_i)\}_{i=1}^{N}$ denote the original GPS trajectory of an individual animal, where $t_i$ is the timestamp and $\mathbf{p}_i = (\text{lat}_i, \text{lon}_i)$ the recorded position. Prior to resampling, any raw observations recorded at the exact same timestamp are spatially averaged to reduce measurement noise while preserving temporal continuity.
We construct trajectories on a regular temporal grid with resolution $\Delta t$ ($\Delta t = 1$ hour for hourly resampling and $\Delta t = 30$ minutes for half-hourly resampling). Each original timestamp $t_i$ is mapped to the nearest grid time via $\tilde{t}_i = \operatorname{round}_{\Delta t}(t_i)$, where $\operatorname{round}_{\Delta t}(\cdot)$ denotes rounding to the nearest multiple of $\Delta t$, ensuring that the temporal displacement satisfies $|t_i - \tilde{t}_i| \le \Delta t/2$. If multiple observations map to the same grid time, only the observation with minimal $|t_i - \tilde{t}_i|$ is retained.
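The snapping step above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: timestamps are taken as integer Unix seconds, duplicate-timestamp fixes are averaged with a plain arithmetic mean of latitude and longitude in degrees (adequate for co-located fixes away from the ±180° meridian), and exact rounding ties go to the later grid time. The function name `snap_to_grid` is hypothetical.

```python
import numpy as np


def snap_to_grid(t, lat, lon, dt):
    """Snap raw GPS fixes onto a regular grid of spacing dt (seconds).

    t: Unix timestamps in integer seconds; lat/lon in degrees.
    Returns (grid_times, lat, lon) with at most one fix per grid time.
    """
    t = np.asarray(t, dtype=np.int64)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    # 1) Average fixes that share exactly the same timestamp.
    uniq_t, inverse = np.unique(t, return_inverse=True)
    counts = np.bincount(inverse)
    lat_u = np.bincount(inverse, weights=lat) / counts
    lon_u = np.bincount(inverse, weights=lon) / counts

    # 2) Round each timestamp to the nearest multiple of dt (ties round up),
    #    so that |t_i - round_dt(t_i)| <= dt / 2.
    grid = ((uniq_t + dt // 2) // dt) * dt
    offset = np.abs(uniq_t - grid)

    # 3) If several fixes land on the same grid time, keep the closest one.
    order = np.lexsort((offset, grid))  # primary key: grid, secondary: offset
    g_sorted = grid[order]
    is_first = np.ones(len(order), dtype=bool)
    is_first[1:] = g_sorted[1:] != g_sorted[:-1]
    keep = order[is_first]
    return grid[keep], lat_u[keep], lon_u[keep]
```

For example, fixes at 0 s (twice), 1700 s, 1900 s, and 7000 s on an hourly grid collapse to grid times 0, 3600, and 7200: the two fixes at 0 s are averaged, and the fix at 1700 s loses the 0 s slot to the exact fix.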
Let $\mathcal{T} = \{\tau_0, \tau_1, \dots, \tau_n\}$ denote the complete regular grid spanning from the earliest to the latest retained observation, with spacing $\Delta t$, and let $\mathcal{T}^* \subseteq \mathcal{T}$ denote the subset of grid times at which a valid (non-interpolated) observation exists. The resampled position $\mathbf{p}(\tau)$ at any target grid time $\tau \in \mathcal{T}$ is defined as:
$$
\mathbf{p}(\tau) =
\begin{cases}
\mathbf{p}_k, & \tau = \tau_k \in \mathcal{T}^*, \\[6pt]
\mathbf{p}_k + \frac{\tau - \tau_k}{\tau_{k+1} - \tau_k}\,(\mathbf{p}_{k+1} - \mathbf{p}_k), & \tau_k, \tau_{k+1} \in \mathcal{T}^*, \\
& \tau_k < \tau < \tau_{k+1}, \\
& \tau_{k+1} - \tau_k = 2\Delta t, \\[8pt]
\text{undefined}, & \text{otherwise}.
\end{cases}
\tag{1}
$$
Here $\tau_k$ and $\tau_{k+1}$ denote consecutive valid observation times in $\mathcal{T}^*$. Linear interpolation is therefore applied only when exactly one intermediate grid point is missing between two consecutive valid observations in $\mathcal{T}^*$. If the temporal gap between consecutive valid observations exceeds $2\Delta t$, the intermediate positions remain undefined and are excluded from further analysis.
Importantly, no forward\-filling or backward\-filling of positions is performed\. Repeated positions at consecutive grid times occur only when identical coordinates are present in the original data\.
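Equation (1) can be sketched the same way. The hypothetical helper below expands snapped fixes onto the full grid and interpolates only a single missing grid point; because that point lies exactly halfway, the interpolation weight in Eq. (1) is 1/2. Interpolating latitude and longitude directly mirrors $\mathbf{p}_i = (\text{lat}_i, \text{lon}_i)$ in the text; the rest is an assumption for illustration.

```python
import numpy as np


def fill_single_gaps(grid_t, lat, lon, dt):
    """Expand valid grid fixes onto the full regular grid (Eq. 1).

    grid_t: sorted grid times (int seconds) that carry a real observation.
    Positions are linearly interpolated only when exactly one grid point is
    missing (gap == 2 * dt); all other missing points stay NaN (undefined).
    No forward- or backward-filling is performed.
    """
    grid_t = np.asarray(grid_t, dtype=np.int64)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    full_t = np.arange(grid_t[0], grid_t[-1] + dt, dt)
    idx = (grid_t - grid_t[0]) // dt          # position of each fix on the grid
    lat_full = np.full(len(full_t), np.nan)
    lon_full = np.full(len(full_t), np.nan)
    lat_full[idx] = lat
    lon_full[idx] = lon

    for k in range(len(grid_t) - 1):
        if grid_t[k + 1] - grid_t[k] == 2 * dt:
            # (tau - tau_k) / (tau_{k+1} - tau_k) = 1/2 for the single missing point.
            mid = idx[k] + 1
            lat_full[mid] = 0.5 * (lat[k] + lat[k + 1])
            lon_full[mid] = 0.5 * (lon[k] + lon[k + 1])
    return full_t, lat_full, lon_full
```

Gaps longer than $2\Delta t$ simply stay NaN, so no position is ever carried forward or backward.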
### III-C Daily Trajectory Construction
To prevent temporal information leakage, trajectories are segmented into daily sequences by UTC calendar day (at most 24 points at 1-hour resolution and 48 points at 30-minute resolution). We retain days with at least 12 observations (1-hour) or 25 observations (30-minute). This coverage-based filtering balances temporal completeness against data retention [[26](https://arxiv.org/html/2605.06726#bib.bib7)].
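A possible implementation of the daily segmentation, assuming UTC Unix timestamps on the resampled grid and NaN for undefined positions. The thresholds (12 and 25 valid fixes) come from the text; the dictionary layout and the name `split_into_days` are illustrative choices. Each retained day becomes a fixed-length array indexed by time-of-day slot, with NaN where no position is defined.

```python
import numpy as np

# Minimum number of valid fixes per UTC day, as stated in Section III-C.
MIN_OBS = {3600: 12, 1800: 25}
SECONDS_PER_DAY = 86400


def split_into_days(times, lat, lon, dt):
    """Cut a resampled trajectory into fixed-length UTC-day arrays.

    times: Unix timestamps (int seconds, UTC) on a dt-spaced grid;
    lat/lon: positions with NaN marking undefined grid points.
    Returns {day_index: array of shape (86400 // dt, 2)} for days that
    meet the coverage threshold; missing slots stay NaN.
    """
    times = np.asarray(times, dtype=np.int64)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    valid = ~np.isnan(lat) & ~np.isnan(lon)
    day = times // SECONDS_PER_DAY          # UTC calendar day
    slots = SECONDS_PER_DAY // dt           # 24 (1 h) or 48 (30 min)

    days = {}
    for d in np.unique(day):
        sel = (day == d) & valid
        if sel.sum() < MIN_OBS[dt]:
            continue                        # too little coverage: drop the day
        seq = np.full((slots, 2), np.nan)
        slot = (times[sel] % SECONDS_PER_DAY) // dt
        seq[slot, 0] = lat[sel]
        seq[slot, 1] = lon[sel]
        days[int(d)] = seq
    return days
```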
Table [II](https://arxiv.org/html/2605.06726#S3.T2) details the final dataset composition, reporting the retained animal-days, unique individuals, and total GPS points per species.
TABLE II: Dataset statistics after preprocessing for 30-minute and 1-hour resampling protocols.
### III-D Kinematic and Temporal Encoding
To avoid reliance on absolute geographic location, latitude ($\phi$) and longitude ($\lambda$) in radians are projected onto the unit sphere: $x = \cos\phi\cos\lambda$, $y = \cos\phi\sin\lambda$, $z = \sin\phi$. We compute instantaneous displacement vectors as $\Delta x_t = x_t - x_{t-1}$, $\Delta y_t = y_t - y_{t-1}$, and $\Delta z_t = z_t - z_{t-1}$. The individual displacement components still depend on where on the globe a step is taken (the same ground step maps to different $(\Delta x_t, \Delta y_t, \Delta z_t)$ at different latitudes and longitudes), but the strict study-level holdout means that any reliance on such location-specific patterns is penalized at test time rather than rewarded, which encourages the model to learn generalizable, relative movement motifs.
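The projection and displacement computation fits in a few lines. In this sketch (hypothetical function name), the first displacement of a sequence is left as NaN because it has no predecessor; the text does not say how that step is handled.

```python
import numpy as np


def unit_sphere_displacements(lat_deg, lon_deg):
    """Map lat/lon (degrees) to 3-D unit-sphere points and step displacements.

    Returns (xyz, dxyz), both of shape (T, 3). The first displacement has no
    predecessor and is NaN; NaN positions propagate to the adjacent steps.
    """
    phi = np.radians(np.asarray(lat_deg, dtype=float))
    lam = np.radians(np.asarray(lon_deg, dtype=float))
    xyz = np.stack([np.cos(phi) * np.cos(lam),
                    np.cos(phi) * np.sin(lam),
                    np.sin(phi)], axis=-1)
    dxyz = np.full_like(xyz, np.nan)
    dxyz[1:] = xyz[1:] - xyz[:-1]
    return xyz, dxyz
```

Note the scale of these inputs: with a mean Earth radius of about 6371 km, a 1 km ground step corresponds to a unit-sphere displacement of roughly 1/6371 ≈ 1.6 × 10⁻⁴.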
We encode time cyclically to capture circadian rhythms without boundary discontinuities [[13](https://arxiv.org/html/2605.06726#bib.bib32)]. For 1-hour resolution, the hour of day $h \in \{0, \dots, 23\}$ is encoded as $\text{hour}_{\sin} = \sin(2\pi h/24)$ and $\text{hour}_{\cos} = \cos(2\pi h/24)$. For 30-minute intervals, the minutes since midnight $m \in \{0, \dots, 1439\}$ are encoded as $\text{min}_{\sin} = \sin(2\pi m/1440)$ and $\text{min}_{\cos} = \cos(2\pi m/1440)$. Because $2\pi h/24 = 2\pi(60h)/1440$, the two encodings coincide at full hours, so time-of-day features are consistent across resolutions.
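The cyclic encoding can be checked numerically. The sketch below (hypothetical name; input is seconds since UTC midnight) computes both variants and confirms that they agree at full hours and that 23:00 lies next to 00:00 on the unit circle.

```python
import numpy as np


def cyclic_time_of_day(seconds_since_midnight, dt):
    """Sine/cosine time-of-day encoding for the 1-hour and 30-minute grids.

    dt == 3600 uses the hour h in {0..23} with period 24;
    dt == 1800 uses minutes m in {0..1439} with period 1440.
    """
    s = np.asarray(seconds_since_midnight)
    if dt == 3600:
        angle = 2 * np.pi * (s // 3600) / 24
    elif dt == 1800:
        angle = 2 * np.pi * (s // 60) / 1440
    else:
        raise ValueError("only the 1-hour and 30-minute grids are defined")
    return np.sin(angle), np.cos(angle)
```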
## IV Feature Engineering
We describe the movement features used for species classification and contrast a minimal baseline representation with an augmented feature set designed to improve learning when data availability varies across species and studies\.
### IV-A Baseline Feature Representation
As a baseline, for each daily trajectory, we extract a minimal 5-dimensional feature vector at each time step $t$: $F_{\text{base},t} = [\Delta x_t,\, \Delta y_t,\, \Delta z_t,\, t_{\sin},\, t_{\cos}]$, where $(\Delta x_t, \Delta y_t, \Delta z_t)$ is the spatial displacement on the unit sphere, and $t_{\sin}$ and $t_{\cos}$ denote the cyclic time-of-day encodings ($\text{hour}$ or $\text{min}$, depending on the resampling resolution) defined in Section III-D.
This low\-dimensional representation intentionally discards higher\-order kinematics \(e\.g\., speed and curvature\) to serve as a reference point for evaluating the value of richer motion features\.
### IV-B Augmented Movement Features
To capture higher\-level movement dynamics that may be characteristic of different species, we augment the baseline representation with additional kinematic and directional features\.
#### Step Length and Speed
The step length between consecutive positions is defined as
$$\ell_t = \sqrt{\Delta x_t^2 + \Delta y_t^2 + \Delta z_t^2}. \tag{2}$$
Speed is computed as distance traveled per unit time:
$$v_t = \frac{\ell_t}{\Delta t_t}, \tag{3}$$
where $\Delta t_t$ denotes the time elapsed between consecutive fixes. Speed provides information on movement intensity and is often linked to species-specific locomotion strategies and energy budgets [[24](https://arxiv.org/html/2605.06726#bib.bib34)].
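A sketch of Eqs. (2) and (3). The step length is the straight-line (chord) distance between unit-sphere points, which is what Eq. (2) computes; for the short steps typical of 1-hour or 30-minute fixes it is very close to the great-circle arc. Multiplying by the mean Earth radius (about 6371 km) gives an approximate distance in kilometres. The source does not state whether speeds are rescaled this way, so the conversion in the usage check is only for intuition, and the helper name is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used only for the km conversion


def step_length_and_speed(dxyz, dt_hours):
    """Eqs. (2)-(3): chord step length on the unit sphere and speed per hour.

    dxyz: (T, 3) unit-sphere displacements (NaN where undefined);
    dt_hours: elapsed time per step in hours (scalar or length-T array).
    """
    dxyz = np.asarray(dxyz, dtype=float)
    step = np.sqrt(np.sum(dxyz ** 2, axis=1))  # ell_t, unit-sphere units
    speed = step / dt_hours                    # v_t, unit-sphere units per hour
    return step, speed
```

A displacement of (3, 4, 0) × 10⁻⁴ over one hour gives $\ell_t = 5 \times 10^{-4}$, i.e. roughly 3.19 km/h after the kilometre conversion.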
#### Movement Direction \(Bearing\)
The instantaneous movement direction is represented by the bearing angle:
$$\theta_t = \operatorname{atan2}(\Delta y_t, \Delta x_t). \tag{4}$$
To avoid the discontinuity at $\pm\pi$, the bearing is encoded using its sine and cosine components:
$$\theta_{\sin,t} = \sin(\theta_t), \quad \theta_{\cos,t} = \cos(\theta_t). \tag{5}$$
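A sketch of Eqs. (4) and (5). Taken literally, Eq. (4) uses the $x$ and $y$ components of the 3-D unit-sphere displacement, so $\theta_t$ is the heading of the step projected onto the equatorial plane rather than a compass bearing relative to north; the code follows the equation as written, and the function name is hypothetical. The test input shows why the sine/cosine encoding is used: two nearly identical headings on either side of the ±π seam get very different raw angles but almost identical encodings.

```python
import numpy as np


def bearing_features(dxyz):
    """Eqs. (4)-(5): theta_t = atan2(dy_t, dx_t) and its sine/cosine encoding."""
    dxyz = np.asarray(dxyz, dtype=float)
    theta = np.arctan2(dxyz[:, 1], dxyz[:, 0])  # NaN displacements give NaN
    return theta, np.sin(theta), np.cos(theta)
```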
#### Turning Angle
Directional changes between successive steps are captured by the turning angle:
$$\Delta\theta_t = \theta_t - \theta_{t-1}, \tag{6}$$
wrapped to the interval $[-\pi, \pi]$. As with the bearing, the turning angle is encoded cyclically:
$$\Delta\theta_{\sin,t} = \sin(\Delta\theta_t), \quad \Delta\theta_{\cos,t} = \cos(\Delta\theta_t). \tag{7}$$
Turning behavior reflects path tortuosity and movement persistence, which vary across species and ecological contexts (e.g., foraging vs. traveling) [[3](https://arxiv.org/html/2605.06726#bib.bib35)].
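Equation (6) with the wrap to $[-\pi, \pi]$ can be sketched via $\operatorname{atan2}(\sin, \cos)$ of the raw difference. Because sine and cosine are 2π-periodic, the encoded values in Eq. (7) are the same with or without wrapping; the wrap matters only if the raw angle is used. NaN bearings (undefined steps) propagate into the turning angles on both sides, which is the desired gap-aware behavior. The function name is hypothetical.

```python
import numpy as np


def turning_features(theta):
    """Eqs. (6)-(7): turning angle wrapped to [-pi, pi] plus sine/cosine encoding.

    The first turning angle has no previous bearing and is NaN; a NaN bearing
    (e.g. across a gap) makes both adjacent turning angles NaN.
    """
    theta = np.asarray(theta, dtype=float)
    turn = np.full_like(theta, np.nan)
    raw = theta[1:] - theta[:-1]
    turn[1:] = np.arctan2(np.sin(raw), np.cos(raw))  # wrap into [-pi, pi]
    return turn, np.sin(turn), np.cos(turn)
```

For bearings 3.0 and −3.0 rad the raw difference is −6.0, but the wrapped turning angle is 2π − 6 ≈ 0.28 rad: a small turn across the ±π seam.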
Together with the baseline features, the augmented representation yields a ten-dimensional feature vector per time step. For 1-hour resampling it is
$$\{\Delta x_t,\, \Delta y_t,\, \Delta z_t,\, v_t,\, \theta_{\sin,t},\, \theta_{\cos,t},\, \Delta\theta_{\sin,t},\, \Delta\theta_{\cos,t},\, \text{hour}_{\sin},\, \text{hour}_{\cos}\}, \tag{8}$$
and for 30-minute resampling the hour encoding is replaced by the minute encoding:
$$\{\Delta x_t,\, \Delta y_t,\, \Delta z_t,\, v_t,\, \theta_{\sin,t},\, \theta_{\cos,t},\, \Delta\theta_{\sin,t},\, \Delta\theta_{\cos,t},\, \text{min}_{\sin},\, \text{min}_{\cos}\}. \tag{9}$$
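Putting Sections III-D and IV together, the sketch below builds either the 5-dimensional baseline vector $F_{\text{base},t}$ or the 10-dimensional augmented vector of Eqs. (8) and (9) from one day of positions. It is a hypothetical helper, not the authors' pipeline; it uses a single minute-based time encoding for both grids, which equals the hour encoding at full hours. Rows where a feature is undefined (the first displacement, the first two turning angles) come out as NaN.

```python
import numpy as np


def daily_features(lat_deg, lon_deg, minutes_of_day, dt_hours, augmented=True):
    """Per-timestep features: 5-D baseline or 10-D augmented (Eqs. 8-9).

    minutes_of_day: minutes since UTC midnight of each grid time. On an hourly
    grid, 2*pi*(60h)/1440 equals 2*pi*h/24, so one encoding serves both grids.
    """
    phi = np.radians(np.asarray(lat_deg, dtype=float))
    lam = np.radians(np.asarray(lon_deg, dtype=float))
    xyz = np.stack([np.cos(phi) * np.cos(lam),
                    np.cos(phi) * np.sin(lam),
                    np.sin(phi)], axis=1)
    d = np.full_like(xyz, np.nan)
    d[1:] = np.diff(xyz, axis=0)                        # (dx, dy, dz)

    angle = 2 * np.pi * np.asarray(minutes_of_day, dtype=float) / 1440.0
    t_sin, t_cos = np.sin(angle), np.cos(angle)
    if not augmented:
        return np.column_stack([d, t_sin, t_cos])       # F_base, shape (T, 5)

    speed = np.sqrt(np.sum(d ** 2, axis=1)) / dt_hours  # v_t
    theta = np.arctan2(d[:, 1], d[:, 0])                # bearing
    turn = np.full_like(theta, np.nan)
    raw = np.diff(theta)
    turn[1:] = np.arctan2(np.sin(raw), np.cos(raw))     # wrapped turning angle
    return np.column_stack([d, speed,
                            np.sin(theta), np.cos(theta),
                            np.sin(turn), np.cos(turn),
                            t_sin, t_cos])              # shape (T, 10)
```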
### IV-C Gap-Aware Feature Semantics
Telemetry data frequently contain missing fixes, leading to irregular time gaps between observations\. To avoid inferring artificial movement across such gaps, we adopt a gap\-aware feature construction strategy\.
Let $\Delta t_t$ denote the time difference between consecutive observations. Movement-derived features (displacements, speed, bearing, and turning) are considered valid only when $\Delta t_t$ equals the nominal sampling interval. When this condition is violated, the movement features are marked as undefined rather than imputed:
$$\text{movement\_valid}_t = \begin{cases} 1, & \text{if } \Delta t_t = \Delta t_{\text{nominal}}, \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$
For time steps where $\text{movement\_valid}_t = 0$, all movement-derived features are set to missing values and handled downstream through masking during sequence modeling. Time-of-day features remain defined for all time steps.
This design preserves the semantic distinction between *no movement* and *unknown movement*, preventing the model from learning spurious patterns induced by interpolation or zero-filling. This gap-aware handling is consistent with best practices for temporally structured ecological data and supports robust learning under diverse sampling regimes [[5](https://arxiv.org/html/2605.06726#bib.bib36)].
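Equation (10) as a sketch, assuming feature columns are ordered as in Eq. (8), with the eight movement-derived columns first and the two time-of-day columns last; the helper name is hypothetical. It applies the rule literally to precomputed features. The turning angle at the step after a gap also depends on the missing bearing, so computing turning angles only after bearings have been blanked lets NaN propagate to those steps as well.

```python
import numpy as np


def mask_movement_across_gaps(times, features, n_movement, dt_nominal):
    """Eq. (10): blank movement features wherever the step is not exactly dt_nominal.

    times: (T,) observation times in seconds; features: (T, F) with the first
    n_movement columns movement-derived and the rest time-of-day features.
    Returns (features_with_NaN, movement_valid).
    """
    times = np.asarray(times, dtype=np.int64)
    out = np.array(features, dtype=float, copy=True)
    valid = np.zeros(len(times), dtype=bool)   # first step has no predecessor
    valid[1:] = np.diff(times) == dt_nominal
    out[~valid, :n_movement] = np.nan          # unknown movement, not zero movement
    return out, valid.astype(int)
```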
## V Model Architecture
### V-A Sequence Modeling Formulation
We formulate species identification as supervised sequence classification over daily movement trajectories. For each animal-day, we construct a fixed-length sequence $\mathbf{X} \in \mathbb{R}^{T \times F}$, where $T \in \{24, 48\}$ is the number of timesteps per day (1-hour or 30-minute resampling) and $F \in \{5, 10\}$ is the feature dimension (Section IV). Each timestep $\mathbf{x}_t \in \mathbb{R}^{F}$ contains displacement- and time-derived features; augmented variants also include speed and directional descriptors.
Daily sequences may be shorter than $T$ due to missing fixes. We therefore pad sequences with zeros and provide a binary mask $\mathbf{m} \in \{0, 1\}^{T}$ indicating which timesteps are observed:
$$m_t = \begin{cases} 1, & \text{if timestep } t \text{ is observed}, \\ 0, & \text{if timestep } t \text{ is padding}. \end{cases} \tag{11}$$
In addition, movement-derived features (e.g., speed, bearing, turning) are treated as undefined across temporal gaps (Section IV-C) and handled via masking and NaN-to-zero conversion at tensor construction time, ensuring that missing movement is not conflated with true zero movement.
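A sketch of the padding and masking step in Eq. (11), assuming that observed timesteps are stored contiguously and padding is appended at the end (the text does not specify the order); the name `pad_day` is hypothetical. Converting NaN to 0 on its own would make unknown movement look like zero movement, so the sketch also returns a per-timestep `known` flag; whether and how the authors pass such a flag to the model is not stated.

```python
import numpy as np


def pad_day(x_obs, T):
    """Zero-pad an observed (T_obs, F) day to (T, F) and build the mask m (Eq. 11).

    NaN movement values (undefined across gaps) are converted to 0 at tensor
    construction time; `known` flags rows whose features were all defined.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    t_obs, f = x_obs.shape
    if t_obs > T:
        raise ValueError("day has more timesteps than T")
    X = np.zeros((T, f))
    X[:t_obs] = np.nan_to_num(x_obs, nan=0.0)
    m = np.zeros(T, dtype=np.int64)
    m[:t_obs] = 1                                    # observed vs. padding
    known = np.zeros(T, dtype=np.int64)
    known[:t_obs] = ~np.isnan(x_obs).any(axis=1)     # 0 where a value was NaN
    return X, m, known
```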
### V-B Transformer Architecture
Our primary model is a Transformer encoder adapted for daily trajectory classification. Each input timestep is first projected into a $d$-dimensional embedding space:
$$\mathbf{z}_t = \mathbf{W}\mathbf{x}_t + \mathbf{b}, \quad \mathbf{W} \in \mathbb{R}^{d \times F}. \tag{12}$$
To preserve temporal order, we add sinusoidal positional encodings [[27](https://arxiv.org/html/2605.06726#bib.bib20)]:
$$\begin{aligned} \mathrm{PE}(t, 2i) &= \sin\left(t / 10000^{2i/d}\right), \\ \mathrm{PE}(t, 2i+1) &= \cos\left(t / 10000^{2i/d}\right), \end{aligned} \tag{13}$$
and use a learnable classification token ([CLS]) prepended to the sequence. The model applies $L$ pre-norm Transformer encoder layers with multi-head self-attention and feed-forward blocks. For an input matrix $\mathbf{Z} \in \mathbb{R}^{(T+1) \times d}$ (the $T$ embedded timesteps plus the [CLS] token), self-attention is computed as:
$$\mathrm{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_k}}\right)\mathbf{V}, \tag{14}$$
where $\mathbf{Q} = \mathbf{Z}\mathbf{W}_Q$, $\mathbf{K} = \mathbf{Z}\mathbf{W}_K$, and $\mathbf{V} = \mathbf{Z}\mathbf{W}_V$. Padding positions are excluded using a key padding mask derived from $\mathbf{m}$. The final representation of the [CLS] token is passed through a linear head to produce class logits.
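To make Eqs. (12) to (14) concrete, the NumPy sketch below embeds a day, adds sinusoidal positional encodings, prepends a [CLS] vector, and runs a single-head attention step with a key padding mask. It deliberately omits multi-head splitting, layer normalization, residual connections, feed-forward blocks, and the stack of $L$ layers, and it assumes positional encodings are added before the [CLS] token is prepended. All parameter names and sizes are hypothetical; in practice a framework encoder layer would be used. The usage check confirms that padded timesteps receive exactly zero attention weight.

```python
import numpy as np


def sinusoidal_pe(T, d):
    """Eq. (13): PE(t, 2i) = sin(t / 10000^(2i/d)), PE(t, 2i+1) = cos(...); d even."""
    pos = np.arange(T)[:, None]
    two_i = np.arange(0, d, 2)[None, :]
    angle = pos / np.power(10000.0, two_i / d)
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe


def masked_attention(Z, W_q, W_k, W_v, key_mask):
    """Eq. (14) with a key padding mask: padded keys receive zero weight."""
    Q, K, V = Z @ W_q, Z @ W_k, Z @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = np.where(key_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights


def cls_logits(X, m, p):
    """Embed (Eq. 12), add PE, prepend [CLS], attend once, read out [CLS]."""
    T = X.shape[0]
    d = p["W_in"].shape[0]
    Z = X @ p["W_in"].T + p["b_in"] + sinusoidal_pe(T, d)
    Z = np.vstack([p["cls"][None, :], Z])                # (T + 1, d)
    key_mask = np.concatenate([[True], np.asarray(m, dtype=bool)])
    H, weights = masked_attention(Z, p["W_q"], p["W_k"], p["W_v"], key_mask)
    return H[0] @ p["W_head"].T + p["b_head"], weights
```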
Transformers are well\-suited for movement trajectories because they can model long\-range temporal dependencies within a day \(e\.g\., multi\-hour rest–move cycles\) without the recurrence bottlenecks of RNNs, and they can integrate diverse movement cues \(displacement, direction, turning, and circadian time\) through attention\-based aggregation\[[32](https://arxiv.org/html/2605.06726#bib.bib37)\]\.
### V-C Baseline Models
To contextualize performance, we compare the transformer against standard sequential architectures commonly used in time\-series classification\. All baselines were tuned comparably using the exact same validation splits and optimization strategy to ensure a fair comparison:
- LSTM: recurrent modeling of temporal dependencies via gated updates (hidden size: 64, 2 layers).
- 1D CNN: convolutional filters over time capturing local movement motifs. Our architecture uses three parallel convolutional branches with kernel sizes of 3, 5, and 7 (64 filters each), followed by Group Normalization (8 groups) and global average pooling.
- TCN: dilated causal convolutions enabling wider receptive fields. The network consists of 4 residual blocks (64 hidden channels, kernel size: 3) with exponentially increasing dilation rates (1, 2, 4, 8), temporal dropout of 0.2, and masked mean pooling.
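A quick receptive-field calculation relates the TCN configuration to the sequence lengths ($T = 24$ or $48$). For stride-1 dilated convolutions with kernel size $k$, each layer extends the receptive field by $(k-1)$ times its dilation. The source does not say how many convolutions each residual block contains, so the sketch evaluates both one and two per block (two is common in TCN designs).

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block):
    """Receptive field (in timesteps) of stacked stride-1 dilated convolutions."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


for convs in (1, 2):
    rf = tcn_receptive_field(3, (1, 2, 4, 8), convs)
    print(f"{convs} conv(s)/block: receptive field {rf} steps "
          f"(covers 24-step day: {rf >= 24}, 48-step day: {rf >= 48})")
# 1 conv(s)/block: receptive field 31 steps (covers 24-step day: True, 48-step day: False)
# 2 conv(s)/block: receptive field 61 steps (covers 24-step day: True, 48-step day: True)
```

With one convolution per block, the last position of a 30-minute day would not see the full 48-step sequence; the masked mean pooling still aggregates over all positions.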
TABLE III: Models compared in this study [[29](https://arxiv.org/html/2605.06726#bib.bib38),[2](https://arxiv.org/html/2605.06726#bib.bib39)].
### V-D Training Objective and Optimization
Models are trained using a standard cross\-entropy loss objective\. To mitigate class imbalance without discarding valuable movement data, we apply class\-weighted loss, where weights are inversely proportional to class frequencies in the training set\.
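One way to compute inverse-frequency class weights and the weighted loss, as a NumPy sketch with hypothetical function names. The normalization $w_c = N/(K\,n_c)$ is an assumption (it matches scikit-learn's "balanced" heuristic); the text only says weights are inversely proportional to class frequencies. Averaging by the sum of per-sample weights mirrors PyTorch's `CrossEntropyLoss` with a `weight` argument and the default mean reduction.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes):
    """w_c = N / (K * n_c): weights inversely proportional to class counts."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    if np.any(counts == 0):
        raise ValueError("every class needs at least one training example")
    return counts.sum() / (n_classes * counts)


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted cross-entropy, averaged by the sum of sample weights."""
    z = logits - logits.max(axis=1, keepdims=True)         # stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_p[np.arange(len(labels)), labels]
    w = weights[labels]
    return float((w * nll).sum() / w.sum())
```

With 90 negatives and 10 positives, the weights are about 0.556 and 5.0, so each class contributes the same total weight (50) to the loss.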
We optimize using AdamW [[14](https://arxiv.org/html/2605.06726#bib.bib40)] with an initial learning rate of $3 \times 10^{-4}$, weight decay of $10^{-4}$, and gradient clipping at a maximum norm of 1.0, and apply dropout [[23](https://arxiv.org/html/2605.06726#bib.bib41)] for regularization. Models were trained with a batch size of 128 for a maximum of 50 epochs. Training was halted via early stopping (patience of 6 epochs, monitoring validation loss) to prevent overfitting. A `ReduceLROnPlateau` scheduler (factor 0.5, patience 2, minimum learning rate $10^{-5}$) reduced the learning rate when validation loss stagnated. A fixed random seed was used across all experiments so that data splits and weight initialization are reproducible.
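The interaction between the plateau scheduler and early stopping can be traced on a synthetic validation-loss curve. The sketch below is a simplified re-implementation for illustration only, not PyTorch's `ReduceLROnPlateau` (which, for example, also applies a relative improvement threshold); the hyperparameters are the ones stated above and the function name is hypothetical.

```python
def lr_and_stopping(val_losses, lr0=3e-4, factor=0.5, lr_patience=2,
                    min_lr=1e-5, stop_patience=6):
    """Replay a validation-loss curve through plateau LR decay + early stopping.

    Returns (learning rate used at each epoch, index of the last epoch run).
    Simplified: any non-decrease counts as a bad epoch (no threshold).
    """
    best = float("inf")
    bad_lr = bad_stop = 0
    lr = lr0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best:
            best, bad_lr, bad_stop = loss, 0, 0
            continue
        bad_lr += 1
        bad_stop += 1
        if bad_lr > lr_patience:       # plateau: shrink the learning rate
            lr = max(lr * factor, min_lr)
            bad_lr = 0
        if bad_stop >= stop_patience:  # early stopping
            return lrs, epoch
    return lrs, len(val_losses) - 1
```

For a curve that stops improving after epoch 2, the learning rate is halved after epoch 5, and training stops at epoch 8, six epochs after the best validation loss.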
### V-E Experimental Setup
To rigorously test cross-region generalization, we employ a hierarchical, *study-aware* evaluation strategy. For each species, exactly one entire telemetry study is held out exclusively for testing. The remaining data are partitioned into training and validation sets using a strict animal-level split, so that all trajectories from a given individual belong to a single set. Consequently, hyperparameter tuning relies solely on the validation split, without exposure to the test study. This avoids the spatial and temporal leakage inherent in naive random splitting and provides a more realistic test of the model's capacity to learn intrinsic movement patterns [[30](https://arxiv.org/html/2605.06726#bib.bib45)].
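The split protocol can be sketched as follows. The record fields, the hypothetical `test_study` mapping, the 20% validation fraction, and identifying an individual by (study, animal) are illustrative assumptions. Holding out a study per species, rather than globally, lets a multi-species study serve as a test study for one species and as training data for another, which matches the buffalo/zebra row of Table I.

```python
import random


def study_aware_split(records, test_study, val_frac=0.2, seed=0):
    """Hold out one study per species for testing; split the rest by animal.

    records: dicts with keys 'species', 'study', 'animal'.
    test_study: species -> study id held out for that species only.
    """
    def is_test(r):
        return r["study"] == test_study.get(r["species"])

    test = [r for r in records if is_test(r)]
    rest = [r for r in records if not is_test(r)]

    # An individual is keyed by (study, animal) so IDs cannot collide across studies.
    animals = sorted({(r["study"], r["animal"]) for r in rest})
    random.Random(seed).shuffle(animals)            # fixed seed: reproducible split
    n_val = max(1, round(val_frac * len(animals)))
    val_ids = set(animals[:n_val])

    train = [r for r in rest if (r["study"], r["animal"]) not in val_ids]
    val = [r for r in rest if (r["study"], r["animal"]) in val_ids]
    return train, val, test
```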
Models are trained in a supervised manner on the constructed daily trajectories, using the class-weighted cross-entropy loss described in Section V-D so that class imbalance is addressed without discarding movement data through aggressive downsampling [[4](https://arxiv.org/html/2605.06726#bib.bib42)].
Performance is primarily evaluated using balanced accuracy, which averages recall across classes to effectively handle test set imbalance\. We also report the F1 score and the area under the receiver operating characteristic curve \(AUC\) as secondary metrics\[[7](https://arxiv.org/html/2605.06726#bib.bib43)\]\. Finally, to increase interpretability, we analyze confusion matrices to identify class\-wise error asymmetry and species\-specific error modes\[[16](https://arxiv.org/html/2605.06726#bib.bib44)\]\.
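For one one-vs-rest classifier, the three metrics can be computed directly, as in this NumPy sketch with a hypothetical function name. The 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a random positive scores above a random negative, with ties counting one half), which is quadratic in the number of samples and meant for illustration rather than large test sets.

```python
import numpy as np


def binary_metrics(y_true, y_score, threshold=0.5):
    """Balanced accuracy, F1 and ROC AUC for one one-vs-rest classifier."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    balanced_acc = 0.5 * (recall_pos + recall_neg)   # mean per-class recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall_pos / (precision + recall_pos) if tp else 0.0

    # AUC = P(score of random positive > score of random negative), ties count 1/2.
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return float(balanced_acc), float(f1), float(auc)
```

For labels (1, 1, 0, 0) with scores (0.9, 0.3, 0.4, 0.1), this gives a balanced accuracy of 0.75, an F1 of 2/3, and an AUC of 0.75.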
## VIResults and Performance Analysis
In this section, we present the experimental results for movement-based species classification. Going beyond a single target species, we evaluate seven binary one-vs-rest classifiers, one each for Elephants, Wildebeests, Lions, Buffaloes, Caracals, Baboons, and Zebras. We compare our Transformer-based architecture against standard sequential baselines, evaluate the impact of kinematic feature augmentation, and analyze the effect of temporal sampling resolution on classification accuracy.
### VI-A Comparison with Sequential Baselines
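The one-vs-rest setup can be pictured as deriving seven binary label vectors from a single list of species labels. The plain-Python sketch below does this. The function names are hypothetical, and the species strings are assumed to match those listed above. Each positive class is one species set against the other six, so the positive rate is typically small. This is why the class weighting described in the experimental setup matters.

```python
SPECIES = ("Elephant", "Wildebeest", "Lion", "Buffalo", "Caracal", "Baboon", "Zebra")

def one_vs_rest_labels(species_labels, target):
    """Binary labels: 1 for trajectories of `target`, 0 for all other species."""
    return [1 if s == target else 0 for s in species_labels]

def build_ovr_tasks(species_labels):
    """One binary label vector per species, i.e. one task per classifier."""
    return {sp: one_vs_rest_labels(species_labels, sp) for sp in SPECIES}

def positive_rate(labels):
    """Fraction of positive (target-species) trajectories."""
    return sum(labels) / len(labels)

# Hypothetical toy labels for five daily trajectories.
tasks = build_ovr_tasks(["Lion", "Zebra", "Lion", "Elephant", "Buffalo"])
print(tasks["Lion"], positive_rate(tasks["Lion"]))  # [1, 0, 1, 0, 0] 0.4
```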
We first compare the Transformer against standard sequential baselines, namely Long Short-Term Memory (LSTM), 1D Convolutional Neural Networks (1D CNN), and Temporal Convolutional Networks (TCN), using a 1-hour temporal resolution and our augmented feature set. Table [IV](https://arxiv.org/html/2605.06726#S6.T4) summarizes the classification performance across all seven species.
The Transformer outperforms the baselines for the majority of species, which is consistent with a stronger ability to capture long-range temporal dependencies in daily movement trajectories. Most notably, it achieves the highest balanced accuracy and AUC for Elephants, Wildebeests, Lions, and Buffaloes. Although the TCN and 1D CNN models are competitive on specific metrics for Caracals and Zebras, the Transformer generalizes robustly and provides stable classification across species with distinct movement syndromes.
TABLE IV: Performance comparison across species (1-hour, augmented features).
### VI-B Effect of Feature Augmentation
To measure the impact of feature engineering, we compared models using a minimal displacement-based feature set (5 features) against models using our augmented feature set (10 features), which includes speed, bearing, and turning descriptors. Both configurations were evaluated on 1-hour resampled trajectories.
As shown in Table [V](https://arxiv.org/html/2605.06726#S6.T5), incorporating kinematic and directional information yields substantial performance gains. For example, balanced accuracy improves from 0.70 to 0.84 for the Wildebeest classifier and from 0.59 to 0.85 for the Lion classifier. These results indicate that displacement data alone is often insufficient to distinguish complex behavioral patterns. They also indicate that kinematic augmentation is important for learning species-specific movement signatures, particularly in data-scarce settings.
TABLE V: Impact of feature augmentation across species (1-hour).
### VI-C Effect of Temporal Resolution
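Speed, bearing, and turning descriptors of the kind listed above can be derived from consecutive GPS fixes. The sketch below is a hypothetical illustration and does not reproduce the paper's exact 10-feature set. It computes great-circle step length (haversine formula, mean Earth radius 6,371 km), speed, initial bearing, and turning angle. The two angles are encoded as sine/cosine pairs so that directions near 0 and $2\pi$ remain close. The fixed time step `dt_hours` assumes trajectories have already been resampled to a regular grid.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon fixes (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_rad(lat1, lon1, lat2, lon2):
    """Initial bearing from fix 1 to fix 2, radians clockwise from north.

    Returns 0 for a zero-length step (atan2(0, 0) == 0).
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.atan2(x, y)

def kinematic_features(fixes, dt_hours=1.0):
    """Per-step features for a regularly resampled daily trajectory.

    fixes: list of (lat, lon). Returns len(fixes) - 1 dicts.
    Hypothetical feature set, not the paper's exact descriptors.
    """
    feats, prev_bearing = [], None
    for (la1, lo1), (la2, lo2) in zip(fixes, fixes[1:]):
        step = haversine_m(la1, lo1, la2, lo2)
        b = bearing_rad(la1, lo1, la2, lo2)
        # Turning angle wrapped to [-pi, pi]; defined as 0 for the first step.
        turn = 0.0 if prev_bearing is None else math.atan2(
            math.sin(b - prev_bearing), math.cos(b - prev_bearing))
        feats.append({
            "step_m": step,
            "speed_kmh": step / 1000.0 / dt_hours,
            "bearing_sin": math.sin(b), "bearing_cos": math.cos(b),  # circular encoding
            "turn_sin": math.sin(turn), "turn_cos": math.cos(turn),
        })
        prev_bearing = b
    return feats

fixes = [(0.0, 0.0), (0.01, 0.0), (0.01, 0.01)]  # hypothetical: north, then east
for f in kinematic_features(fixes):
    print(round(f["step_m"], 1), round(f["turn_sin"], 3))
```

For a track that moves north and then east, the second step has a turning angle of about $+\pi/2$. This is a right turn, because bearings increase clockwise.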
We also evaluated the sensitivity of the models to temporal sampling by comparing trajectories resampled at 1\-hour and 30\-minute intervals using the augmented feature set\.
Table [VI](https://arxiv.org/html/2605.06726#S6.T6) shows that the effect of temporal resolution on classification performance differs across species. Larger, herd-forming species such as Elephants, Wildebeests, and Buffaloes generally achieve higher performance at the 1-hour resolution, which captures stable daily movement patterns while reducing short-term positional noise. In contrast, some species with more dynamic movement behaviors, such as Lions and Caracals, benefit from the finer 30-minute resolution, which preserves short-term directional changes.
Overall, however, the 1\-hour resampling strategy provides more consistent performance across the multi\-study dataset\. Because telemetry data in Movebank are collected at heterogeneous sampling frequencies, coarser resampling retains more usable trajectories during preprocessing and improves model robustness across studies\.
TABLE VI: Impact of temporal resolution on classification performance (augmented features).
### VI-D Confusion Matrices and Final Performance
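Snapping irregular fixes onto a regular daily grid and counting empty slots illustrates the trade-off described above. In the sketch below, each grid slot takes the nearest fix inside a half-open window of plus or minus half an interval, so no fix fills more than one slot. The nearest-fix rule and the window are assumptions for illustration and not necessarily the paper's preprocessing. A collar that logs roughly once per hour fills every 1-hour slot but only half of the 30-minute slots. This is one way heterogeneous sampling rates turn into missing data at finer resolutions.

```python
import bisect

def resample_day(timestamps_min, interval_min, day_minutes=24 * 60):
    """Snap irregular fixes onto a regular daily grid (nearest-fix rule).

    timestamps_min: sorted fix times in minutes since midnight.
    Slot s covers the half-open window [s - interval/2, s + interval/2),
    so windows tile the day without overlap. Returns one fix index per
    slot, or None where the slot has no fix.
    """
    h = interval_min / 2
    slots = []
    for start in range(0, day_minutes, interval_min):
        lo = bisect.bisect_left(timestamps_min, start - h)
        hi = bisect.bisect_left(timestamps_min, start + h)
        if lo == hi:
            slots.append(None)  # no fix inside this slot's window
        else:
            slots.append(min(range(lo, hi), key=lambda j: abs(timestamps_min[j] - start)))
    return slots

def missing_fraction(slots):
    """Share of grid slots without a fix."""
    return sum(s is None for s in slots) / len(slots)

hourly = [60 * k + 5 for k in range(24)]  # hypothetical collar: one fix at 5 min past each hour
print(missing_fraction(resample_day(hourly, 60)))  # 0.0
print(missing_fraction(resample_day(hourly, 30)))  # 0.5
```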
Figure [1](https://arxiv.org/html/2605.06726#S6.F1) shows the confusion matrices for the best-performing Transformer configuration for each species, each at its optimal temporal resolution. The matrices show high recall for the target species while maintaining strong specificity against the combined negative classes. This discrimination capability suggests that the Transformer architecture is a promising candidate for real-world wildlife monitoring and early-warning systems.
Figure 1: Confusion matrices for the best-performing Transformer configurations across species. Panels: (a) Elephant (1h), (b) Wildebeest (1h), (c) Lion (30m), (d) Buffalo (1h), (e) Caracal (30m), (f) Baboon (1h), (g) Zebra (30m).
## VII Conclusion
This paper investigated whether the identity of wildlife species can be inferred from movement trajectories alone, using daily sequences of GPS coordinates collected across multiple regions and telemetry campaigns. Using large-scale African wildlife data from Movebank, we showed that Transformer-based sequence models outperform established sequential baseline models for wildlife species classification under cross-study evaluation. In the elephant binary classifier, the Transformer achieved a balanced accuracy of 0.83 and an AUC of 0.92, substantially outperforming the LSTM, CNN, and TCN baselines.
We further demonstrated two effects. First, augmenting simple displacement features with movement descriptors that capture speed, direction, and turning behavior markedly improves performance. Second, temporal resolution plays an important role: when the source data are sampled at heterogeneous rates, 1-hour resampling generally yields better performance across studies than 30-minute resampling. Together, these results indicate that daily movement trajectories can encode discriminative patterns that remain informative across regions and telemetry studies, and that attention-based models are well suited to capturing them under realistic cross-region data constraints.
## VIII Future Work
Several directions emerge from this study\.
- Improved augmentation of movement feature sets: Extending the current movement features with additional descriptors to determine whether performance continues to improve or saturates.
- Multi-day trajectory modeling: Extending the one-day trajectory models to sequences spanning two or three consecutive days, which may help the models discover movement constraints that are not evident within a single day.
- Multi-species classification: Extending the binary classifiers to a single multi-class model covering all considered species.
- Wider ecological coverage: Expanding the coverage of species and regions as more open-access telemetry data become available.
## IX Acknowledgement
This publication was developed as part of the Center for Inclusive Digital Transformation of Africa (CIDTA) and the Afretec Network, which is managed by Carnegie Mellon University Africa and receives financial support from the Mastercard Foundation. The views expressed in this document are solely those of the authors and do not necessarily reflect those of Carnegie Mellon University or the Mastercard Foundation.
## References
- [1] T. Avgar, J. R. Potts, M. A. Lewis, and M. S. Boyce (2016) Integrated step selection analysis: bridging the gap between resource selection and animal movement. Methods Ecol. Evol. 7(5), pp. 619–630. doi: 10.1111/2041-210X.12528
- [2] S. Bai, J. Z. Kolter, and V. Koltun (Mar. 2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint. doi: 10.48550/arXiv.1803.01271
- [3] S. Benhamou (Jul. 2004) How to reliably estimate the tortuosity of an animal's path: straightness, sinuosity, or fractal dimension? J. Theor. Biol. 229(2), pp. 209–220. doi: 10.1016/j.jtbi.2004.03.016
- [4] M. Buda, A. Maki, and M. A. Mazurowski (Oct. 2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, pp. 249–259. doi: 10.1016/j.neunet.2018.07.011
- [5] Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu (Apr. 2018) Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8(1), pp. 6085. doi: 10.1038/s41598-018-24271-9
- [6] W. Chen, Y. Liang, Y. Zhu, Y. Chang, K. Luo, H. Wen, L. Li, Y. Yu, Q. Wen, C. Chen, K. Zheng, Y. Gao, X. Zhou, and Y. Zheng (Mar. 2024) Trajectory data management and mining: a survey from deep learning to the LLM era. arXiv preprint. doi: 10.48550/arXiv.2403.14151
- [7] T. Fawcett (2006) An introduction to ROC analysis. Pattern Recognit. Lett. 27(8), pp. 861–874. doi: 10.1016/j.patrec.2005.10.010
- [8] P. He, J. A. Klarevas-Irby, D. Papageorgiou, C. Christensen, E. D. Strauss, and D. R. Farine (2023) A guide to sampling design for GPS-based studies of animal societies. Methods Ecol. Evol. 14(8), pp. 1887–1905. doi: 10.1111/2041-210X.13999
- [9] M. Hebblewhite and D. T. Haydon (Jul. 2010) Distinguishing technology from biology: a critical review of the use of GPS telemetry data in ecology. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365(1550), pp. 2303–2312. doi: 10.1098/rstb.2010.0087
- [10] S. Hochreiter and J. Schmidhuber (Nov. 1997) Long short-term memory. Neural Comput. 9(8), pp. 1735–1780. doi: 10.1162/neco.1997.9.8.1735
- [11] R. Kays, M. C. Crofoot, W. Jetz, and M. Wikelski (Jun. 2015) Terrestrial animal tracking as an eye on life and planet. Science 348(6240), pp. aaa2478. doi: 10.1126/science.aaa2478
- [12] R. Kays, S. C. Davidson, M. Berger, G. Bohrer, W. Fiedler, A. Flack, et al. (2022) The Movebank system for studying global animal movement and demography. Methods Ecol. Evol. 13(2), pp. 419–431. doi: 10.1111/2041-210X.13767
- [13] S. M. Kazemi, R. Goel, S. Eghbali, J. Ramanan, J. Sahota, S. Thakur, S. Wu, C. Smyth, P. Poupart, and M. Brubaker (Jul. 2019) Time2Vec: learning a vector representation of time. arXiv preprint. doi: 10.48550/arXiv.1907.05321
- [14] I. Loshchilov and F. Hutter (May 2019) Decoupled weight decay regularization. In Proceedings of the 7th International Conference on Learning Representations. [Online]. Available: https://openreview.net/forum?id=Bkg6RiCqY7
- [15] H. Meyer and E. Pebesma (Apr. 2022) Machine learning-based global maps of ecological variables and the challenge of assessing them. Nat. Commun. 13(1), pp. 2208. doi: 10.1038/s41467-022-29838-9
- [16] C. Molnar (2022) Interpretable machine learning: a guide for making black box models explainable. 2nd edition, Independently published. [Online]. Available: https://christophm.github.io/interpretable-ml-book/
- [17] V. O. Nams (Oct. 2014) Combining animal movements and behavioural data to detect behavioural states. Ecol. Lett. 17(10), pp. 1228–1237. doi: 10.1111/ele.12328
- [18] R. Nathan, W. M. Getz, E. Revilla, M. Holyoak, R. Kadmon, D. Saltz, and P. E. Smouse (Dec. 2008) A movement ecology paradigm for unifying organismal movement research. Proc. Natl. Acad. Sci. U.S.A. 105(49), pp. 19052–19059. doi: 10.1073/pnas.0800375105
- [19] T. A. Patterson, A. Parton, R. Langrock, P. G. Blackwell, L. Thomas, and R. King (2017) Statistical modelling of individual animal movement: an overview of key methods and a discussion of practical challenges. AStA Adv. Stat. Anal. 101(4), pp. 399–438. doi: 10.1007/s10182-017-0302-7
- [20] T. A. Patterson, L. Thomas, C. Wilcox, O. Ovaskainen, and J. Matthiopoulos (Feb. 2008) State-space models of individual animal movement. Trends Ecol. Evol. 23(2), pp. 87–94. doi: 10.1016/j.tree.2007.10.009
- [21] H. Ren, M. Pan, Y. Li, X. Zhou, and J. Luo (2020) ST-SiameseNet: spatio-temporal Siamese networks for human mobility signature identification. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1306–1315. doi: 10.1145/3394486.3403183
- [22] D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, et al. (2017) Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40(8), pp. 913–929. doi: 10.1111/ecog.02881
- [23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (Jun. 2014) Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), pp. 1929–1958. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html
- [24] M. A. Tucker, K. Böhning-Gaese, W. F. Fagan, J. M. Fryxell, B. Van Moorter, S. C. Alberts, et al. (Jan. 2018) Moving in the Anthropocene: global reductions in terrestrial mammalian movements. Science 359(6374), pp. 466–469. doi: 10.1126/science.aam9712
- [25] D. Tuia, B. Kellenberger, S. Beery, B. R. Costelloe, S. Zuffi, B. Risse, et al. (Feb. 2022) Perspectives in machine learning for wildlife conservation. Nat. Commun. 13(1), pp. 792. doi: 10.1038/s41467-022-27980-y
- [26] R. Valavi, J. Elith, J. J. Lahoz-Monfort, and G. Guillera-Arroita (Feb. 2019) blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol. Evol. 10(2), pp. 225–232. doi: 10.1111/2041-210X.13107
- [27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30. [Online]. Available: https://papers.nips.cc/paper/7181-attention-is-all-you-need
- [28] G. Wang (2019) Machine learning for inferring animal behavior from location and movement data. Ecol. Inform. 49, pp. 69–76. doi: 10.1016/j.ecoinf.2018.12.002
- [29] Z. Wang, W. Yan, and T. Oates (2017) Time series classification from scratch with deep neural networks: a strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585. doi: 10.1109/IJCNN.2017.7966039
- [30] S. J. Wenger and J. D. Olden (2012) Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods Ecol. Evol. 3(2), pp. 260–267. doi: 10.1111/j.2041-210X.2011.00170.x
- [31] K. L. Yates, P. J. Bouchet, M. J. Caley, K. Mengersen, C. F. Randin, S. Parnell, et al. (Oct. 2018) Outstanding challenges in the transferability of ecological models. Trends Ecol. Evol. 33(10), pp. 790–802. doi: 10.1016/j.tree.2018.08.001
- [32] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff (2021) A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2114–2124. doi: 10.1145/3447548.3467401