PDRNN: Modular Data-driven Pedestrian Dead Reckoning on Loosely Coupled Radio- and Inertial-Signalstreams
Summary
Proposes PDRNN, a modular hybrid AI-assisted pedestrian dead reckoning system that combines a recurrent neural network with separate ML models for orientation, velocity, and distance estimation, with optional radio-based stabilization. Experiments on dynamic sports movement data show superior accuracy and precision compared to classic and ML-based methods.
# Modular Data-driven Pedestrian Dead Reckoning on Loosely Coupled Radio- and Inertial-Signalstreams

This work has been carried out within the DARCII project, funding code 50NA2401, sponsored by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) and supported by the German Aerospace Center (DLR), the Bundesnetzagentur (BNetzA), and the Federal Agency for Cartography and Geodesy (BKG). This work was also supported by the Bavarian Ministry for Economic Affairs, Infrastructure, Transport and Technology through the Center for Analytics Data Applications (ADA-Center) within the framework of “BAYERN DIGITAL II” (20-3410-2-9-8).
Source: [https://arxiv.org/html/2605.15252](https://arxiv.org/html/2605.15252)
###### Abstract
Modern pedestrian dead reckoning \(PDR\) systems rely on fusing noisy and biased estimates of position, velocity, and calibrated orientation derived from loosely coupled sensors to determine the current pose of a localized object\. However, discrepancies in the sampling rates of sensor\-specific estimation methods and unreliable transmission pose significant challenges\. Moreover, traditional methods often fail to effectively fuse multimodal sensor data during dynamic movements characterized by high accelerations, velocities, and rapidly varying orientations\.
To address these limitations, we propose a simple recurrent neural network \(RNN\) architecture capable of implicitly forecasting asynchronous sensor data streams from diverse estimation methods along reference trajectories\. The proposed approach introduces PDRNN, a modular hybrid AI\-assisted PDR system that handles each component as an independent ensemble of machine learning \(ML\) models to estimate both key parameter means and variances\. Separate ML\-based models are employed to estimate orientation, \(un\)directed velocity or distance from acceleration and gyroscope data, with optional absolute positioning from synchronized radio systems such as 5G for stabilization\. A final fusion model combines these outputs \(position, velocity, and orientation\) while using uncertainty estimates to enhance system robustness\. The modular design allows individual components to be updated, fine\-tuned, or replaced without affecting the entire system\. Experiments on dynamic sports movement data show that PDRNN achieves superior accuracy and precision compared to classic and ML\-based methods, effectively avoiding error accumulation common in black\-box approaches\. In addition, PDRNN offers forecasting capabilities and better component control despite increased system complexity\.
## I Introduction
Traditional PDR methods rely heavily on fixed, predefined thresholds and user\-specific kinematic constraints, which renders them insufficient for nondeterministic dynamic motion and sensor nonlinearity, particularly in complex environments such as sports and virtual reality applications\. Consequently, they struggle to adapt to real\-world scenarios, resulting in reduced accuracy and robustness\. Recent ML\-based approaches such as RIDI \[[51](https://arxiv.org/html/2605.15252#bib.bib1126)\] and RoNIN \[[50](https://arxiv.org/html/2605.15252#bib.bib1127)\] offer more sophisticated sensor data modeling and achieve higher accuracy by modeling complex movements, but they treat the entire process as a black box, thereby losing control over individual processing steps and directly propagating errors to the final position\. This monolithic structure sacrifices transparency and flexibility, making it challenging to control or fine\-tune individual components and leading to uncontrolled error accumulation in response to sensor uncertainties or environmental changes\. Consequently, classical methods tend to be overly general, employing static thresholds for generic kinematics, while modern ML\-based models are often too specific, being tailored to particular individuals or scenarios\. As a result, both approaches frequently fail, requiring significant effort to either recalibrate traditional methods or optimize and fine\-tune black\-box ML systems\.
Figure 1: Overview of PDRNN: input sensor data, estimation of position, velocity, and orientation \(without magnetometer\), as well as pose fusion\.

Fig\. [1](https://arxiv.org/html/2605.15252#S1.F1) provides an overview of our approach, which addresses existing limitations by introducing a modular hybrid ML\-assisted PDR system, PDRNN\. PDRNN treats each component as an independent ensemble of ML\-based models, which can include any type of model, to estimate both the mean and variance of key parameters\. Specifically, separate ML models are employed for estimating orientation, velocity, and distance based on acceleration and gyroscope measurements, with the option to incorporate absolute positions from a time\-synchronized radio frequency \(RF\) system, such as 5G \(FR1\), for initializing or stabilizing the PDR system\. A final model then fuses the outputs \(position, velocity, and orientation\) to compute the final pose\. Additionally, each \(ensemble\) component generates an uncertainty estimate, enhancing the robustness of the fusion process and improving error resilience\. The modular design of PDRNN offers greater flexibility and control, enabling individual components to be updated, fine\-tuned, or replaced without affecting the entire system, leading to more accurate and robust position estimates\. Our experiments using sensor data from dynamic sports movements demonstrate that this modular ML\-assisted PDR system provides more accurate and precise pose estimations than classic and ML\-based methods and prevents error accumulation, a common issue in pure black\-box ML approaches\. Furthermore, the system’s flexibility makes PDRNN highly adaptable to measurement gaps and allows for continuous improvement without necessitating a complete system overhaul\.
Contributions\. This paper contributes to human motion estimation by integrating multi\-modal sensor data for accurate tracking and forecasting of pedestrian position in dynamic environments\. We introduce the PDRNN architecture, combining feed\-forward layers with RNN cells, enabling effective extraction of temporal features from sensor signals for accurate pose and trajectory estimation using inertial and RF measurements\. We address challenges of traditional methods, such as KF and model\-based PDR, which struggle with dynamic movements, sensor noise, and real\-time processing\. Through a data\-driven approach, PDRNN shows superior robustness and accuracy, particularly in sports applications with unpredictable motion\. We also demonstrate PDRNN’s ability to clean noisy sensor data and interpolate under\-sampled radio signals, ensuring reliable pose estimation\. PDRNN generalizes well across various movement patterns and user behaviors, enhancing its applicability in diverse contexts and laying the foundation for future advancements in forecasting\. Compared to state\-of\-the\-art methods, our approach offers higher accuracy \(up to 90%\), improved resilience \($\text{CEP}_{95}$: $\text{PDR}=1.25\,\text{m}$, $\text{RoNIN}=0.46\,\text{m}$, $\text{PDRNN}=0.14\,\text{m}$\) and forecasting \($\text{CEP}_{95}=0.05\,\text{m}$ at $1\,\text{s}$\), with more control over system components, albeit at the cost of a more complex hybrid system\.
Outlook\. Section [II](https://arxiv.org/html/2605.15252#S2) offers a comprehensive review of related literature\. Section [III](https://arxiv.org/html/2605.15252#S3) describes the proposed methodology and preprocessing pipeline\. The experimental setup is detailed in Section [IV](https://arxiv.org/html/2605.15252#S4), followed by the presentation and discussion of results in Section [V](https://arxiv.org/html/2605.15252#S5)\. Finally, Section [VI](https://arxiv.org/html/2605.15252#S6) concludes the paper by summarizing key findings\.
## II Related Work
Section [II\-A](https://arxiv.org/html/2605.15252#S2.SS1) reviews work that utilizes classic, Kalman, or particle filters for dead reckoning or the fusion of inertial measurements with radio position data\. Section [II\-B](https://arxiv.org/html/2605.15252#S2.SS2) presents studies that employ data\-driven techniques\.
### II\-A Recursive Probabilistic Methods
To enable radio\-based localization in\- and outdoors, particularly in situations where radio localization is restricted or unavailable, one potential solution is dead reckoning \(DR\), which combines inertial with magnetometer and radio sensor data\[[3](https://arxiv.org/html/2605.15252#bib.bib240)\]\. DR estimates the current position based on the previous position as soon as a position change is detected\[[27](https://arxiv.org/html/2605.15252#bib.bib261)\]\. Consequently, the current position may include errors from both systems, such as \(1\) inaccuracies in step lengths and velocities due to individual body height variations, \(2\) radio positions affected by multipath interference, and \(3\) orientation errors caused by ferromagnetic interference\[[27](https://arxiv.org/html/2605.15252#bib.bib261),[1](https://arxiv.org/html/2605.15252#bib.bib442),[44](https://arxiv.org/html/2605.15252#bib.bib447),[28](https://arxiv.org/html/2605.15252#bib.bib795)\]\. KFs are commonly used to merge radio and inertial sensor data\. Different variants of KF are typically employed\[[49](https://arxiv.org/html/2605.15252#bib.bib598),[30](https://arxiv.org/html/2605.15252#bib.bib567),[29](https://arxiv.org/html/2605.15252#bib.bib568),[5](https://arxiv.org/html/2605.15252#bib.bib555),[48](https://arxiv.org/html/2605.15252#bib.bib607)\]to account for the measurement noise of sensors\. The measurement covariance is often optimized based on the available training data, while the movement model is defined as broadly as possible for the specific application\. However, these factors result in the fusion process being highly application\-specific\. Even slight deviations in the sensors or the movement model lead to errors in the measurement and process covariance and significant estimation errors\.
Gusenbauer et al\. \[[22](https://arxiv.org/html/2605.15252#bib.bib450)\] employ a linear KF to merge GNSS, magnetometer, and acceleration sensor data, achieving an accuracy of $\text{MAE}=4.2\,\text{m}$\. Zhuang et al\. \[[53](https://arxiv.org/html/2605.15252#bib.bib455)\] use an extended KF to merge WLAN and inertial sensor data, attaining an accuracy of $\text{MAE}=4.19\,\text{m}$ and demonstrating greater robustness against outliers compared to Li et al\. \[[34](https://arxiv.org/html/2605.15252#bib.bib456)\], who achieved an accuracy of $\text{MAE}=6.2\,\text{m}$\. Tao et al\. \[[35](https://arxiv.org/html/2605.15252#bib.bib462)\] attribute the high inaccuracies of existing methods to the distortion of magnetometer data by nearby ferromagnetic metals, which compromises the orientation estimate and, in turn, reduces positional accuracy\. Sczyslo et al\. \[[46](https://arxiv.org/html/2605.15252#bib.bib445)\] also use a KF to trilaterate positions derived from time of arrival \(ToA\) measurements of an ultra\-wideband \(UWB\) localization system and acceleration measurements from an inertial sensor, achieving an accuracy of $\text{MAE}=0.57\,\text{m}$\. The KF state vector consists of the UWB system’s positions, accelerations, and the integrated speeds of the inertial sensor\. In contrast, Perttula et al\. \[[43](https://arxiv.org/html/2605.15252#bib.bib451)\] employ a particle filter \(PF\) to merge gyroscope and accelerometer data with ToA values from a UWB radio system, achieving different levels of accuracy depending on sensor placement: $\text{MAE}=9.9\,\text{m}$ when attached to the torso and $\text{MAE}=10.3\,\text{m}$ at the waist\. This variability suggests that even a PF capable of modeling nonlinear motion is subject to significant error variance\. Therefore, we utilize an optimized KF in our experiments to represent the state\-of\-the\-art in the most accurate manner\.
These probabilistic methods, which rely solely on the current and previous \(Markov\) states \(measurements and estimates\), are unable to detect data gaps or long\-term data anomalies and relationships\. Moreover, the type and weighting of the fusion process are strictly predefined\. In contrast, our data\-driven method, PDRNN, learns to handle these challenges directly and implicitly from the training data, resulting in significantly more accurate pose estimates\.
### II\-B Data\-driven ML\-based Methods
While some studies estimate velocities using RNNs \[[27](https://arxiv.org/html/2605.15252#bib.bib261)\] or reinforcement learning \[[7](https://arxiv.org/html/2605.15252#bib.bib212)\] to reconstruct distance and uncalibrated trajectories, there are few methods that leverage multimodal information \[[38](https://arxiv.org/html/2605.15252#bib.bib1128),[39](https://arxiv.org/html/2605.15252#bib.bib1129),[37](https://arxiv.org/html/2605.15252#bib.bib1130)\], such as radio and inertial measurements, in data\-driven approaches\. Zyner et al\. \[[54](https://arxiv.org/html/2605.15252#bib.bib210)\] employ LSTM cells to predict future trajectories of a vehicle, using GPS positions, inertial measurement unit \(IMU\) orientation, and velocities from an odometry sensor to accurately reconstruct the trajectory in 90\.66% of cases\. However, the reference measurements are derived from the same data, making comparisons difficult\. In contrast, Yao et al\. \[[52](https://arxiv.org/html/2605.15252#bib.bib252)\] address sensor noise from signals such as GPS and IMU by utilizing a convolutional neural network \(CNN\) and an RNN to capture both relative movements from the IMU and global movements from the GPS system, while also modeling signal dysfunctions\. Their approach achieves a mean absolute error of $4.043\,\text{m}$ \($\text{SD}=0.524\,\text{m}$\), compared to the mean absolute error of $6.065\,\text{m}$ \($\text{SD}=0.565\,\text{m}$\) observed in traditional sensor data fusion methods\. As these preliminary studies focus on vehicle movements, it remains unclear whether and how their concepts apply to human motion\.
The central concept of our paper aligns with the approaches of Zyner et al\.\[[54](https://arxiv.org/html/2605.15252#bib.bib210)\]and Yao et al\.\[[52](https://arxiv.org/html/2605.15252#bib.bib252)\], but extends their methodologies to narrowband \(5G / FR1\) and UWB radio localization as well as 6DoF inertial sensors\. Unlike previous work, we conduct a comprehensive architecture and parameter study to identify the optimal light\-weight ML framework for these specific domains\. Additionally, we explore the transient behavior in highly dynamic scenarios and examine the impact of input data stream sequence length on the accuracy and robustness of data\-driven PDR\. Our approach builds upon these existing methodologies by introducing a modular and hybrid PDR system that integrates radio and inertial sensor data through independent ensembles of ML models\. Essentially, the literature review suggests that our paper represents one of the first publicly known studies to enable accurate and reliable detection, tracking, and fusion of human motion using data\-driven methods in highly dynamic scenarios, characterized by sudden changes in speed and direction, noisy inertial measurements from loosely placed sensors, and radio signals in multipath environments\.
## III Methodology
Section [III\-A](https://arxiv.org/html/2605.15252#S3.SS1) first outlines the problem, addressing data gaps, asynchronous data streams, and nondeterministic calibration\. Subsequently, Section [III\-B](https://arxiv.org/html/2605.15252#S3.SS2) presents our proposed methodology\. Section [III\-C](https://arxiv.org/html/2605.15252#S3.SS3) presents classical approaches and Section [III\-D](https://arxiv.org/html/2605.15252#S3.SS4) presents our ML\-driven method\.
### III\-A Problem Description
Dead reckoning forms the foundation for many modern navigation applications, including pedestrian, vehicle, and airplane navigation\. If the initial position $p_0$ of an object is known, the subsequent absolute poses $p_1, p_2, \ldots, p_n$ can be determined using orientation, velocity $v$, acceleration $acc$, or distance covered $d$\. The future absolute pose is predicted based on previous poses $p$ \[[4](https://arxiv.org/html/2605.15252#bib.bib321)\]\. Environmental factors, such as sensor interference, temporary signal loss, attenuation, scattering, diffraction, refraction, reflection, drift, and sensor noise, lead to deviations in course or position, making fusion localization necessary for fail\-safe operation \[[25](https://arxiv.org/html/2605.15252#bib.bib226)\]\. IMUs correct localization system measurements, and vice versa \[[33](https://arxiv.org/html/2605.15252#bib.bib222)\], with IMUs regularly recalibrated using complementary localization technologies \[[33](https://arxiv.org/html/2605.15252#bib.bib222)\]\. As radio\-based localization is also susceptible to signal interference, inertial measurements help sustain dead reckoning, particularly during multipath propagation \[[33](https://arxiv.org/html/2605.15252#bib.bib222)\]\.
Data Gaps and Asynchronous Data Streams\.Existing fusion methods face several challenges\. Sensors produce measurements at varying data rates that cannot be processed by predefined filters, as model\-driven approaches require time synchronization between different sensors and uninterrupted data streams\. As most modern radio and inertial systems are loosely coupled, the radio position is determined by an external system and must be sent back to the object to be localized, or inertial measurements must be transmitted to the localization environment along with the emitted radio signals\. Both variants may introduce nondeterministic delays that hinder real\-time localization, as such delays can only be approximated by a rigid model\. In the worst case, data streams become asynchronous, and individual data points are lost\.
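To make the problem concrete, the following minimal Python sketch merges two time-stamped streams that arrive at different rates and flags intervals in which samples are missing. It is an illustration only and not part of PDRNN, which handles such effects implicitly. The function names, the tuple layout `(timestamp_s, value)`, and the gap tolerance of 1.5 nominal periods are assumptions made for this example.

```python
import heapq


def merge_streams(radio, imu):
    """Merge two time-sorted streams of (timestamp_s, value) tuples.

    Returns a single list of (timestamp_s, source, value) ordered by time.
    """
    tagged_radio = ((t, "radio", v) for t, v in radio)
    tagged_imu = ((t, "imu", v) for t, v in imu)
    # heapq.merge keeps the output sorted as long as each input is sorted.
    return list(heapq.merge(tagged_radio, tagged_imu, key=lambda s: s[0]))


def find_gaps(timestamps, nominal_period_s, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples lie further
    apart than tolerance * nominal_period_s (hypothetical threshold)."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * nominal_period_s:
            gaps.append((prev, cur))
    return gaps


# Hypothetical example: a 10 Hz radio stream with two lost samples
# and a short excerpt of a faster inertial stream.
radio = [(0.0, (1.0, 2.0)), (0.1, (1.1, 2.0)), (0.4, (1.4, 2.1))]
imu = [(0.05, 0.3), (0.15, 0.4)]
merged = merge_streams(radio, imu)
radio_gaps = find_gaps([t for t, _ in radio], nominal_period_s=0.1)
```

A rigid filter would need an explicit rule for every gap found this way, which is the limitation the data-driven approach is meant to avoid.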
Nondeterministic Calibration\.It remains unclear when and to what extent a current position or orientation should optimally contribute to correction and calibration to rectify accumulated errors in an inertial measurement system or incorrect radio positions, thereby reconstructing poses and trajectories accurately\. Correcting the current pose estimate is only effective if the correction information \(such as positions or velocities\) are free of errors or more precise than the current pose\. Classic, manually\-designed methods often incorrectly determine these relationships\. For instance, errors in the inertial measurement system depend on application\-specific movement dynamics, where changes in speed and direction may lead to larger errors, while errors in the radio system are influenced by environment\-specific signal propagation properties\. Furthermore, constant calibration of the velocity estimate with a position does not always make sense, as the radio position typically introduces larger errors than the velocity estimate over short time intervals\. Thus, determining when to calibrate with a position to optimize pose accuracy and robustness is not straightforward\. It is also unclear how to best weight and update the position and orientation relative to the current velocity or step length estimate for calibration or correction\. A simplistic adjustment of the inertial sensor system’s current position estimate to the current radio position may result in significant position jumps\.
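The position jump mentioned above can be illustrated with a short Python sketch that contrasts a hard reset to the radio position with a textbook inverse-variance weighted update. This is not the PDRNN fusion, which learns the weighting implicitly from data. The function names and all numeric values are hypothetical.

```python
def hard_reset(p_dr, p_radio):
    """Simplistic calibration: overwrite the dead-reckoned position."""
    return p_radio


def variance_weighted_update(p_dr, var_dr, p_radio, var_radio):
    """Blend two 2D position estimates with inverse-variance weights.

    var_dr and var_radio are scalar variances (assumed isotropic).
    Returns the fused position and its variance.
    """
    w_radio = var_dr / (var_dr + var_radio)  # weight given to the radio fix
    fused = tuple(d + w_radio * (r - d) for d, r in zip(p_dr, p_radio))
    fused_var = var_dr * var_radio / (var_dr + var_radio)
    return fused, fused_var


# Hypothetical numbers: a precise short-term inertial estimate
# and a noisier radio position affected by multipath.
p_dr, var_dr = (10.0, 5.0), 1.0
p_radio, var_radio = (12.0, 5.0), 3.0

jump = hard_reset(p_dr, p_radio)
fused, fused_var = variance_weighted_update(p_dr, var_dr, p_radio, var_radio)
```

With these values the hard reset moves the estimate by the full radio offset, whereas the weighted update moves it only by a quarter of that offset, because the radio variance is three times larger.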
Figure 2:Data preprocessing pipeline\. Note that PDRNN implicitly fuses raw information streams \(dashed line at the bottom\)\.
### III\-B Method Overview
Proposed Solution\.The core concept of our approach for precise and robust pose estimation is the integration of low\-frequency data\-driven position tracking \(using e\.g\., 5G / FR1 radio signals\) with high\-frequency data\-driven relative velocity estimation \(based on acceleration and gyroscope measurements from a smartphone\), as well as long\-term stable, low\-frequency data\-driven yaw orientation estimation \(also derived from acceleration and gyroscope measurements\)\. We propose a data\-driven methodology, PDRNN, capable of processing sensor data with varying data rates, non\-synchronized measurements, and variable time delays, while implicitly determining the optimal usage of absolute radio positions, \(un\)directed relative velocities, and \(optionally\) absolute orientations to predict an accurate absolute pose\. The key idea is to combine input streams, such as position, velocity, and orientation, and map these onto reference data streams, such as \(future\) reference positions, using a data\-driven model\. Our method leverages both Feedforward \(FF\) and LSTM cells to reduce capacity requirements, capture both short\- and long\-term spatio\-temporal correlations in the data, and implicitly identify and compensate for data gaps and other artifacts\. The data\-driven approach autonomously learns how and when correction information should be integrated into the fusion process to estimate an optimal pose\. Our modular PDRNN utilizes ensembles of ML models at each stage of the pipeline that are most appropriate for each task\.
Pipeline\. PDRNN learns to map specific local sensor data streams to corresponding tasks, including orientation, velocity / distance, position, and the final pose \(directed distance in absolute world coordinates\)\. The model outputs future 2D body poses, providing forecasts of human movement trajectories up to $2\,\text{s}$ ahead\. It integrates radio and inertial sensors in a modular, data\-driven pipeline to estimate position, velocity, and orientation under dynamic movement conditions, especially when the individual sensor data streams are loosely coupled and aggregated in real\-time, without the need for strict or reference\-based time synchronization\. As a result, PDRNN is designed to handle asynchronous or missing information\. Radio position measurements \(at 10 Hz\) experience a motion\-to\-photon delay of $98$ to $244\,\text{ms}$, while inertial sensors \(at 100 Hz\) exhibit delays ranging from $5$ to $13\,\text{ms}$\.
Data Preprocessing\. The data processing is performed for the individual input variables, including position \[[18](https://arxiv.org/html/2605.15252#bib.bib275)\], velocity \[[10](https://arxiv.org/html/2605.15252#bib.bib267)\], and orientation \[[14](https://arxiv.org/html/2605.15252#bib.bib288)\]\. For a fair comparison of PDRNN to state\-of\-the\-art methods, we have to preprocess the data\. However, note that PDRNN is running on direct data streams without further preprocessing, see the dashed line at the bottom of Fig\. [2](https://arxiv.org/html/2605.15252#S3.F2)\. For all other methods, data preprocessing involves time\-synchronizing the input data streams \(radio position $p_{\text{radio}}$ from the radio system $radio$, velocity $v$, orientation $\theta_{\text{ori}}$, reference pose $ref$, and acceleration $acc$\), cleaning them, and preparing them through resampling and interpolation before bundling the cleaned data streams\. These clean data bundles are further processed into segment \($s$\) and window \($w$\) bundles, as described by Feigl et al\. \[[10](https://arxiv.org/html/2605.15252#bib.bib267)\]\. Each bundle variant contains radio positions $p_{\text{radio}}$, radio orientation $\theta_{\text{radio}}$ \(representing the direction of movement between two consecutive $p_{\text{radio}}$\), velocity estimate $v$, orientation estimate $\theta_{\text{ori}}$, reference pose $ref$ \(with $p_{\text{ref}}$, $d_{\text{ref}}$, $v_{\text{ref}}$, and $\theta_{\text{ref}}$\), and raw 3D acceleration data $acc$\. The orientation $\theta_{\text{ori}}$ is estimated based on the algorithm of Lavalle et al\. \[[31](https://arxiv.org/html/2605.15252#bib.bib777)\], and the current orientation is calibrated for all window bundles, which are classified using an ensemble of cubic support vector machines \[[14](https://arxiv.org/html/2605.15252#bib.bib288)\]\.
The data streams $p_{\text{radio}}$, $v$, $\theta_{\text{ori}}$, and \(optionally\) $acc$ are merged and evaluated against the reference pose $ref$\. Segments $s$ represent the clean, preprocessed data from each recording, such as that of an athlete with radio and inertial sensors\. The windows $w$ represent input sequences for pose estimation, which slide along the segments $s$\. For real\-time processing, time synchronization occurs just before linear interpolation, where data are merged as they arrive and subsequently processed into segments or windows\.
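The resampling and windowing steps used for the baseline methods can be sketched as follows. This is a minimal Python illustration under stated assumptions: samples are scalar, values outside the recorded time range are clamped to the nearest sample, and the stride of one timestep is hypothetical. The window length of 128 timesteps follows the example given later in the paper.

```python
from bisect import bisect_right


def resample_linear(times, values, new_times):
    """Linearly interpolate scalar samples at new_times.

    times must be sorted ascending; queries outside the range are clamped.
    """
    out = []
    for t in new_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            alpha = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + alpha * (values[i] - values[i - 1]))
    return out


def sliding_windows(segment, length=128, stride=1):
    """Yield windows of `length` samples that slide along one segment."""
    for start in range(0, len(segment) - length + 1, stride):
        yield segment[start:start + length]


# Hypothetical example: upsample one radio coordinate from 10 Hz to 100 Hz,
# then cut the resulting segment into windows.
radio_t = [0.0, 0.1, 0.2]
radio_x = [1.0, 1.5, 1.0]
imu_t = [k / 100.0 for k in range(21)]
x_100hz = resample_linear(radio_t, radio_x, imu_t)
windows = list(sliding_windows(x_100hz, length=8, stride=4))
```

A segment that is shorter than the window length yields no windows, so short recordings are skipped rather than padded in this sketch.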
\(Initial\) Position\(s\)\. The pipeline initiates with the real\-time aggregation of sensor data, where radio signals are captured by stationary antennas to estimate position using the Uplink TDoA method\. This method synchronizes the signal arrival times to determine the transmitter’s location through a tiny and simple sequence\-to\-sequence LSTM\-based learning algorithm \[[17](https://arxiv.org/html/2605.15252#bib.bib1)\], ensuring robust positioning, even during high\-speed activities with sudden stops and directional changes\.
\(Relative\) Velocities\. Concurrently, the modular ML\-assisted pipeline estimates velocities by utilizing sliding windows of acceleration and gyroscope data from inertial sensors, focusing on the relative magnitudes of the sensor’s motion independently of orientation\. A novel, compact, and efficient sequence\-to\-one ResNet\-based method \[[42](https://arxiv.org/html/2605.15252#bib.bib943),[11](https://arxiv.org/html/2605.15252#bib.bib3)\] ensures high performance even under dynamic conditions\.
\(Calibrated\) Orientation\. Simultaneously, a Madgwick filter \[[36](https://arxiv.org/html/2605.15252#bib.bib934)\] is employed to estimate the orientation by integrating acceleration, rotation rate, and magnetic field data\. To mitigate potential magnetic field interference, an additional calibration step adjusts the estimated orientation based on body movement direction classification and walking direction \[[15](https://arxiv.org/html/2605.15252#bib.bib4),[16](https://arxiv.org/html/2605.15252#bib.bib5)\]\. This step corrects any misalignment between body and head orientation by comparing consecutive positions and estimating a calibrated direction, thereby ensuring accurate alignment with the positioning coordinate system\.
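The idea of comparing consecutive positions to obtain a calibrated direction can be sketched in Python as follows. This is a deliberately simplified stand-in and not the classification-based calibration of the cited work: it assumes a constant yaw offset between the sensor orientation and the direction of movement and estimates that offset with a circular mean. All function names are hypothetical.

```python
import math


def heading_between(p_prev, p_next):
    """Direction of movement (rad) between two consecutive 2D positions."""
    return math.atan2(p_next[1] - p_prev[1], p_next[0] - p_prev[0])


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def yaw_offset(theta_ori, theta_move):
    """Circular mean of (movement heading - sensor yaw) over a window."""
    s = sum(math.sin(m - o) for o, m in zip(theta_ori, theta_move))
    c = sum(math.cos(m - o) for o, m in zip(theta_ori, theta_move))
    return math.atan2(s, c)


def calibrate(theta_ori, offset):
    """Apply the estimated offset to every sensor yaw sample."""
    return [wrap_angle(o + offset) for o in theta_ori]


# Hypothetical example: positions along a diagonal and a biased sensor yaw.
positions = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
theta_move = [heading_between(a, b) for a, b in zip(positions, positions[1:])]
theta_ori = [0.2, 0.2, 0.2]
offset = yaw_offset(theta_ori, theta_move)
theta_cal = calibrate(theta_ori, offset)
```

The circular mean is used instead of an arithmetic mean so that angle differences close to the wrap-around at $\pm\pi$ do not cancel each other.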
\(Final\) Pose/Position\. The pipeline integrates position, velocity, and calibrated orientation data to generate a continuous pose estimate\. As position updates from radio signals are typically less frequent \(below 10 Hz\), the system interpolates the position between updates using velocity and orientation, thereby refining the position estimate until the next radio position\-based calibration\. PDRNN incorporates a robust fusion method that ensures high accuracy in dynamic environments, such as sports, where rapid changes in velocity, orientation, and acceleration are common\. The variable delays in the incoming sensor data streams are compensated for by PDRNN’s capability to forecast positions beyond $250\,\text{ms}$ and implicitly perform interpolation\.
### III\-C Classical Model\-based Approach
To reconstruct poses and trajectories, we use $p_0$ from the $radio$ system \[[17](https://arxiv.org/html/2605.15252#bib.bib1)\] as the starting position and transform $\theta$ and $\rho$ into Cartesian coordinates \(where $\rho$ is the distance derived from $v$, so that $\vec{v}=1\,\frac{\text{m}}{\text{s}}$ corresponds to $\vec{d}=1\,\text{m}$ with $dt=1.0\,\text{s}$, and $\theta\in\{\theta_{\text{ori}},\theta_{\text{radio}},\theta_{\text{ref}}\}$ per window\):

$$p(x)=p_0(x)+\rho\cdot\cos(\theta), \tag{1}$$

$$p(y)=p_0(y)+\rho\cdot\sin(\theta). \tag{2}$$

If the orientation $\theta_{\text{ori}}$ is used for the calibration, $\theta$ is corrected accordingly, $\theta=\theta_{\text{ori}}$, and $p(x)$ and $p(y)$ are adjusted by the new $\theta$\. If the position is calibrated with a current radio position $p_{\text{radio}}$, $p(x)$ and $p(y)$ are set accordingly: $p(x)=p_{\text{radio}}(x)$ and $p(y)=p_{\text{radio}}(y)$\. For a fair comparison we employ the following models of Feigl et al\. \[[12](https://arxiv.org/html/2605.15252#bib.bib2),[11](https://arxiv.org/html/2605.15252#bib.bib3)\]: an optimized KF and PDR, ML\-GP, RoNIN, and C/RNN\. Note that these models only estimate \(un\)directed velocities\.
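The update of Eqs. (1) and (2), including the optional calibration with a radio position, can be written as a short Python sketch. The function names and the `radio_fixes` dictionary (step index mapped to a radio position) are hypothetical and serve only to illustrate the classical reconstruction.

```python
import math


def dead_reckoning_step(p0, v, theta, dt=1.0):
    """Apply Eqs. (1) and (2): advance p0 by rho = v * dt along theta."""
    rho = v * dt
    return (p0[0] + rho * math.cos(theta), p0[1] + rho * math.sin(theta))


def reconstruct_trajectory(p0, velocities, headings, dt=1.0, radio_fixes=None):
    """Chain dead-reckoning steps into a trajectory.

    radio_fixes optionally maps a step index to a radio position (x, y)
    that replaces the dead-reckoned position at that step.
    """
    radio_fixes = radio_fixes or {}
    trajectory = [p0]
    p = p0
    for k, (v, theta) in enumerate(zip(velocities, headings)):
        p = dead_reckoning_step(p, v, theta, dt)
        if k in radio_fixes:
            p = radio_fixes[k]  # hard calibration: p = p_radio
        trajectory.append(p)
    return trajectory


# Hypothetical example: one step east, one step north, each at 1 m/s.
path = reconstruct_trajectory(
    (0.0, 0.0), velocities=[1.0, 1.0], headings=[0.0, math.pi / 2]
)
```

Because every step builds on the previous position, an error in $\rho$ or $\theta$ persists in all later poses until a radio fix overrides it, which is the accumulation effect discussed in Section III-A.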
### III\-D Machine Learning\-based Approach
Figure 3: PDRNN architecture to fuse multimodal sensor signals\. FF layers process the input and output of the LSTM layer\. A dropout layer \(D\) after the LSTM layer reduces the possibility of overfitting and allows uncertainty estimation\.

To derive an optimal variant of PDRNN that is tiny, fast, and energy efficient, the following recurrent cell architectures were evaluated: vanilla RNN \[[9](https://arxiv.org/html/2605.15252#bib.bib13)\], IRNN \[[32](https://arxiv.org/html/2605.15252#bib.bib20)\], LSTM \[[24](https://arxiv.org/html/2605.15252#bib.bib15)\], GRU \[[6](https://arxiv.org/html/2605.15252#bib.bib492)\], as well as cell concepts of Jozefowicz et al\. \[[26](https://arxiv.org/html/2605.15252#bib.bib33)\] and Greff et al\. \[[21](https://arxiv.org/html/2605.15252#bib.bib35)\]\. In addition to the cell architecture, the structure that connects and incorporates the cells is critical for optimizing the model architecture\. While standard RNN cells have been successfully applied to a variety of tasks, ranging from artificial addition tasks to music generation \[[24](https://arxiv.org/html/2605.15252#bib.bib15),[26](https://arxiv.org/html/2605.15252#bib.bib33),[32](https://arxiv.org/html/2605.15252#bib.bib20)\], many alternative networks have been proposed that use recurrent cells for more complex designs\. These networks include variations such as deep RNNs \[[23](https://arxiv.org/html/2605.15252#bib.bib879),[41](https://arxiv.org/html/2605.15252#bib.bib22)\] and specialized architectures such as bidirectional RNNs \[[45](https://arxiv.org/html/2605.15252#bib.bib62)\], encoder\-decoder networks \[[6](https://arxiv.org/html/2605.15252#bib.bib492),[47](https://arxiv.org/html/2605.15252#bib.bib61)\], and attention\-based models \[[2](https://arxiv.org/html/2605.15252#bib.bib8)\]\. We adapted these existing methods to pose estimation tasks\.
Inspired by Pascanu et al\.\[[41](https://arxiv.org/html/2605.15252#bib.bib22),[40](https://arxiv.org/html/2605.15252#bib.bib335)\], we incorporate deep FF networks alongside stacked RNNs for the transition functions, to process both input and output, and FF layers are placed directly after the input layer and before the output layer\. This approach enables the network to perform more complex computations between timesteps, enhancing accuracy\. In contrast to Pascanu et al\.\[[41](https://arxiv.org/html/2605.15252#bib.bib22)\], we do not use depth connection functions, as they promote vanishing and exploding gradients, which complicate training\. The FF and RNN layers are stacked, with the outputs from one layer fed to the next in each processing step\. Particularly for tasks requiring extensive processing of input data, deeply stacked RNNs have outperformed single\-layer RNNs\[[19](https://arxiv.org/html/2605.15252#bib.bib908),[47](https://arxiv.org/html/2605.15252#bib.bib61)\]\.
Figure 4: Input data \($X$\) of the data\-driven pose estimator\. A window $w$ with a length of 128 timesteps slides over the segments $s$ with the signal vectors\.

The core concept of the PDRNN architecture is to wrap the recurrent layers in upstream and downstream FF layers\. The upstream FF layer transforms the input signal $X$ \(which may include radio position $p_{\text{radio}}$, velocity $v$, acceleration $acc$, and orientation $\theta_{\text{ori}}$\) from its high\-dimensional form into an optimal internal dimension $X_{\text{int}}$\. The final architecture is depicted in Fig\. [3](https://arxiv.org/html/2605.15252#S3.F3)\. This internal representation is then processed by the RNN architecture, which handles the representation and forwards it to the downstream FF layer\. The downstream FF layer subsequently transforms the RNN’s internal representation $X'_{\text{int}}$ into the target representation \(pose\) $Y$, which has a different dimensionality\. The multimodal input domain, consisting of $p_{\text{radio}}$, $v$, $acc$, and $\theta_{\text{ori}}$, is thus mapped to a target domain \(pose\) through an RNN architecture that maximizes accuracy by placing upstream FF layers before the LSTM cells and downstream FF layers after them\. A dropout layer placed between the LSTM cells and the downstream FF layer helps mitigate overfitting and enables the uncertainty of the method to be assessed\. Fig\. [4](https://arxiv.org/html/2605.15252#S3.F4) illustrates the structure of an exemplary input sequence $X$\. A sliding window $w$ moves over the data of the segments $s$\. Each window contains multidimensional vectors representing the input variables over, e\.g\., 128 timesteps\.
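One common way to obtain an uncertainty estimate from a dropout layer is Monte Carlo dropout: the dropout mask stays active at inference, the same input is passed through the network several times, and the spread of the outputs is read as uncertainty. The following is a minimal sketch under that assumption; this section does not spell out the exact procedure. The single linear readout, the numeric values, and the names `dropout_readout` and `mc_dropout_estimate` are hypothetical stand-ins for the downstream FF layer.

```python
import random
import statistics


def dropout_readout(hidden, weights, rate, rng):
    """One stochastic forward pass of a linear readout with inverted dropout.

    Each hidden unit is dropped with probability `rate`; kept units are
    rescaled by 1 / (1 - rate) so that the expected output equals the
    dropout-free output.
    """
    keep_scale = 1.0 / (1.0 - rate)
    total = 0.0
    for h, w in zip(hidden, weights):
        if rng.random() >= rate:  # unit is kept
            total += h * keep_scale * w
    return total


def mc_dropout_estimate(hidden, weights, rate=0.5, passes=2000, seed=0):
    """Return (mean, variance) of the output over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [dropout_readout(hidden, weights, rate, rng) for _ in range(passes)]
    return statistics.fmean(samples), statistics.pvariance(samples)


# Hypothetical LSTM output (4 units) and readout weights for one pose coordinate.
hidden_state = [0.5, -1.0, 2.0, 1.5]
readout_weights = [0.2, 0.4, 0.1, -0.3]

mean, variance = mc_dropout_estimate(hidden_state, readout_weights)
deterministic = sum(h * w for h, w in zip(hidden_state, readout_weights))
print(f"dropout-free output: {deterministic:.3f}")
print(f"MC dropout mean: {mean:.3f}, variance: {variance:.3f}")
```

The mean over the passes approximates the dropout-free prediction, while the variance gives a per-output uncertainty that a downstream fusion step could use as a weight.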
## IV Experiments
Section [IV\-A](https://arxiv.org/html/2605.15252#S4.SS1) describes the dataset\. Section [IV\-B](https://arxiv.org/html/2605.15252#S4.SS2) introduces the configuration of PDRNN\. Section [IV\-C](https://arxiv.org/html/2605.15252#S4.SS3) presents the configuration of all other \(state\-of\-the\-art\) methods\.
### IV\-A Dataset
Figure 5: Exemplary reference trajectories for two segments $s$\. Panels: \(a\) walking, \(b\) jogging, \(c\) running, \(d\) random, \(e\) walking, \(f\) jogging, \(g\) running, \(h\) random\.

The dataset, proposed by Feigl et al\. \[[13](https://arxiv.org/html/2605.15252#bib.bib168)\], consists of motion data collected from 23 participants \(17 male, 6 female; average age 26\.7 years; height range: $1.46\,m$ to $1.87\,m$\), using an optical reference system, Google smartphone IMUs, and 5G radio signaling\. The participants performed four activities: walking, jogging, running, and random movements\. Each activity lasted approximately 7\.5 minutes, beginning with a 1\-minute stationary period for sensor calibration and initial error estimation\. Data were collected with two sensors per participant, resulting in a total of about 59 hours of recordings and covering approximately $107\,km$\. The dataset contains roughly 2,674,688 overlapping windows, enabling a comprehensive analysis of motion patterns and velocity\. Exemplary trajectories from the dataset are shown in Figure [5](https://arxiv.org/html/2605.15252#S4.F5)\.
We found that sliding windows of 128 timesteps \(sampled at 100 Hz\) with a 50% overlap \($N_{w} \cdot 0.5 = 64$\) provide excellent features at low computational cost, resulting in the highest accuracy for velocity estimation\. This configuration, with a window length of $1.28\,s$, effectively captures long\-term relationships in human movement\. Additionally, we explore sliding windows in the range from $0.5\,s$ to $30\,s$, with overlaps ranging from one timestep to 100% \($N_{w} \cdot 1.0 = 128$\)\.
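The windowing described above can be sketched as follows. This is an illustrative example rather than the authors' preprocessing code; the function name `sliding_windows` and the synthetic one-channel signal are assumptions.

```python
def sliding_windows(signal, window_size=128, overlap=0.5):
    """Split a sequence of samples into overlapping fixed-length windows."""
    # With 128 samples and 50% overlap, consecutive windows start 64 samples apart.
    step = max(1, int(window_size * (1.0 - overlap)))
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


# 10 s of a hypothetical one-channel signal sampled at 100 Hz.
samples = list(range(1000))
windows = sliding_windows(samples)
print(len(windows), len(windows[0]), windows[1][0])
```

Only complete windows are returned, so trailing samples that do not fill a whole window are dropped.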
### IV\-B Parameterization of PDRNN
To identify the optimal architecture and parameters for the PDRNN method, a grid search was performed\.¹ Each parameter set was trained for a maximum of 100 epochs, with early stopping\. The resulting best combination of FF\+LSTM\+FF \(with 120 LSTM cells\) provides the most accurate pose estimates with short inference times\.

¹ PDRNN parameters \(best in bold\): Solver: SGD, **Adam**, Adadelta, RMSProp; $\beta_{1} = \textbf{0.01}$; $\beta_{2} = 0.01$; Momentum: **0\.9**; FF, CNN, TCN, RNN, IRNN, GRU, LSTM, BLSTM layers: \[1, 2, **3**, 4, 8, 16\]; Cells per layer: **120** \[1:1:50, 50:10:500\]; Initial learning rate: **0\.001** \[1\.0:0\.1:0\.00001\]; LR reduction: the learning rate is reduced every $n$ periods, where $n \in [0, 10, 50, 100]$ epochs; LR reduction rate per period: **0\.5**; Batch size: \[128, 256, 512, **1,024**, 2,048\]; Regularization: \[$L_{1}$, $L_{2}$, Huber loss, Log\-cosh loss\]; Parameter: \[100, 1,000, 10,000, **100,000**, 1,000,000\]; Shuffle: \[no, **per epoch**\]; Gradient clipping: max \(input\); Dropout layer: \[before, between, **after** RNN layers\]; Dropout rate: $\textbf{0.5} \in [0.1, 0.7]$; Each combination of $f_{s}$ \(50, **100**, 200, 400\) and $N_{w}$ \(64, **128**, 256, 512\)\.
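Two elements of the footnote lend themselves to a short illustration: the crossing of sampling rates $f_{s}$ with window lengths $N_{w}$, and the step-wise learning-rate reduction. The sketch below uses only the values listed in the footnote. The helper name `step_decay` and the reading of $n = 0$ as "no reduction" are assumptions.

```python
from itertools import product

SAMPLING_RATES_HZ = (50, 100, 200, 400)   # f_s candidates from the footnote
WINDOW_LENGTHS = (64, 128, 256, 512)      # N_w candidates from the footnote


def step_decay(initial_lr, epoch, period, factor=0.5):
    """Learning rate after `epoch` epochs when it is multiplied by `factor`
    every `period` epochs; a period of 0 is taken to disable the reduction."""
    if period == 0:
        return initial_lr
    return initial_lr * factor ** (epoch // period)


# Every (f_s, N_w) pair is one grid point; the remaining hyperparameters
# would be crossed in the same way.
grid = list(product(SAMPLING_RATES_HZ, WINDOW_LENGTHS))
print(len(grid))
print(step_decay(0.001, epoch=25, period=10))
```

Crossing all listed hyperparameters multiplies the number of grid points quickly, which is one reason for capping each run at 100 epochs with early stopping.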
### IV\-C Parameterization of Other Methods
Position Estimation\. We employ the same parametrization and architecture as the radio\-based positioning model of Feigl et al\. \[[20](https://arxiv.org/html/2605.15252#bib.bib827)\]\. Their position estimator yields radio positions with an error of MAE = $0.1731\,m$ \(SD = $0.031\,m$\) for V2 and an error of MAE = $0.1477\,m$ \(SD = $0.028\,m$\) for V3\.
Orientation Estimation and Calibration\. We employ the implementation of the orientation estimation of Lavalle et al\. \[[31](https://arxiv.org/html/2605.15252#bib.bib777)\]\. Their orientation estimation yields orientations with an error of MAE = 1\.87° \(SD = 0\.96°\) for V2 and an error of MAE = 1\.41° \(SD = 0\.78°\) for V3\. To calibrate the orientation, we employ the same parametrization and architecture as Feigl et al\. \[[15](https://arxiv.org/html/2605.15252#bib.bib4),[16](https://arxiv.org/html/2605.15252#bib.bib5)\]\. The calibration of the orientation using radio position estimates \(see above\) based on Feigl et al\. \[[20](https://arxiv.org/html/2605.15252#bib.bib827)\], with an ensemble of 100 cubic support vector machines, yields correct orientations for V2 in 96\.7 of 100 cases \(SD = 4\.01°\) and for V3 in 98\.3 of 100 cases \(SD = 3\.89°\)\.
Velocity Estimation\. In total, we evaluated four different velocity estimation methods: an optimized PDR, ML\-GP, RoNIN, and C/RNN\. Details of the parametrization and architecture of the models are listed by Feigl et al\. \[[12](https://arxiv.org/html/2605.15252#bib.bib2)\]\. The errors of these velocity estimation methods are given in Section [V](https://arxiv.org/html/2605.15252#S5) in Table [II](https://arxiv.org/html/2605.15252#S5.T2)\.
Pose Estimation\. A linear KF model \[[8](https://arxiv.org/html/2605.15252#bib.bib872)\] from the state of the art is employed\. Under the assumption of drift\-free signals, such models are considered optimal state estimators\. This Bayesian model characterizes the motion transition function \(motion model\) and the development of measurement and process noise as linear functions, influenced by Gaussian noise \[[13](https://arxiv.org/html/2605.15252#bib.bib168)\]\. To optimize the KF for the respective training data, it is initially configured with a starting state of $x_{0} = 0$, covariance $P = 1$, process noise $Q = 0.1$, measurement noise $R = \sigma = 0.1$, and a constant\-velocity transition function\. Given the availability of empirical knowledge about the data \(i\.e\., the training dataset\), the KF’s measurement\-noise and process\-noise covariances are then fine\-tuned on the predictions of $p_{\text{radio}}$, $v$, and \(optionally\) $acc$, so that the KF becomes an optimal estimator for the respective training set\. The errors of the KF are given in Section [V](https://arxiv.org/html/2605.15252#S5) in Table [V](https://arxiv.org/html/2605.15252#S5.T5)\.
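To make the baseline concrete, the following is a minimal sketch of a one-dimensional constant-velocity Kalman filter initialised with the values stated above ($x_{0} = 0$, $P = 1$, $Q = 0.1$, $R = 0.1$). It is a simplification for illustration and not the tuned filter used in the experiments: it observes position only, applies $Q$ to both state components, and the function name `kalman_constant_velocity` and the synthetic 1.5 m/s track are assumptions.

```python
def kalman_constant_velocity(positions, dt, x0=0.0, p0=1.0, q=0.1, r=0.1):
    """1-D constant-velocity Kalman filter fed with position measurements.

    State: (position, velocity). Returns the list of filtered states.
    """
    pos, vel = x0, 0.0
    # Covariance entries [[p00, p01], [p10, p11]], initialised to p0 * I.
    p00, p01, p10, p11 = p0, 0.0, 0.0, p0
    states = []
    for z in positions:
        # Predict with F = [[1, dt], [0, 1]] and process noise q * I.
        pos = pos + dt * vel
        p00, p01, p10, p11 = (
            p00 + dt * (p01 + p10) + dt * dt * p11 + q,
            p01 + dt * p11,
            p10 + dt * p11,
            p11 + q,
        )
        # Update with H = [1, 0] and measurement noise r.
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        residual = z - pos
        pos += k0 * residual
        vel += k1 * residual
        p00, p01, p10, p11 = (
            (1.0 - k0) * p00,
            (1.0 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        states.append((pos, vel))
    return states


# Hypothetical noise-free track: 1.5 m/s for 50 s, sampled at 100 Hz.
dt = 0.01
truth = [1.5 * k * dt for k in range(5000)]
final_pos, final_vel = kalman_constant_velocity(truth, dt)[-1]
print(f"position {final_pos:.2f} m, velocity {final_vel:.2f} m/s")
```

Because the motion model assumes constant velocity, the filter tracks steady movement well but needs time to re-converge after abrupt changes, which is the behaviour examined in Section V.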
## V Results
### V\-A Effects of \(Un\)Directed Velocity
TABLE I: Statistics of the dataset for evaluating the pose estimates\. Segment and window/feature columns are counts \[\#\]\.

| Name | Subj\. \[\#\] | Total: Segment | Total: Window/Feature | Training: Segment | Training: Window/Feature | Valid\.: Segment | Valid\.: Window/Feature | Test: Segment | Test: Window/Feature | Duration \[min\] | Distance \[km\] | $v_{ref}$ ∅ \[m/s\] | $v_{ref}$ min \[m/s\] | $v_{ref}$ max \[m/s\] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy, V1 | 20 | 160 | 112,500 | 112 | 78,750 | 16 | 11,250 | 32 | 22,500 | 1,203 | 185\.40 | 2\.5 | 0\.8 | 7\.9 |
| Accuracy, V2 | 20 | 160 | 97,950 | 112 | 68,565 | 16 | 9,795 | 32 | 19,590 | 1,045 | 161\.42 | 2\.6 | 0\.8 | 7\.8 |
| Accuracy, V3 | 20 | \- | 31,200 | \- | 21,840 | \- | 3,120 | \- | 6,240 | 666 | 71\.9 | 1\.8 | 0\.8 | 3\.6 |
| Trajectory, Unknown Subj\. | 2 | 16 | 9,795 | \- | \- | \- | \- | 16 | 9,795 | 105 | 16\.14 | 2\.5 | 0\.7 | 3\.7 |
General Findings\. The general findings from the trajectory reconstruction are as follows: \(1\) No direct correlation was found between high velocity errors and high position errors for any of the methods\. For example, all methods exhibited high velocity errors during the random activity, yet the position errors were low for PDR, ML\-GP, and RoNIN, or nearly linear for C/RNN and PDRNN \(for a fair comparison, we trained PDRNN only on velocities\)\. \(2\) Across all activities, the PDRNN method produced the smallest positional errors, followed by C/RNN\. PDR yielded the worst results\. \(3\) Smaller position errors correspond to better agreement between the MAE, $\text{CEP}_{95}$, MSE, and RMSE values, reflecting the distance between estimated positions and the reference, even without recalibration\. In cases of significant outliers \(high MSE or RMSE values\), the reconstructed trajectories for PDR, ML\-GP, and RoNIN exhibited considerable distortion, with noticeable deviations in shape from the reference trajectory\. In contrast, the trajectories for C/RNN and PDRNN showed better alignment, with smaller deviations from the reference\. \(4\) As the subject’s velocity increases, the reconstructed trajectories drift further from the reference\. Notably, all methods performed well on the more complex pattern of the random activity, with PDR, ML\-GP, and RoNIN achieving lower position errors than during jogging and running\. C/RNN and PDRNN showed superior performance\. \(5\) Recalibration significantly reduced position errors \(nearly halving them across all tests\)\.
Recalibration\. In a preliminary study, we found that recalibrating the current position estimate $\hat{p}$ with the reference position improves the positional accuracy for all estimators across all error metrics\. Shorter calibration intervals led to more accurate position estimates, with $1\,s$ intervals being sufficient to align $\hat{p}$ with $p_{\text{ref}}$\. Calibration intervals exceeding $30\,s$ had no significant effect on the $f_{s}$ methods, while even $100\,s$ recalibration intervals still reduced the reconstruction errors of the C/RNN and PDRNN \(trained on velocities only\) methods\. We select $30\,s$ intervals for the evaluation, as they mimic typical applications using GPS at a low update rate and reduce computational and communication load\.
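The effect of periodic recalibration on an integrated velocity stream can be sketched in one dimension. The example is hypothetical: the 10% velocity bias, the 1 s timestep, and the names `dead_reckon` and `mean_abs_error` are assumptions chosen to keep the arithmetic easy to follow.

```python
def dead_reckon(velocities, reference, dt, recalibration_interval=None):
    """Integrate 1-D velocity estimates into positions.

    If `recalibration_interval` (seconds) is given, the running position is
    reset to the reference position each time that interval has elapsed.
    """
    position = reference[0]  # single initial position ("pure" setting)
    steps_per_reset = (
        None if recalibration_interval is None
        else max(1, round(recalibration_interval / dt))
    )
    track = []
    for k, v in enumerate(velocities):
        if steps_per_reset is not None and k > 0 and k % steps_per_reset == 0:
            position = reference[k]  # recalibrate on the reference position
        track.append(position)
        position += v * dt
    return track


def mean_abs_error(track, reference):
    """Mean absolute position error between a track and its reference."""
    return sum(abs(p - r) for p, r in zip(track, reference)) / len(track)


# Hypothetical 90 s walk at 1.0 m/s with a velocity estimate biased by +10%.
dt, n = 1.0, 90
reference = [1.0 * k * dt for k in range(n)]
biased_velocity = [1.1] * n

pure = dead_reckon(biased_velocity, reference, dt)
recalibrated = dead_reckon(biased_velocity, reference, dt, recalibration_interval=30.0)
print(mean_abs_error(pure, reference), mean_abs_error(recalibrated, reference))
```

In this toy setting the drift grows linearly between resets, so the interval length directly bounds the worst-case position error.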
TABLE II: Position error \[m\] of all methods on the trajectories of the left\-out test subjects\. Highest position accuracies are bold\. ”pure”: a single initial position; ”recalibrated”: a single initial position and a radio position every $30\,s$\.

Figure 6: Trajectory predictions for $3\,min$ of walking of the left\-out test subject A \(x\- and y\-axes in $m$; colored lines represent the estimates\)\. Panels: \(a\) PDR, \(b\) ML\-GP, \(c\) RoNIN, \(d\) C/RNN, \(e\) PDRNN, \(f\) PDR, recalibration, \(g\) ML\-GP, recalibration, \(h\) RoNIN, recalibration, \(i\) C/RNN, recalibration, \(j\) PDRNN, recalibration\. In line with Table II, the top row shows the estimates of all methods using a single initial position \(”pure”\), and the bottom row shows the methods when they are regularly recalibrated on a radio position\. For a fair comparison, we trained PDRNN without positions\.

Effects of Velocity on the Trajectory Reconstruction\. All methods produce the most accurate trajectory reconstruction during walking\. However, even for this low\-velocity activity, PDR, ML\-GP, and RoNIN exhibit deviations in the reconstructed trajectories compared to the reference, while the PDRNN method \(trained on velocities only\) reconstructs the trajectory with almost no error\. As shown in Table [II](https://arxiv.org/html/2605.15252#S5.T2), the errors and outliers progressively decrease from PDR to PDRNN\. Larger outliers correlate with greater deviations from the reference trajectory\. As velocity increases, the general ranking of the methods’ reconstruction capabilities remains consistent, with PDR performing the worst and PDRNN yielding the best results\. The drift is more noticeable with PDR, ML\-GP, and RoNIN than with C/RNN and PDRNN, which demonstrate minimal drift\. During jogging, PDR, ML\-GP, and RoNIN exhibit drift, typically undershooting the reference \(estimated positions become smaller than the references\)\.
Effects of Recalibration on the Trajectory Reconstruction Accuracy\. Recalibration significantly reduces position errors across all metrics for all activities\. Generally, recalibration yields the greatest improvements when the velocity is low\. As shown in Table [II](https://arxiv.org/html/2605.15252#S5.T2) \(bottom row\), recalibration nearly halves the position errors for PDR, ML\-GP, and RoNIN, while its impact is less pronounced for the C/RNN and PDRNN \(trained only on velocities\) methods\. Furthermore, PDR, ML\-GP, and RoNIN, which exhibit the most drift without recalibration, show the greatest improvements\. Although recalibration notably enhances trajectory reconstruction for PDR, ML\-GP, C/RNN, and PDRNN during the random activity, RoNIN still experiences significant position errors\. This may be due to RoNIN’s inability to sufficiently learn the dependencies between different movements and velocities, leading to the misprocessing of large outliers\. Given that C/RNN and PDRNN perform well with or without recalibration, it appears that ML\-based methods are more effective at handling a wide range of velocities\.
Conclusion\. As velocity increases, drift in the trajectory reconstructed from velocity data alone becomes more pronounced, accompanied by position jumps, wobbling, and jitter, particularly with frequent changes in direction\. Consequently, the reconstruction of the trajectory performs best when velocity changes moderately, allowing for recovery from rapid movement and drift, while minimizing directional fluctuations\. By incorporating recalibrations, the estimated trajectories are brought closer to the reference, even for C/RNN and PDRNN, which already perform well without recalibration\. It appears that C/RNN and PDRNN excel in three key areas: \(1\) effectively handling varying velocities and the corresponding signal noise, as evidenced by consistently low error metrics across all activities; \(2\) accurately interpreting velocity scaling, which enables precise position estimates even during rapid motion changes; and \(3\) efficiently de\-noising the inputs, as reflected in their lower errors\.
### V\-B Evaluation of PDRNN
TABLE III: Reconstruction error \[m\] of PDRNN on the unknown subjects with varying input data streams\.

Figure 7: PDRNN pose estimation accuracy for various input data \(MAE: red line; outliers: red $+$; and standard deviation\)\.

Effect of Input Variations\. The input combination significantly impacts model accuracy\. Models incorporating both radio position \($p_{\text{radio}}$\) and velocity \($v$\) outperform those without $v$ \(e\.g\., $p_{\text{radio}}, acc$; $p_{\text{radio}}, \theta_{\text{ori}}$; $p_{\text{radio}}, acc, \theta_{\text{ori}}$\)\. Including acceleration \($acc$\) generally leads to poorer performance\. Even when the acceleration is denoised \(using a moving average filter, $\text{SMA} = 10$\), the poses remain inaccurate\. The combination of $p_{\text{radio}}$ and $v$ achieves the highest positional accuracy \($\text{MAE} = 0.0375$, $\text{MSE} = 0.0141$, $\text{RMSE} = 0.0027$, $\text{CEP}_{95} = 0.0991$ in $m$\)\. In contrast, the combination of $p_{\text{radio}}$ with the signal magnitude vector of acceleration \($p_{\text{radio}}, acc$\) yields the worst results \($\text{MAE} = 0.2722$, $\text{MSE} = 0.0286$, $\text{RMSE} = 0.0421$, $\text{CEP}_{95} = 0.4356$\), and combining radio position, velocity, and acceleration \($p_{\text{radio}}, v, acc$\) leads to worse results than $p_{\text{radio}}, v$ alone \($\text{MAE} = 0.1144$, $\text{MSE} = 0.0243$, $\text{RMSE} = 0.0324$, $\text{CEP}_{95} = 0.2007$\)\. Adding acceleration contributes little beyond velocity and may even hinder accuracy, as no correlation exists between position and acceleration when velocity is absent\.
The KF with a constant\-velocity motion model provides significantly more accurate poses than with a constant\-acceleration model, even when $acc$ is added\. The inaccuracy with $acc$ may stem from measurement noise, which the KF cannot reliably define for unknown data\. The KF, optimized for $p_{\text{radio}}, v$, achieves $\text{MAE} = 0.3244$, $\text{MSE} = 0.0411$, $\text{RMSE} = 0.0573$, and $\text{CEP}_{95} = 0.6019$ meters, which is less accurate than PDRNN but more accurate than the model\-based PDR method \($\text{MAE} = 0.4900$, $\text{MSE} = 0.5200$, $\text{RMSE} = 0.6600$, $\text{CEP}_{95} = 1.2500$ meters\)\. The KF performs best when optimized for the dataset but suffers higher errors with data from an unknown test subject due to dataset differences\. Adding orientation \($\theta_{\text{ori}}$\) to position generally worsens results, with the combination of $p_{\text{radio}}, v, \theta_{\text{ori}}$ yielding $\text{MAE} = 0.1377$, $\text{MSE} = 0.0212$, $\text{RMSE} = 0.0371$, and $\text{CEP}_{95} = 0.2289$ meters\. This may be due to the radio and inertial sensors being in the same coordinate system, where the radio position implicitly indicates orientation, leading to estimation errors that confuse PDRNN\. Therefore, orientation should only be integrated into the model if its error variance is low and it does not degrade position accuracy\. However, when radio and inertial sensors are in different coordinate systems, orientation should be included\.
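For readers who want to compute comparable statistics on their own trajectories, the sketch below uses conventional definitions of the four metrics. These definitions are assumptions, since this section does not state how the reported values were computed: the per-sample error is the Euclidean distance to the reference, and $\text{CEP}_{95}$ is taken as the nearest-rank 95th percentile of that distance. The function name `position_error_metrics` and the sample data are hypothetical.

```python
import math


def position_error_metrics(estimates, references):
    """Return MAE, MSE, RMSE and CEP95 of 2-D position estimates.

    The error of each sample is the Euclidean distance between estimate and
    reference; CEP95 is the nearest-rank 95th percentile of that distance.
    """
    errors = sorted(
        math.dist(est, ref) for est, ref in zip(estimates, references)
    )
    n = len(errors)
    mae = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "CEP95": errors[rank - 1],
    }


# Hypothetical estimates whose error grows by 5 cm per sample.
references = [(0.0, 0.0)] * 20
estimates = [(0.03 * k, 0.04 * k) for k in range(20)]
metrics = position_error_metrics(estimates, references)
print({name: round(value, 4) for name, value in metrics.items()})
```

Reporting a percentile such as $\text{CEP}_{95}$ next to the mean makes outliers visible that an average alone would hide.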
TABLE IV: Reconstruction errors \[m\] of the PDRNN method on the trajectories of the unknown test subjects with a varying forecast horizon\.

Effect of the Forecast Horizon\. Figure [8](https://arxiv.org/html/2605.15252#S5.F8) shows pose accuracy in relation to the forecast horizon, with error variance presented for the best PDRNN model \(trained on $p_{\text{radio}}$, $v$\) and the KF\. The KF generally exhibits higher inaccuracies, tripling pose errors with a $1\,s$ prediction horizon\. As the KF uses only the last timestep to predict the next, the sequence length does not influence its accuracy\. This limited knowledge may explain why the KF performs poorly for future pose predictions\. PDRNN, however, yields the most accurate predictions with a sequence length of 128 timesteps and a $1\,s$ forecast horizon, compensating for measurement and system delays \($1\,s$ forecast with sequence length $1.28\,s$: $\text{MAE} = 0.0489$, $\text{MSE} = 0.0076$, $\text{RMSE} = 0.0057$, $\text{CEP}_{95} = 0.1074$ meters\)\. Shorter sequences \($0.64\,s$\) are less accurate \($1\,s$ forecast with $0.64\,s$ sequence: $\text{MAE} = 0.0667$, $\text{MSE} = 0.0243$, $\text{RMSE} = 0.0324$, $\text{CEP}_{95} = 0.1319$ meters\)\. Long sequences \(e\.g\., $2.56\,s$\) also negatively affect accuracy \($1\,s$ forecast with $2.56\,s$ sequence: $\text{MAE} = 0.1175$, $\text{MSE} = 0.0546$, $\text{RMSE} = 0.0975$, $\text{CEP}_{95} = 0.2356$ meters\)\. Excess information from long sequences, such as circular and spiral motions, may cause PDRNN to focus on unnecessary details, leading to decreased accuracy\.
We also observed that forecast horizons over $2\,s$ yield significantly worse accuracy \($2\,s$ forecast with $1.28\,s$ sequence: $\text{MAE} = 0.3775$, $\text{MSE} = 0.1745$, $\text{RMSE} = 0.2793$, $\text{CEP}_{95} = 0.4745$ meters\), and horizons over $3\,s$ result in implausible pose accuracy, leading to the exclusion of tests with forecasts $\geq 2\,s$\. For forecast horizons $< 1\,s$, the error increases almost linearly with the horizon length\. In an independent experiment, long input sequences \($30\,s$\) of simple circular trajectories allowed accurate $10\,s$ future pose predictions\. In contrast, short input sequences \($1.28\,s$\) produced larger errors for $10\,s$ predictions\. This may be because long sequences fully capture circular motions, while short sequences can only learn partial circles\. The lack of world knowledge about circular motions in shorter sequences prevents accurate long\-term predictions\. Notably, the LSTM’s context vector seems to provide enough capacity for learning simple circular motions over long trajectories, indicating a strong correlation between accuracy, sequence length, LSTM capacity, and forecast horizon\.
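Forecasting with a windowed model amounts to pairing each input window with a reference pose that lies a fixed horizon after the window's last sample. The sketch below shows one way to build such pairs. It is an assumption about the data layout rather than the authors' pipeline, and `forecast_pairs` and the integer stand-in signals are hypothetical.

```python
def forecast_pairs(inputs, poses, window_size=128, horizon_steps=100, step=64):
    """Pair each input window with the reference pose `horizon_steps` after
    the last sample of that window."""
    pairs = []
    last_start = len(inputs) - window_size - horizon_steps
    for start in range(0, last_start + 1, step):
        window_end = start + window_size - 1        # index of last input sample
        target = poses[window_end + horizon_steps]  # pose to be forecast
        pairs.append((inputs[start:start + window_size], target))
    return pairs


# 10 s at 100 Hz; integers stand in for sensor vectors and reference poses.
n = 1000
inputs = list(range(n))
poses = list(range(n))

# A 1 s forecast horizon corresponds to 100 steps at 100 Hz.
pairs = forecast_pairs(inputs, poses)
print(len(pairs), pairs[0][1], pairs[-1][1])
```

Windows whose target would fall beyond the end of the recording are skipped, so longer horizons yield slightly fewer training pairs.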
Figure 8: Pose accuracy of PDRNN \(in $m$\) trained on $p_{\text{radio}}$, $v$ over the forecast horizon \(from $0\,s$ to $3\,s$\) for different sequence lengths \($N_{w} \in [0.64, 1.28, 2.56]\,s$\)\. The lines visualize the error \(MAE\), and the length of the vertical lines of the boxes represents the degree of error variance\.

Figure 9: Effects of the sequence length \(blue line\) on generalizability\. Panels: \(a\) short sequence, $1.28\,s$; \(b\) long sequence, $2.56\,s$\. Shorter sequences \(left\) are more likely to be contained in more complex trajectories and are more likely to reconstruct them\. Longer sequences \(right\) are often unique and therefore more difficult to generalize, i\.e\., the blue sequence \(right\) is found only once in the entire segment, so that the motion model cannot use this movement for the reconstruction\.

Effect of Sequence Length\. The pose accuracy of PDRNN \(trained on $p_{\text{radio}}$, $v$\) improves with sequence lengths up to 128 values, suggesting that the context vector approaches its maximum information capacity at this point \(sequence length = 128: $\text{MAE} = 0.0375$, $\text{MSE} = 0.0141$, $\text{RMSE} = 0.0027$, $\text{CEP}_{95} = 0.0991$ meters\)\. Beyond this length, accuracy declines, likely due to the context vector being overloaded \(sequence length = 256: $\text{MAE} = 0.1012$, $\text{MSE} = 0.0537$, $\text{RMSE} = 0.0646$, $\text{CEP}_{95} = 0.1944$ meters\)\. Shorter sequences with 64 values also yield poorer positional accuracy compared to 128 values \(sequence length = 64: $\text{MAE} = 0.0615$, $\text{MSE} = 0.0034$, $\text{RMSE} = 0.0431$, $\text{CEP}_{95} = 0.1254$ meters\)\. This may be attributed to insufficient information in the smaller window for the velocity estimate, leading to inaccuracies that negatively impact pose estimation\. Short sequences often capture simple short, curvy, and straight movements, whereas longer sequences include more complex motion patterns\. Longer trajectories may be reconstructed from combinations of short curves and straight lines, enabling models trained on shorter sequences to predict longer, complex motions accurately\.
Conversely, trajectory shapes in longer sequences tend to be more specialized and restrictive, making it harder to reconstruct unfamiliar shapes \(sequence length = 512: $\text{MAE} = 0.2641$, $\text{MSE} = 0.1123$, $\text{RMSE} = 0.1435$, $\text{CEP}_{95} = 0.3546$ meters\)\. Figure [9](https://arxiv.org/html/2605.15252#S5.F9) illustrates examples of simple \(left\) and complex \(right\) motion patterns\.
Figure 10: Positional accuracy of an example trajectory for the previously unseen subject A \(random movement\) with abrupt directional changes\. Panels: \(a\) KF, \(b\) PDRNN\. The predictions of the KF and PDRNN \(trained on $p_{\text{radio}}$, $v$\) are compared during direction changes\. Initially \($x = 15\,m$, $y = 17\,m$\), the KF accurately models the position\. However, following a change in direction, significant errors appear in the KF predictions \($x = 17\,m$, $y = 22.5\,m$\), with recovery only after $3.8\,s$ \($x = 24\,m$, $y = 19\,m$\)\. In contrast, PDRNN maintains error\-free predictions during directional changes\.

TABLE V: Reconstruction error \[m\] of the KF and PDRNN methods on the trajectories of the unknown subjects\.

Effect of Sudden Changes in Movement\. The models’ predictions for this dataset are evaluated by analyzing the MSE at each timestep in a window, providing insight into how errors evolve following sudden movement changes\. This analysis also enables visualization and assessment of the transient response\. As shown in Figure [10](https://arxiv.org/html/2605.15252#S5.F10), PDRNN \(trained on $p_{\text{radio}}$, $v$\) settles much faster and achieves a lower MSE of $0.00072\,m$ compared to the optimized KF \($\text{MSE} = 0.0895\,m$\), even for random movements\. Early experiments indicate that PDRNN maintains a low MSE \($\text{MSE} = 0.0088\,m$\) and exhibits rapid convergence even when forecasting poses one second into the future\. Table [V](https://arxiv.org/html/2605.15252#S5.T5) summarizes the pose accuracies of KF and PDRNN for excluded subjects\. To evaluate the settling duration, ten random samples of sudden movement changes are analyzed, counting the timesteps required to return to the reference trajectory after deviations\. On average, PDRNN corrects forecast errors within $0.4\,s$, while the KF requires $1.8\,s$\.
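Counting the timesteps until an estimate returns to the reference can be sketched as a settling-time computation. The tolerance that defines "returned to the reference" is not given in the text, so the 0.1 m threshold, the error sequence, and the name `settling_time` below are illustrative assumptions.

```python
def settling_time(errors, dt, tolerance):
    """Time from the first sample until the error stays within `tolerance`.

    `errors` holds the per-timestep position error after a sudden movement
    change. Returns None if the error has not settled by the end of the window.
    """
    settled_index = None
    # Walk backwards to find where the final within-tolerance run begins.
    for k in range(len(errors) - 1, -1, -1):
        if errors[k] > tolerance:
            break
        settled_index = k
    return None if settled_index is None else settled_index * dt


# Hypothetical error trace (meters) sampled every 0.1 s after a direction change.
error_trace = [0.5, 0.4, 0.3, 0.2, 0.08, 0.05, 0.04]
print(settling_time(error_trace, dt=0.1, tolerance=0.1))
```

Requiring the error to stay within the tolerance until the end of the window avoids counting a brief crossing of the reference as settled.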
Figure [12](https://arxiv.org/html/2605.15252#S5.F12) visualizes the pose estimates of KF \(left column, green\) and PDRNN \(right column, blue\) on data from left\-out test subject A \(activities include walking, jogging, running, and random movements\)\. While KF displays larger deviations from the reference trajectory as movement speed and directional changes increase, PDRNN consistently delivers precise and near\-identical poses for all activities\. Specific optimization for individual activities could potentially enhance their accuracy\. The results highlight the rigidity of the KF’s predefined motion model, which is insufficient for accommodating sudden and strong variations in measurement noise\.
Figure 11: Top\-down view of exemplary trajectory patterns in the training set \(left, approximately $3\,min$\) and test set \(right, approximately $3\,min$\) used to assess the generalizability of PDRNN across different motion patterns\. Panels: \(a\) training data, \(b\) test data\. The training data, comprising activities such as walking, jogging, and running from dataset V3, primarily feature circular trajectories\. In contrast, the test data for the random activity from the excluded test subject A include more complex movement patterns\.

Effect of Unknown Trajectory Shapes\. The KF handles trajectory estimation well, aside from occasional over\- and undershooting, as its motion model is based on the physical states $p_{\text{radio}}$, $v$, and $acc$, rather than specific motion shapes\. Consequently, KF achieves a positional accuracy comparable to models explicitly optimized for random activities, extracting physical relationships directly from the data \($\text{MAE} = 0.4427$, $\text{MSE} = 0.2580$, $\text{RMSE} = 0.5001$, $\text{CEP}_{95} = 0.8331\,m$\)\. In contrast, when PDRNN is trained solely on circular motions during walking, jogging, and running, it produces implausible results for random activities\. This failure arises because central positions within circular trajectories remain unknown and many essential basic motion patterns are missing, see Figure [11](https://arxiv.org/html/2605.15252#S5.F11)\. These patterns are critical for reconstructing complex motions\. To address this limitation, PDRNN was retrained using all data from the three activities in dataset V3, which includes paths where athletes reverse directions or cross circular trajectories\. This improved model accurately estimates poses for random activities of excluded subjects, as illustrated in Figure [11](https://arxiv.org/html/2605.15252#S5.F11)\.
These results demonstrate that training data must cover the fundamental building blocks of motion patterns to enable accurate reconstruction of unknown complex movements\. Shorter sequence lengths of 128 values yielded the best results \($\text{MAE} = 0.0417$, $\text{MSE} = 0.0041$, $\text{RMSE} = 0.0821$, $\text{CEP}_{95} = 0.0612\,m$\)\. Conversely, longer sequences of 256 values significantly degraded accuracy \($\text{MAE} = 0.3675$, $\text{MSE} = 0.2168$, $\text{RMSE} = 0.3273$, $\text{CEP}_{95} = 0.6578\,m$\)\. The poorer performance with longer sequences may stem from their inability to capture diverse random movements, which shorter sequences can better represent by incorporating curvy and straight components\. Longer sequences, which often model entire circular paths, lack the flexibility to generalize to arbitrary trajectory changes\. Figure [11](https://arxiv.org/html/2605.15252#S5.F11) depicts a reconstructed trajectory comprising numerous random movements\. PDRNN reconstructs highly congruent and precise trajectories, even when trained on less complex motion forms, see Figure [12](https://arxiv.org/html/2605.15252#S5.F12)\.
Figure 12: Trajectory predictions for $3\,min$ of walking, jogging, running, and random of the left\-out test person A \(x\- and y\-axes in $m$; black line: reference; colored line: estimates\)\. Panels: \(a\) KF, walking; \(b\) PDRNN, walking; \(c\) KF, jogging; \(d\) PDRNN, jogging; \(e\) KF, running; \(f\) PDRNN, running; \(g\) KF, random; \(h\) PDRNN, random\.

Impact of Data Gaps\. At least two radio positions are necessary to interpolate missing positions using velocities\. With a sequence length of $1.28\,s$, the position update rate can be reduced from 100 Hz to 1 Hz, thereby decreasing the communication load by a factor of 100\. Initial experiments demonstrate that incorporating a single radio position per input sequence or window can yield accurate pose estimates if the directed distance \(delta\) between successive positions is integrated into the input rather than the absolute position\. In this approach, instead of predicting an explicit position, the model predicts the directed distance to the next position\. The estimated distance is then added to a starting position or to each newly estimated position\. This technique achieved accuracy comparable to the method utilizing two radio positions per window, yielding $\text{MAE} = 0.0686$, $\text{MSE} = 0.0034$, $\text{RMSE} = 0.0066$, and $\text{CEP}_{95} = 0.1546\,m$\.
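The delta-based variant described above reduces to accumulating predicted displacement vectors onto a known starting position. The sketch below illustrates that bookkeeping only; the displacement values stand in for model outputs, and the name `accumulate_deltas` is hypothetical.

```python
def accumulate_deltas(start_position, deltas):
    """Chain predicted displacement vectors onto a known start position."""
    x, y = start_position
    track = [(x, y)]
    for dx, dy in deltas:
        # Each new position is the previous estimate plus the predicted delta.
        x, y = x + dx, y + dy
        track.append((x, y))
    return track


# One radio position as the anchor, followed by hypothetical predicted deltas (m).
anchor = (15.0, 17.0)
predicted_deltas = [(0.5, 0.0), (0.5, 0.25), (0.0, 0.25)]
track = accumulate_deltas(anchor, predicted_deltas)
print(track[-1])
```

Because every position depends on all earlier deltas, errors accumulate between anchors, so the interval between radio positions bounds the drift.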
## VI Conclusion
This paper introduces a novel data\-driven method for forecasting human movement, PDRNN, which integrates radio frequency and inertial sensor data in a PDR framework and uses modular ML components at each stage\. The architecture combines feed\-forward \(FF\) layers with LSTM cells, enabling the extraction of spatio\-temporal features from sensor signals and their translation into meaningful motion data\. The results demonstrate the limitations of traditional approaches, such as KF and model\-based PDR methods, particularly in dynamic environments characterized by abrupt motion changes\. PDRNN outperforms monolithic ML\-based PDR techniques, such as RoNIN, offering superior accuracy and efficiency\. It generalizes across a range of activities and user behaviors, adapting to diverse trajectories and velocities without significant loss of accuracy, and its robust handling of sensor noise and data gaps provides reliable pose and trajectory estimates under challenging conditions\. Exploiting positions and velocities as input enhances overall accuracy, improves real\-time processing capabilities, and facilitates the prediction of future poses, enabling compensation for system delays\.
Our experiments underscore the potential of PDRNN for precise pose estimation in dynamic environments. The model addresses key challenges through a modular architecture that integrates prominent ML modules (RoNIN, LSTM) to effectively capture temporal relationships within sensor data while providing uncertainty measures. The results demonstrate that PDRNN learns precise, adaptable, and generalizable motion patterns, outperforming established methods, particularly during sudden or random movement changes ($\text{CEP}_{95}$: $\text{PDR}=1.25\,m$, $\text{RoNIN}=0.46\,m$, $\text{PDRNN}=0.14\,m$). It exhibits shorter settling times and lower overall error rates under these conditions. Notably, PDRNN predicts random trajectories up to one second into the future with errors below $0.12\,m$. Interestingly, increasing network depth did not yield significant accuracy gains, indicating that deeper layers were not critical to the model's performance. This highlights the efficiency and practicality of the proposed approach in managing complex human motion while maintaining high prediction and pose estimation accuracy close to the sensors.
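For readers who want to compute comparable error figures on their own trajectories, the sketch below shows one common way to derive MAE, MSE, RMSE, and a 95 % circular error from 2D position errors. It is an assumption that these definitions (Euclidean error per sample, nearest-rank 95th percentile) match the evaluation protocol used here; the function names and sample values are hypothetical.

```python
import math


def position_errors(estimates, references):
    """Euclidean distance between each estimated and reference position."""
    return [math.dist(e, r) for e, r in zip(estimates, references)]


def summarize(errors, quantile=0.95):
    """Return MAE, MSE, RMSE and the nearest-rank error quantile (CEP)."""
    n = len(errors)
    mae = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Nearest-rank percentile: smallest error that covers `quantile` of samples.
    rank = max(1, math.ceil(quantile * n))
    cep = sorted(errors)[rank - 1]
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "CEP": cep}


# Hypothetical example: four estimates against a reference at the origin.
references = [(0.0, 0.0)] * 4
estimates = [(3.0, 4.0), (0.0, 0.0), (0.0, 1.0), (6.0, 8.0)]
stats = summarize(position_errors(estimates, references))
print(stats)
```

With only a few samples the nearest-rank quantile equals the largest error; on long trajectories it indicates the radius within which 95 % of the position estimates fall.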
## References
- [1] R. M. Alexander (1984) Stride length and speed for adults, children, and fossil hominids. American J. of Physical Anthropology. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [2] D. Bahdanau, K. Cho, and Y. Bengio (2015) Neural machine translation by jointly learning to align and translate. In Proc. Intl. Conf. Learning Representations (ICLR), pp. 57–64. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [3] S. Bai, M. Yan, Q. Wan, L. He, X. Wang, and J. Li (2019) DL-RNN: an accurate indoor localization method via double RNNs. IEEE Sensors J. 18(6), pp. 1–1. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [4] W. Böhm (1978) Handbuch der Navigation: Begriffe, Formeln, Verfahren, Schemata. Busse. External Links: ISBN 978-3-87120-323-7. Cited by: [§III-A](https://arxiv.org/html/2605.15252#S3.SS1.p1.6).
- [5] P. Chen, Y. Kuang, and X. Chen (2017) A UWB/improved PDR integration algorithm applied to dynamic indoor positioning for pedestrians. Sensors J. 17(9), pp. 2065. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [6] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. External Links: arXiv:1406.1078. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [7] D. Choi, T. An, K. Ahn, and J. Choi (2018) Future trajectory prediction via RNN and maximum margin inverse reinforcement learning. In Proc. Intl. Conf. Machine Learning and Applications (ICMLA), pp. 125–130. Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2).
- [8] H. Coskun, F. Achilles, R. DiPietro, N. Navab, and F. Tombari (2017) Long short-term memory Kalman filters: recurrent neural estimators for pose regularization. External Links: arXiv:1708.01885. Cited by: [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p4.10).
- [9] J. L. Elman (1990) Finding structure in time. Cognitive Science 14(2), pp. 179–211. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [10] T. Feigl, S. Kram, P. Woller, R. H. Siddiqui, M. Philippsen, and C. Mutschler (2019) A bidirectional LSTM for estimating dynamic human velocities from a single IMU. Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN) 8(3), pp. 8. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p3.28).
- [11] T. Feigl, S. Kram, P. Woller, R. H. Siddiqui, M. Philippsen, and C. Mutschler (2019) A bidirectional LSTM for estimating dynamic human velocities from a single IMU. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation, pp. 1–8. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p5.1), [§III-C](https://arxiv.org/html/2605.15252#S3.SS3.p1.20).
- [12] T. Feigl, S. Kram, P. Woller, R. H. Siddiqui, M. Philippsen, and C. Mutschler (2020) RNN-aided human velocity estimation from a single IMU. Sensors J. 20(13), pp. 3656–3690. Cited by: [§III-C](https://arxiv.org/html/2605.15252#S3.SS3.p1.20), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p3.1).
- [13] T. Feigl, S. Kram, P. Woller, R. H. Siddiqui, M. Philippsen, and C. Mutschler (2020) RNN-aided human velocity estimation from a single IMU. Sensors J. 20(13), pp. 3656–3690. Cited by: [§IV-A](https://arxiv.org/html/2605.15252#S4.SS1.p1.3), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p4.10).
- [14] T. Feigl, C. Mutschler, and M. Philippsen (2018) Head-to-body-pose classification in no-pose VR tracking systems. In Proc. Intl. Conf. IEEE Virtual Reality and 3D User Interfaces (IEEE VR), pp. 1–2. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p3.28).
- [15] T. Feigl, C. Mutschler, and M. Philippsen (2018) Head-to-body-pose classification in no-pose VR tracking systems. In Proc. Intl. Conf. IEEE Virtual Reality and 3D User Interfaces (IEEE VR), pp. 1–2. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p6.1), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p2.1).
- [16] T. Feigl, C. Mutschler, and M. Philippsen (2018) Supervised learning for yaw orientation estimation. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p6.1), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p2.1).
- [17] T. Feigl, T. Nowak, M. Philippsen, T. Edelhäußer, and C. Mutschler (2018) Recurrent neural networks on drifting time-of-flight measurements. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 206–212. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p4.1), [§III-C](https://arxiv.org/html/2605.15252#S3.SS3.p1.9).
- [18] T. Feigl, T. Nowak, M. Philippsen, T. Edelhäußer, and C. Mutschler (2018) Recurrent neural networks on drifting time-of-flight measurements. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 206–212. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p3.28).
- [19] T. Feigl, T. Nowak, M. Philippsen, T. Edelhäußer, and C. Mutschler (2018) Recurrent neural networks on drifting time-of-flight measurements. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [20] T. Feigl, T. Nowak, M. Philippsen, T. Edelhäußer, and C. Mutschler (2018) Recurrent neural networks on drifting time-of-flight measurements. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8. Cited by: [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p1.4), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p2.1).
- [21] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber (2017) LSTM: a search space odyssey. Trans. on Neural Network Learning Syst. 28(10), pp. 2222–2232. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [22] D. Gusenbauer, C. Isert, and J. Krösche (2010) Self-contained indoor positioning on off-the-shelf mobile devices. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation, pp. 1–9. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [23] M. Hermans and B. Schrauwen (2013) Training and analysing deep recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 190–198. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [24] S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9(8), pp. 1735–1780. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [25] C. Jiang, Y. Chen, S. Chen, Y. Bo, W. Li, W. Tian, and J. Guo (2019) A mixed deep recurrent neural network for MEMS gyroscope noise suppressing. Electronics J. 8(2), pp. 181. Cited by: [§III-A](https://arxiv.org/html/2605.15252#S3.SS1.p1.6).
- [26] R. Józefowicz, W. Zaremba, and I. Sutskever (2015) An empirical exploration of recurrent network architectures. In Proc. Intl. Conf. Machine Learning (ICML), pp. 2342–2350. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [27] J. Kang, J. Lee, and D. Eom (2018) Smartphone-based traveled distance estimation using individual walking patterns for indoor localization. Sensors J. 18(9), pp. 3149. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1), [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2).
- [28] W. Kang, S. Nam, Y. Han, and S. Lee (2012) Improved heading estimation for smartphone-based indoor positioning systems. In Proc. Intl. Symp. Personal, Indoor and Mobile Radio Commu., pp. 2449–2453. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [29] M. Kok, J. D. Hol, and T. B. Schön (2015) Indoor positioning using ultrawideband and inertial measurements. IEEE Trans. on Vehicular Technology 64(4), pp. 1293–1303. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [30] M. Kok, J. Hol, and T. Schön (2014) An optimization-based approach to human body motion capture using inertial sensors. In IFAC, Cape Town, South Africa, pp. 79–85. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [31] S. M. LaValle, A. Yershova, M. Katsev, and M. Antonov (2014) Head tracking for the Oculus Rift. In Proc. Intl. Conf. Robotics and Automation (ICRA), pp. 187–194. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p3.28), [§IV-C](https://arxiv.org/html/2605.15252#S4.SS3.p2.1).
- [32] Q. V. Le, N. Jaitly, and G. E. Hinton (2015) A simple way to initialize recurrent networks of rectified linear units. External Links: arXiv:1504.00941. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [33] X. Li, Y. Wang, and K. Khoshelham (2018) UWB/PDR tightly coupled navigation with robust extended Kalman filter for NLOS environments. Mobile Information Systems 6(5), pp. 1–14. Cited by: [§III-A](https://arxiv.org/html/2605.15252#S3.SS1.p1.6).
- [34] Y. Li, P. Zhang, X. Niu, Y. Zhuang, H. Lan, and N. El-Sheimy (2015) Real-time indoor navigation using smartphone sensors. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 1–10. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [35] T. Lin, L. Li, and G. Lachapelle (2015) Multiple sensors integration for pedestrian indoor navigation. In Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), pp. 1–9. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [36] S. Madgwick (2010) An efficient orientation filter for inertial and inertial/magnetic sensor arrays. Report x-io and University of Bristol (UK) 25(3), pp. 113–118. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p6.1).
- [37] F. Ott, T. Feigl, C. Löffler, and C. Mutschler (2020) ViPR: Visual-Odometry-aided Pose Regression for 6DoF Camera Localization. In Proc. of the IEEE/CVF Intl. Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, pp. 187–198. External Links: [Document](https://dx.doi.org/10.1109/CVPRW50498.2020.00029). Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2).
- [38] F. Ott, L. Heublein, D. Rügamer, B. Bischl, and C. Mutschler (2024) Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments. In Elsevier Journal of Visual Communication and Image Representation (JVCIR), Vol. 104256. External Links: [Document](https://dx.doi.org/10.1016/j.jvcir.2024.104256). Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2).
- [39] F. Ott, N. L. Raichur, D. Rügamer, T. Feigl, H. Neumann, B. Bischl, and C. Mutschler (2022) Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression. In arXiv preprint arXiv:2208.00919. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2208.00919). Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2).
- [40] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio (2014) How to construct deep recurrent neural networks. External Links: arXiv:1312.6026. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [41] R. Pascanu, Ç. Gülçehre, K. Cho, and Y. Bengio (2014) How to construct deep recurrent neural networks. In Proc. Intl. Conf. Learning Representations (ICLR), pp. 78–85. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [42] D. Peng, Z. Liu, H. Wang, Y. Qin, and L. Jia (2018) A novel deeper one-dimensional CNN with residual learning for fault diagnosis of wheelset bearings in high-speed trains. Access J. 7, pp. 1022–1029. Cited by: [§III-B](https://arxiv.org/html/2605.15252#S3.SS2.p5.1).
- [43] A. Perttula, H. Leppäkoski, M. Kirkko-Jaakkola, P. Davidson, J. Collin, and J. Takala (2014) Distributed indoor positioning system with inertial measurements and map matching. IEEE Trans. on Instrumentation and Measurement 63(11), pp. 2682–2695. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [44] V. Renaudin, C. Combettes, and F. Peyret (2014) Quaternion based heading estimation with handheld MEMS in indoor environments. In Proc. Intl. Conf. Position, Location and Navigation Symp. (PLANS), pp. 645–656. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [45] M. Schuster and K. K. Paliwal (1997) Bidirectional recurrent neural networks. Trans. on Signal Processing 45(11), pp. 2673–2681. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [46] S. Sczyslo, J. Schroeder, S. Galler, and T. Kaiser (2008) Hybrid localization using UWB and inertial sensors. In Proc. Intl. Conf. Ultra-Wideband, pp. 89–92. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [47] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Proc. Intl. Conf. Advances in Neural Information Processing Systems, pp. 3104–3112. Cited by: [§III-D](https://arxiv.org/html/2605.15252#S3.SS4.p1.1).
- [48] J. Wang, Y. Gao, Z. Li, X. Meng, and C. M. Hancock (2016) A tightly-coupled GPS/INS/UWB cooperative positioning sensors system supported by V2I communication. Sensors J. 16(7), pp. 944. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [49] C. Xu, J. He, X. Zhang, C. Yao, and P. Tseng (2018) Geometrical kinematic modeling on human motion using method of multi-sensor fusion. Information Fusion 41(1), pp. 243–254. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p1.1).
- [50] H. Yan, S. Herath, and Y. Furukawa (2019) RoNIN: robust neural inertial navigation in the wild: benchmark, evaluations, and new methods. External Links: arXiv:1905.12853. Cited by: [§I](https://arxiv.org/html/2605.15252#S1.p1.1).
- [51] H. Yan, Q. Shan, and Y. Furukawa (2018) RIDI: robust IMU double integration. In ECCV, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss (Eds.). External Links: [Document](https://dx.doi.org/10.1007/978-3-030-01261-8%5F38). Cited by: [§I](https://arxiv.org/html/2605.15252#S1.p1.1).
- [52] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher (2017) DeepSense: a unified deep learning framework for time-series mobile sensing data processing. In Proc. Intl. Conf. World Wide Web (WWW), pp. 351–360. Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2), [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p2.1).
- [53] Y. Zhuang and N. El-Sheimy (2016) Tightly-coupled integration of WiFi and MEMS sensors on handheld devices for indoor pedestrian navigation. IEEE Sensors J. 16(1), pp. 224–234. Cited by: [§II-A](https://arxiv.org/html/2605.15252#S2.SS1.p2.6).
- [54] A. Zyner, S. Worrall, J. Ward, and E. Nebot (2017) Long short term memory for driver intent prediction. In Proc. Intl. Conf. Intelligent Vehicles Symp. (IV), pp. 1484–1489. Cited by: [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p1.2), [§II-B](https://arxiv.org/html/2605.15252#S2.SS2.p2.1).