Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity
Summary
This paper proposes a reinforcement learning-driven adaptive sim-to-real alignment method for vibration-based bearing health monitoring. It addresses fault-data scarcity and heterogeneous, fault-type-specific sim-to-real gaps using Proximal Policy Optimization.
# Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity

Source: [https://arxiv.org/abs/2606.24954](https://arxiv.org/abs/2606.24954) [View PDF](https://arxiv.org/pdf/2606.24954)

> Abstract: Vibration-based health monitoring of rotating machinery requires reliable fault diagnosis under operational data constraints, yet condition assessment remains challenged by structural scarcity of fault events and heterogeneous sim-to-real gaps in digital twin-generated signals. Each fault type generates impulses with distinct periodicity, amplitude modulation, and spectral character, making feature-space discrepancies fundamentally heterogeneous across fault classes. Existing domain adaptation methods apply a class-agnostic global transformation that cannot close all fault-specific gaps without distorting inter-class separability, while uniform source-target mixing introduces distributional noise into the data-abundant Normal class. These limitations stem from treating a sequential, state-dependent alignment problem as a one-shot optimization. Each corrective transformation simultaneously reshapes all class distributions, creating state dependencies that static gradient descent cannot resolve. We formulate feature alignment as a continuous-action Markov decision process solved via Proximal Policy Optimization, where the learned policy issues fault-type-specific affine corrections responsive to the current feature-space configuration, with a dual-objective reward balancing gap minimization against separability preservation. An asymmetry-aware strategy reserves real data for the Normal class while augmenting fault classes with policy-aligned simulated samples. Validation across XJTU-SY, CWRU, and a self-built slewing bearing testbed confirms the dominant gain from reinforcement learning-driven alignment, and cross-equipment linear probing achieves 92.8% without encoder retraining, demonstrating transferable monitoring capability.

## Submission history

From: Jinghan Wang

**[v1]** Tue, 23 Jun 2026 08:47:24 UTC (2,236 KB)
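
The abstract describes the alignment step only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It illustrates three ideas named in the abstract: a per-class affine correction applied to simulated features, a dual-objective reward that trades gap reduction against class separability, and an asymmetry-aware training set that adds simulated samples only to fault classes. Every function name, the centroid-distance gap measure, the minimum inter-centroid separability measure, and the weight `lam` are hypothetical choices made for illustration. No PPO policy is trained here; a hand-picked action stands in for what a learned policy might output.

```python
import numpy as np

def apply_affine(feats, labels, scale, shift):
    """Per-class affine correction z -> scale[c] * z + shift[c] (assumed form)."""
    return feats * scale[labels] + shift[labels]

def class_centroids(feats, labels, n_classes):
    """Mean feature vector of each class, stacked as (n_classes, d)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])

def dual_objective_reward(sim_feats, sim_labels, real_centroids, lam=0.5):
    """Reward = -gap + lam * separability (both measures are assumptions)."""
    n_classes = real_centroids.shape[0]
    sim_c = class_centroids(sim_feats, sim_labels, n_classes)
    # Gap: mean distance between matching simulated and real class centroids.
    gap = np.linalg.norm(sim_c - real_centroids, axis=1).mean()
    # Separability: smallest distance between two different simulated centroids.
    dists = np.linalg.norm(sim_c[:, None, :] - sim_c[None, :, :], axis=2)
    sep = dists[~np.eye(n_classes, dtype=bool)].min()
    return -gap + lam * sep

def build_training_set(real_feats, real_labels, sim_feats, sim_labels, normal_class=0):
    """Keep all real samples; add simulated samples for fault classes only."""
    fault_sim = sim_labels != normal_class
    X = np.concatenate([real_feats, sim_feats[fault_sim]])
    y = np.concatenate([real_labels, sim_labels[fault_sim]])
    return X, y

# Toy data: 3 classes (0 = Normal) in a 2-D feature space.
rng = np.random.default_rng(0)
real_centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
offsets = np.array([[1.0, 1.0], [-2.0, 0.5], [0.5, -1.5]])  # class-specific gaps
sim_labels = np.repeat(np.arange(3), 50)
sim_feats = (real_centroids[sim_labels] + offsets[sim_labels]
             + 0.1 * rng.standard_normal((150, 2)))

# Hand-picked action standing in for a policy output: unit scale, shift = -offset.
scale = np.ones((3, 2))
shift = -offsets
r_before = dual_objective_reward(sim_feats, sim_labels, real_centroids)
aligned = apply_affine(sim_feats, sim_labels, scale, shift)
r_after = dual_objective_reward(aligned, sim_labels, real_centroids)
print("reward before:", r_before, "reward after:", r_after)

# Scarce real fault data: 40 Normal samples, 2 samples per fault class.
real_labels = np.array([0] * 40 + [1] * 2 + [2] * 2)
real_feats = real_centroids[real_labels] + 0.1 * rng.standard_normal((44, 2))
X, y = build_training_set(real_feats, real_labels, aligned, sim_labels)
print("training set size:", X.shape[0])
```

In a full MDP formulation, the `scale` and `shift` arrays would form the continuous action and some summary of the current feature-space configuration (for example, the class centroids) would form the state. Because the toy offsets differ per class, a single global shift could not close all three gaps at once, which is the situation the abstract attributes to class-agnostic methods.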
Similar Articles
Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning
This paper proposes an active learning framework to couple high-fidelity Modelica simulations with simpler surrogate models (SINDyC, FNN, GRU) for creating efficient digital twins of thermal energy distribution systems. The approach significantly reduces the number of simulation trajectories needed while maintaining predictive accuracy and enabling uncertainty quantification.
Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction
This paper proposes a quantum annealing enhanced Q-learning framework for remaining useful life prediction, using the D-Wave system to solve QUBO formulations for action selection. It outperforms classical and quantum baselines on NASA C-MAPSS and predictive maintenance datasets.
Scientific Machine Learning for Engine Health Management and Remaining Useful Life Prediction
This paper presents a multi-task scientific machine learning framework for turbine prognostics that jointly predicts engine health metrics and remaining useful life with quantified uncertainty, using a shared sequence encoder and task-specific heads.
Learning When to Act: Communication-Efficient Reinforcement Learning via Run-Time Assurance
This paper presents a framework (CARE) that jointly learns control inputs and communication-efficient timing decisions under a pointwise Lyapunov safety shield, achieving higher inter-sample intervals than classical methods on inverted pendulum, cart-pole, and planar quadrotor systems.
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.