Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity

arXiv cs.LG Papers

Summary

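This paper proposes a reinforcement learning-driven adaptive sim-to-real alignment method for vibration-based bearing health monitoring. It targets two problems: the scarcity of real fault data, and sim-to-real gaps that differ from one fault type to another. A policy trained with Proximal Policy Optimization (PPO) applies fault-type-specific affine corrections to simulated features.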

arXiv:2606.24954v1 Announce Type: new

Abstract: Vibration-based health monitoring of rotating machinery requires reliable fault diagnosis under operational data constraints, yet condition assessment remains challenged by structural scarcity of fault events and heterogeneous sim-to-real gaps in digital twin-generated signals. Each fault type generates impulses with distinct periodicity, amplitude modulation, and spectral character, making feature-space discrepancies fundamentally heterogeneous across fault classes. Existing domain adaptation methods apply a class-agnostic global transformation that cannot close all fault-specific gaps without distorting inter-class separability, while uniform source-target mixing introduces distributional noise into the data-abundant Normal class. These limitations stem from treating a sequential, state-dependent alignment problem as a one-shot optimization. Each corrective transformation simultaneously reshapes all class distributions, creating state dependencies that static gradient descent cannot resolve. We formulate feature alignment as a continuous-action Markov decision process solved via Proximal Policy Optimization, where the learned policy issues fault-type-specific affine corrections responsive to the current feature-space configuration, with a dual-objective reward balancing gap minimization against separability preservation. An asymmetry-aware strategy reserves real data for the Normal class while augmenting fault classes with policy-aligned simulated samples. Validation across XJTU-SY, CWRU, and a self-built slewing bearing testbed confirms the dominant gain from reinforcement learning-driven alignment, and cross-equipment linear probing achieves 92.8% without encoder retraining, demonstrating transferable monitoring capability.
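
The abstract names three mechanisms: per-fault-type affine corrections, a dual-objective reward, and an asymmetry-aware data mix. The sketch below illustrates one possible reading of them and is not the authors' implementation. Everything specific in it is an assumption: the centroid-distance gap, the minimum-centroid-distance separability term, the weight `lam`, the toy 2-D features, and all function names. The PPO policy that would choose the scales and shifts is not shown; here the correction is supplied by hand.

```python
import numpy as np


def class_means(feats, labels, classes):
    """Stack the mean feature vector of each class, in the order of `classes`."""
    return np.stack([feats[labels == c].mean(axis=0) for c in classes])


def apply_affine(sim_feats, sim_labels, scales, shifts):
    """Apply x -> scale * x + shift separately to each listed class.

    `scales` and `shifts` map class id -> array of shape (d,).
    Classes without an entry are returned unchanged.
    """
    out = np.array(sim_feats, dtype=float)  # np.array copies by default
    for c in scales:
        mask = sim_labels == c
        out[mask] = scales[c] * out[mask] + shifts[c]
    return out


def sim_to_real_gap(sim_feats, sim_labels, real_feats, real_labels, classes):
    """Assumed gap measure: mean distance between sim and real class centroids."""
    sim_mu = class_means(sim_feats, sim_labels, classes)
    real_mu = class_means(real_feats, real_labels, classes)
    return float(np.linalg.norm(sim_mu - real_mu, axis=1).mean())


def separability(feats, labels, classes):
    """Assumed separability measure: smallest distance between two class centroids."""
    mu = class_means(feats, labels, classes)
    dist = np.linalg.norm(mu[:, None, :] - mu[None, :, :], axis=-1)
    upper = np.triu_indices(len(classes), k=1)  # each unordered pair once
    return float(dist[upper].min())


def reward(sim_feats, sim_labels, real_feats, real_labels, classes, lam=0.1):
    """Hypothetical dual-objective reward: penalize the gap, reward separability."""
    gap = sim_to_real_gap(sim_feats, sim_labels, real_feats, real_labels, classes)
    return -gap + lam * separability(sim_feats, sim_labels, classes)


def build_training_set(real_feats, real_labels, aligned_sim, sim_labels, normal_class=0):
    """Asymmetry-aware mix (one reading): keep all real samples and add aligned
    simulated samples for fault classes only, so Normal stays purely real."""
    fault = sim_labels != normal_class
    feats = np.concatenate([real_feats, aligned_sim[fault]])
    labels = np.concatenate([real_labels, sim_labels[fault]])
    return feats, labels


def make_toy_data(seed=0, n_per_class=20):
    """Toy 2-D features: class 0 is Normal, classes 1 and 2 are faults whose
    simulated versions are displaced by a different offset per class."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    offsets = np.array([[0.0, 0.0], [2.0, 2.0], [-1.0, 3.0]])
    labels = np.repeat(np.arange(3), n_per_class)
    real = centers[labels] + 0.1 * rng.standard_normal((labels.size, 2))
    sim = centers[labels] + offsets[labels] + 0.1 * rng.standard_normal((labels.size, 2))
    return real, sim, labels, offsets


if __name__ == "__main__":
    real, sim, labels, offsets = make_toy_data()
    classes = [0, 1, 2]
    ones = np.ones(2)
    # Hand-picked correction standing in for one policy action per fault class.
    aligned = apply_affine(sim, labels, {1: ones, 2: ones},
                           {1: -offsets[1], 2: -offsets[2]})
    print("reward before:", reward(sim, labels, real, labels, classes))
    print("reward after: ", reward(aligned, labels, real, labels, classes))
```

In an MDP framing, the state could be built from the current class centroids, the action would be the per-class scales and shifts, and `reward` would be evaluated after each correction. A single global shift cannot undo both fault offsets in the toy data at once, which is the heterogeneity argument the abstract makes.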

Source: [https://arxiv.org/abs/2606.24954](https://arxiv.org/abs/2606.24954)
[View PDF](https://arxiv.org/pdf/2606.24954)


## Submission history

From: Jinghan Wang [[view email](https://arxiv.org/show-email/5e0b85bb/2606.24954)] **[v1]** Tue, 23 Jun 2026 08:47:24 UTC (2,236 KB)

Similar Articles

Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning

arXiv cs.LG

This paper proposes an active learning framework to couple high-fidelity Modelica simulations with simpler surrogate models (SINDyC, FNN, GRU) for creating efficient digital twins of thermal energy distribution systems. The approach significantly reduces the number of simulation trajectories needed while maintaining predictive accuracy and enabling uncertainty quantification.

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Hugging Face Daily Papers

This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.