A methodology for synthetic generation of failure data for data-driven prognostics and health management modeling for digital twins

Abstract: Prognostics and Health Management (PHM) is one of the main services encompassed by Industry 4.0. However, the scarcity of failure data, which stems from the nature of machine operation, remains a challenge to be overcome in this field. Thanks to recent advances in computing power, simulation, sensing, and networking technologies, digital twins allow us to adopt a different approach to this problem: inserting failures into a digital replica of the real asset to train data-driven PHM models. In this work, we propose a general methodology to generate and validate synthetic failure data for PHM purposes. We also present an application of the proposed methodology, which produced a synthetic failure dataset validated with real data. In the experiment, we modeled a smart petroleum well in a commercial computational fluid dynamics simulator and injected failures into the system by modifying the expected behavior of the equipment to generate synthetic failure data. We then assessed the quality of the synthetic data by training machine learning algorithms on it, testing them on data from a petroleum production plant, and applying fidelity metrics to identify the improvements the process needs. The results show the feasibility of generating useful synthetic data for PHM purposes, and the proposed methodology indicates points of enhancement in the generated data. The presented methodology still has limitations concerning its extrapolation to the general PHM case, and this work also discusses alternatives to overcome these constraints.

Keywords: Synthetic Data, Digital Twins, Prognostics and Health Management (PHM), Industry 4.0

Author: Rafael Schena
Advisor: Mara Abel

Dissertation: Master's thesis