Immersive Human Digital Twins for Industry 4.0

doi:https://doi.org/10.30844/I4SE.26.3.1

Supporting adaptive human-centric production by integrating cognitive and physical states

Journal: Industry 4.0 Science
Issue: Volume 42, 2026, Edition 3, Pages 6-13
Open Access: https://doi.org/10.30844/I4SE.26.3.1

Abstract

The rapid advancement of immersive technologies has created new opportunities to transform human-machine collaboration in industry. This paper presents an immersive platform with a digital twin that combines both physical and cognitive characteristics of human dynamics. By integrating multimodal sensing, human biomechanics, and cognitive state into digital twin technology, the proposed system enhances operational safety and ensures better ergonomics. The main argument is that human digital twins are not only desirable but essential for next-generation industrial systems. We discuss the limitations of existing human modeling approaches, outline the conceptual foundations of human digital twins, and demonstrate their industrial relevance across safety, productivity, ergonomics and sustainability.

Keywords

adaptive production, artificial intelligence, human digital twin, human modeling, human-centric production, immersive digital twin, Industry 4.0, multimodal data, simulation

Article

Industry 4.0 has significantly improved manufacturing and industrial operations through automation, artificial intelligence and smart sensing. The virtual replication of physical entities to create so-called digital twins (DTs) [1] has become a vital component of this transformation. However, the use cases of DTs have been limited, with a focus on machines, tools and processes.

Most industrial systems remain fundamentally reliant on humans for supervision, decision-making and manual adaptation in reaction to unforeseen events. Since existing DTs rarely capture human dynamics, cognitive states and variability, automation and simulation operate under the assumption of flawless execution by a human worker. Unfortunately, this is not always the case, as the human worker can suffer from cognitive load, stress or fatigue. To mitigate those issues, it is essential to extend the DT platform to the human.

Toward that goal, human digital twins can map the human paradigm into the digital world to achieve human-centric automation [2-3]. This paper introduces an immersive framework for human digital twins that can be adapted across industries. We map both the physical and cognitive characteristics of human dynamics in an immersive platform capable of greater automation and simulation tasks.

Limitations of current human modeling in industry

Despite the growing recognition of human-centric tasks in Industry 4.0, most human modeling is designed for offline analysis and post-task evaluation. As a result, it fails to capture the real-time continuous interaction between human operators, machines and the environment. With limited support for decision-making, personalization and predictive risk assessment, these models are highly inefficient [4]. The following limitations motivate the development of the integrated, adaptive and predictive representation of the human operator presented in this paper.

Static and generic human models

Traditional ergonomic assessments rely on static anthropometric tables, rule-based posture analysis (e.g., RULA, REBA) or simplified biomechanical models. These approaches work on the basis of an imagined “average worker” and thus fail to capture interindividual variability, adaptation over time or transient physiological states.

Reactive rather than predictive safety

Most safety systems respond only after a certain threshold is crossed (e.g., excessive force, unsafe posture). Without predictive human models, early warning of fatigue, cognitive overload or injury risk remains limited.
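To make the contrast concrete, the following minimal sketch compares a reactive threshold check with a simple predictive check that extrapolates the recent trend. The fatigue index, the threshold, the horizon and both function names are hypothetical and chosen only for illustration. A least-squares line stands in for a predictive human model here; it is not the method used in this paper.

```python
from statistics import mean


def reactive_alarm(samples, threshold):
    """Reactive check: fires only once the latest sample exceeds the threshold."""
    return samples[-1] > threshold


def predictive_alarm(samples, threshold, horizon):
    """Predictive check: fit a least-squares line to the samples and
    test whether the value extrapolated `horizon` steps ahead exceeds the threshold."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate a trend")
    xs = range(n)
    x_mean, y_mean = mean(xs), mean(samples)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    forecast = intercept + slope * (n - 1 + horizon)
    return forecast > threshold


# Hypothetical fatigue index rising steadily toward a limit of 0.8.
fatigue = [0.50, 0.55, 0.60, 0.65, 0.70]

reactive = reactive_alarm(fatigue, threshold=0.8)
predictive = predictive_alarm(fatigue, threshold=0.8, horizon=3)
```

With these illustrative numbers, the reactive check stays silent because the limit has not yet been crossed, while the predictive check raises a warning three steps ahead of the projected crossing.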

Fragmented view of human state

Industrial monitoring often treats physical, cognitive and emotional factors separately. Wearable sensors, if used at all, are rarely integrated into a coherent model that links physiology, motion and task context.

Immersive human digital twin

Although debate continues regarding the minimum scope of data that each human digital twin should contain, researchers have agreed on a blueprint that defines the present and future of human digital twins. A five-stage roadmap has been proposed that classifies a human digital twin by its level of operation [5].

*Figure 1: A roadmap of human digital twins at different levels based on infrastructure and operational capability.*

Figure 1 illustrates how this roadmap is adopted in our study. The initial physical entity layer comprises the human and the data collection medium. Physiological data is gathered in this layer and serves as the input for a level 1 human digital twin. At this level, in-depth analytics based on past data are possible. Since this stage focuses on identifying which indicators trigger specific outcomes, a human digital twin of this level is called a cross-sectional human digital twin.

At level 2, the output of the level 1 human digital twin is combined with past data to train a predictive model capable of forecasting future data points. A twin of this level is therefore called a deductive human digital twin. A level 3 human digital twin is referred to as an editable human digital twin, since the model's capability at this stage reaches beyond simple simulation: the model can be run on modified data, allowing further human intervention and the simulation of scenarios with unseen data.

This study extends only to the level 3 human digital twin. Moving to level 4 requires accounting for human lifestyle and broader environmental variables, such as dietary habits or living conditions. Because evolutionary data is also taken into consideration, the model at this stage is called an evolutionary human digital twin.

Lastly, at level 5, the objective is to reach a white-box, or explainable, human digital twin. The current technological infrastructure constrains the realization of such a model. However, future technological advancements, together with more sophisticated input data, are expected to provide the greater computational capacity required to reach level 5.

Multimodal data acquisition

One of the major challenges that distinguishes human digital twins from other digital twins is the integration of multimodal data originating from different sensor modalities [6-7]. Given the complexity of human dynamics, it is not possible to adequately map human-centric data with a single type of data source. Rather, fully capturing the complex relation of human data and contextual environmental factors requires a data stream from different modalities (kinematics, biometrics, etc.).

*Figure 2: Illustration of a human-centric data stream from different modalities and the corresponding data fusion stage.*

Figure 2 visualizes the different data modalities examined in this paper and the possible data fusion levels. The level of fusion depends on the integration technique [8]. This integration establishes the mechanism for synchronizing and fusing the heterogeneous data inputs into a single representation.
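As a minimal sketch of the synchronization step, the example below resamples two streams with different sampling rates onto one shared timeline by linear interpolation, so that each time stamp yields a single fused record. The stream names, sample values and the `interpolate` and `synchronize` helpers are assumptions made for illustration and do not reproduce the pipeline used in this study.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate one stream at time t, clamped to the stream's range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)          # index of the first time stamp after t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def synchronize(streams, timeline):
    """Resample every stream onto a shared timeline; one record per time stamp."""
    return [
        {"t": t, **{name: interpolate(ts, vs, t) for name, (ts, vs) in streams.items()}}
        for t in timeline
    ]


# Hypothetical streams given as (time stamps in seconds, values).
streams = {
    "heart_rate_bpm": ([0.0, 1.0, 2.0], [70.0, 72.0, 76.0]),                    # 1 Hz
    "finger_flexion_deg": ([0.0, 0.5, 1.0, 1.5, 2.0], [10.0, 20.0, 30.0, 20.0, 10.0]),  # 2 Hz
}

rows = synchronize(streams, timeline=[0.0, 0.5, 1.0, 1.5, 2.0])
```

Each entry of `rows` then holds one value per modality for the same instant, which is the form a downstream fusion step can consume directly.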

The different modalities of data in a human digital twin system typically start with metadata. Metadata generally represents historical data (disease, known conditions, etc.) about the human entity and basic characteristics (age, height, weight, etc.). Depending on the twin, human digital twins can include physiological, kinematic, environmental and behavioral data. Although the number of data modalities varies with the twin’s purpose, all human digital twin systems should rely on a consistent data integration mechanism that keeps the system interconnected.

There are multiple computational approaches to achieving this, as shown in Figure 2. In low-level fusion, raw data streams are merged at the input level immediately after pre-processing. In intermediate-level fusion, features are first extracted from each data modality and then fused, which preserves the standalone features of the individual modalities. Lastly, in high-level fusion, the data from each modality are processed and used to train separate models, and only their outputs are combined. Deep learning architectures such as neural networks and transformer-based models [9-10], as well as ensemble learning methods, have all demonstrated the capacity to capture the temporal dependencies [11] and nonlinear relationships between heterogeneous data modalities.
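The three fusion levels can be sketched with toy data as follows. The windows, the two features (mean and peak-to-peak range) and the averaging of per-modality scores are assumptions chosen for brevity; they only illustrate where in the pipeline the modalities are joined and do not represent the models used in this study.

```python
from statistics import mean


def low_level_fusion(ecg, emg):
    """Input-level fusion: concatenate the pre-processed raw windows."""
    return list(ecg) + list(emg)


def extract_features(window):
    """Toy per-modality features: mean and peak-to-peak range."""
    return [mean(window), max(window) - min(window)]


def intermediate_fusion(ecg, emg):
    """Feature-level fusion: extract features per modality, then concatenate them."""
    return extract_features(ecg) + extract_features(emg)


def high_level_fusion(scores):
    """Decision-level fusion: combine the outputs of separately trained models.
    A plain average is assumed here; weighted or learned combiners are also common."""
    return mean(scores)


# Hypothetical pre-processed windows from two modalities.
ecg_window = [0.1, 0.9, 0.2, 0.1]
emg_window = [0.3, 0.5, 0.4]

raw_vector = low_level_fusion(ecg_window, emg_window)          # 7 raw values
feature_vector = intermediate_fusion(ecg_window, emg_window)   # 4 features

# Stand-ins for the outputs of two per-modality models, e.g. fatigue probabilities.
fused_score = high_level_fusion([0.7, 0.5])
```

The earlier the fusion, the more cross-modal detail a single model can exploit, but the more sensitive it becomes to differing sampling rates and noise; later fusion keeps the per-modality models independent at the cost of losing low-level interactions.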

By combining the data into a unified framework, human digital twins can theoretically overcome the limitations of isolated sensing systems. Once multimodal data integration is achieved, it becomes possible to implement digital counterparts that reflect the dynamic, interconnected and multidimensional nature of human beings. For our study, all participants gave their informed consent as required, and IRB approval for human-centric data collection was granted under approval No. 24-01-01 by the IRB of Saarland University on January 23, 2024.

System architecture of the human digital twin

*Figure 3: System setup for the human digital twin framework with different modalities of data stream.*

The system setup for the human digital twin involves data streams collected through multiple devices of different modalities (Fig. 3). The data collection pipeline consists of four primary categories of contact-based and contactless devices:

Wearable capacitive sensors embedded within the Hexoskin Smart Vest, which capture physiological signals of the human body, including electrocardiogram (ECG) and respiratory dynamics.
A pair of StretchSense gloves [12], which collect fine-grained biomechanical movements of the hands, including flexion, angle and rotation of the fingers.
ANR MuscleSense EMG sensors, which capture musculoskeletal dynamics in terms of muscle onset, offset and intensity of activation.
A ZED 2i stereo camera, which serves as a vision-based motion capture (MoCap) system for full-body kinematics and environmental interactions.

These different categories of data are preprocessed and then integrated into a common platform for human digital twins. The framework developed for the human digital twin model follows a layered architecture that combines multiple machine learning and deep learning algorithms in a unified structure for predictive modeling, simulation and real-time interaction.

The framework follows a modular approach, in which different algorithms address different aspects of the system: an ensemble learning method [13] for human biometric modeling, a physics-informed neural network (PINN) for motion simulation [14-15], and a bidirectional long short-term memory (BiLSTM) network [16-17] for fine motor control and cognitive state detection. The algorithms for integrating the cognitive and physical states of humans are explained in previously published papers [13, 18].
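One way to picture such a modular design is a small registry that routes each data modality to the module responsible for it. The sketch below is purely illustrative: the `ModularTwin` class, the modality names and the three placeholder callables are hypothetical stand-ins for the trained ensemble, PINN and BiLSTM models, whose actual interfaces are described in [13, 18].

```python
from typing import Callable, Dict, List

# A module maps one window of samples to one estimate.
Module = Callable[[List[float]], float]


class ModularTwin:
    """Routes each data modality to the module registered for it."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, modality: str, module: Module) -> None:
        self._modules[modality] = module

    def update(self, windows: Dict[str, List[float]]) -> Dict[str, float]:
        """Run every registered module on its modality's latest window.
        Modalities without a registered module are ignored."""
        return {
            modality: self._modules[modality](window)
            for modality, window in windows.items()
            if modality in self._modules
        }


# Placeholder callables standing in for trained models (hypothetical).
def biometric_model(window: List[float]) -> float:
    return sum(window) / len(window)


def motion_model(window: List[float]) -> float:
    return window[-1]


def cognitive_model(window: List[float]) -> float:
    return max(window)


twin = ModularTwin()
twin.register("biometric", biometric_model)
twin.register("motion", motion_model)
twin.register("cognitive", cognitive_model)

state = twin.update({"biometric": [70.0, 72.0], "motion": [0.1, 0.2], "unknown": [1.0]})
```

Because each module sits behind the same call signature, an individual algorithm can be retrained or replaced without touching the rest of the system.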

Validation of the human digital twin

The human digital twin framework is assessed at two levels: at the level of the individual algorithms, using multiple evaluation metrics for the different sensor data modalities, and as a complete system, through a qualitative evaluation with real users in an application-oriented study, as shown in our previous work [18]. Root mean square error (RMSE) [19] serves as the standard quantitative metric for all continuous prediction or regression tasks across all sensor modalities and algorithms. In addition to RMSE, each algorithm is validated with metrics tailored to its sensor modality.

*Figure 4: Sensor modality performance metrics for different tasks.*

Figure 4 visualizes the individual RMSE scores for different tasks containing data from different sensor modalities. However, the framework includes tasks that cannot be accurately evaluated through RMSE alone and require additional metrics. For biometric data evaluation, mean absolute error, mean square error, normalized mean square error and scatter index (SI) [20] are used to better assess the model’s performance on time series data, as presented in our previous study [13].
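For reference, the error metrics named above can be computed as in the following sketch. MAE is taken here as the mean absolute error. Two definitions are assumptions, since several conventions exist and the text does not fix one: the normalized mean square error is taken as the MSE divided by the variance of the observations, and the scatter index as the RMSE divided by the mean of the observations. The sample values are hypothetical.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmse(y_true, y_pred):
    """MSE normalized by the variance of the observations (assumed convention)."""
    m = sum(y_true) / len(y_true)
    variance = sum((t - m) ** 2 for t in y_true) / len(y_true)
    return mse(y_true, y_pred) / variance


def scatter_index(y_true, y_pred):
    """RMSE normalized by the mean of the observations (assumed convention)."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


# Hypothetical heart-rate observations and predictions in beats per minute.
observed = [60.0, 62.0, 64.0, 66.0]
predicted = [61.0, 61.0, 65.0, 67.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MSE": mse(observed, predicted),
    "NMSE": nmse(observed, predicted),
    "SI": scatter_index(observed, predicted),
}
```

RMSE, MAE and MSE carry the units of the signal (or their square), whereas NMSE and SI are dimensionless, which makes them easier to compare across sensor modalities with different value ranges.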

The human digital twin can be adopted for numerous application scenarios, with use cases in both industry and the clinical domain. One use case is cognitive state monitoring of human workers during industrial tasks to assist decision-making, which is reported extensively in our previous study [18]. That study highlights how human digital twins can potentially enable informed decision-making, reduce cognitive overload and improve overall ergonomics for human workers. Despite its potential, the human digital twin framework is highly dependent on the integration of sensor data. The model’s performance can vary significantly depending on the modality used for data collection and the manner of data fusion.

Human digital twins as a paradigm shift

This study proposes a unified human digital twin framework that pushes current technological boundaries in the domain of human digital twins by integrating cutting-edge actuation, sensing, simulation and bidirectional feedback. The approach integrates multimodal sensing, AI and biomechanical simulation into one compact system. The system provides real-time cognitive state monitoring and physical movement replication to address challenges of worker safety, task optimization and human-machine interaction.

Thus, the human digital twin represents a paradigm shift from machine-centered to human-centered industrial digitalization. By capturing the dynamic, individual, and predictive aspects of human workers, human digital twins enable safer, more productive, and more sustainable industrial systems.

This is an original article. The German translation can be accessed via DOI: 10.30844/I4SD.26.3.1

Bibliography

[1] Tao, F.; Xiao, B.; Qi, Q.; Cheng, J.; Ji, P.: Digital twin modeling. In: Journal of Manufacturing Systems 64 (2022), pp. 372-389.
[2] Graessler, I.; Poehler, A.: Intelligent control of an assembly station by integration of a digital twin for employees into the decentralized control system. In: Procedia Manufacturing 24 (2018), pp. 185-189.
[3] Lin, Y.; Chen, L.; Ali, A.; Nugent, C.; Cleland, I. et al.: Human digital twin: A survey. In: Journal of Cloud Computing 13 (2024) 1, p. 131.
[4] Zhu, E.; Yang, S.: Towards human digital twin: Reviewing human modelling and simulation. In: Journal of Industrial Information Integration (2025), p. 100975.
[5] Tang, C.; Yi, W.; Occhipinti, E.; Dai, Y.; Gao, S. et al.: A roadmap for the development of human body digital twins. In: Nature Reviews Electrical Engineering 1 (2024) 3, pp. 199-207.
[6] Huang, Y.; Tao, J.; Sun, G.; Wu, T.; Yu, L. et al.: A novel digital twin approach based on deep multimodal information fusion for aero-engine fault diagnosis. In: Energy 270 (2023), p. 126894.
[7] Zhou, T.; Zhang, X.; Kang, B.; Chen, M.: Multimodal fusion recognition for digital twin. In: Digital Communications and Networks 10 (2024) 2, pp. 337-346.
[8] Blasch, E.; Pham, T.; Chong, C.-Y.; Koch, W.; Leung, H. et al.: Machine learning and artificial intelligence for sensor data fusion: Opportunities and challenges. In: IEEE Aerospace and Electronic Systems Magazine 36 (2021) 7, pp. 80-93.
[9] Rahali, A.; Akhloufi, M. A.: End-to-end transformer-based models in textual-based NLP. In: AI 4 (2023) 1, pp. 54-110.
[10] Dai, X.; Chalkidis, I.; Darkner, S.; Elliott, D.: Revisiting transformer-based models for long document classification. arXiv preprint (2022). URL: https://arxiv.org/abs/2204.06683, accessed 13.01.2026.
[11] Liu, D.; Wang, J.; Shang, S.; Han, P.: MSDR: Multi-step dependency relation networks for spatio-temporal forecasting. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022), pp. 1042-1050.
[12] Chowdhury, T. A.; Wagner, E.; Motzki, P.; Lehser, M.: Prognosis and predictive modeling of hand movements in industrial tasks using StretchSense glove and digital twin simulations. In: Digital Twins, AI, and NDE for Industry Applications and Energy Systems 2025 (2025), pp. 79-87.
[13] Chowdhury, T. A.; Gratz-Kelly, S.; Wagner, E.; Motzki, P.; Lehser, M.: Ensemble learning approach for advanced predictive modelling of biometric data and action states with smart sensing. In: IEEE Access (2024).
[14] Farea, A.; Yli-Harja, O.; Emmert-Streib, F.: Understanding physics-informed neural networks: Techniques, applications, trends, and challenges. In: AI 5 (2024) 3, pp. 1534-1557.
[15] Wang, X.; Antonion, K.; Raissi, M.; Joshie, L.: Machine learning through physics-informed neural networks: Progress and challenges. In: Academic Journal of Science and Technology 9 (2024) 1.
[16] Singla, P.; Duhan, M.; Saroha, S.: An ensemble method to forecast 24-hour ahead solar irradiance using wavelet decomposition and BiLSTM deep learning network. In: Earth Science Informatics 15 (2022) 1, pp. 291-306.
[17] Lin, H.; Zhang, S.; Li, Q.; Li, Y.; Li, J. et al.: A new method for heart rate prediction based on LSTM-BiLSTM-Att. In: Measurement 207 (2023), p. 112384.
[18] Chowdhury, T. A.; Gratz-Kelly, S.; Wagner, E.; Motzki, P.; Lehser, M.: A Multimodal Intelligent System for Human Digital Twin Simulation with Continuous Kinematic Data Tracking, Biometric Prognosis, and Cognitive State Feedback in Industrial Environments. In: Advanced Intelligent Discovery (2026).
[19] Hodson, T. O.: Root mean square error or mean absolute error: When to use them or not. In: Geoscientific Model Development Discussions (2022), pp. 1-10.
[20] Jadon, A.; Patil, A.; Jadon, S.: A comprehensive survey of regression-based loss functions for time series forecasting. arXiv preprint (2022). URL: https://arxiv.org/abs/2211.02989, accessed 13.01.2026.
