VR Training for Multimodal Cobot Interaction

Virtual learning environments for collaborative robots

Christoph S. Zoller, Justus Langer, Kristoffer Waldow, Merle Meyer, Arnulph Fuhrmann

Journal: Industry 4.0 Science
Issue: Volume 42, 2026, Edition 3, Pages 106-112

Abstract

The VIRAMM research project is developing and prototyping a VR-based training concept for the integration of collaborative robots (cobots) in assembly-oriented U-cells. Since the benefits of cobots depend heavily on process, layout, and role integration, VIRAMM addresses a gap: so far, there has been no consistent scenario design for comparing variants using an evaluation based on key performance indicators (KPIs).

Keywords

collaborative robots, multimodal interaction, Scenario-Based Learning, Virtual Learning Factory, VR Training

Article

Collaborative robots (cobots) are increasingly being used in manual assembly processes because they complement human performance by taking over repetitive tasks and reducing ergonomic strain. Cobots are designed for collaboration in the immediate workspace of humans and can perform tasks without spatial separation measures such as protective enclosures. However, their benefits depend significantly on process and layout integration, as well as the distribution of roles within the work system [1, 2].

In practice, the introduction of cobots often fails not because of hardware issues, but due to socio-technical implementation challenges, particularly in process design, fault management, and user acceptance. Process planners must justify automation decisions—such as which subtasks should be automated, at which stage of the process automation is beneficial, and how robustness and safety can be ensured compared to manual alternatives. Training and continuing education can accelerate the ramp-up process by imparting the skills needed for safe, efficient, and adaptable work in changing production systems [3].

Learning factories offer an established, process-oriented approach to acquiring skills. However, they are often limited in terms of scenario diversity and comparability [4]. While virtual learning factories can already supplement or partially replace physical ones for certain training objectives, concrete evidence for training scenarios involving collaborative robots remains limited [6]. This is precisely where the VIRAMM research project contributes: it provides VR-based training scenarios for assembly processes, including robot integration, to systematically investigate cobot-related qualification objectives.

Skills required for the use of cobots

In human-robot interaction, the use of cobots changes task profiles compared to traditional automation solutions, as cobots can be integrated into shared process chains with humans with minimal effort. Thus, cobots are increasingly deployed in dynamic and continuously changing assembly environments, where short implementation times are particularly advantageous [7].

Since the benefits of cobot integration often arise at gripping points, handover processes, and within buffer logic, the ability to make well-founded automation decisions is a key competency requirement [1, 8]. To acquire these competencies, it is therefore crucial to provide learning environments in which such decisions can be tested in practice and variants can be systematically compared. While real-world learning factories support this objective to a certain extent, they do not fully satisfy all related requirements.

Real-world learning factories enable action-oriented learning with a high degree of process relevance [9, 10]. Figure 1 shows a real assembly-oriented U-cell that forms part of the learning factory at the TH Köln Institute of Production. However, real learning factories are often limited in their suitability for systematic variant comparisons. Modifications, setup times, safety requirements, and resource expenditure make it difficult to quickly and repeatedly vary layouts, material provision, or process sequences, thereby reducing comparability between groups [4].

Figure 1: Real assembly-oriented U-cell in the learning factory as a physical reference.

Virtual production environments: Didactic potential and interaction requirements

Virtual learning factories can serve as a supplement to, or under certain conditions even a substitute for, real learning factories. In the context of this article, virtual reality is understood as a computer-generated, immersive (360° 3D) environment in which manipulation of virtual objects is possible in real time. These immersive, three-dimensional production environments can be experienced using VR headsets, allowing training scenarios to be recreated and depicted in various ways. This facilitates structured comparisons: process and layout variants can be adapted quickly in virtual environments, so that their effects can be assessed consistently against predefined metrics.

This is particularly relevant for learning objectives focused on analysis, evaluation, and design, as these require multiple comparison variants and typically do not involve a single optimal solution [11].

Virtual approaches thus specifically address issues of scalability, comparability, and pedagogical control, creating the groundwork for leveraging virtual production environments and suitable forms of interaction to achieve training objectives [6]. This opens up pedagogical possibilities that are available only to a limited extent in real-world environments [4].

Interaction within virtual production environments is pedagogically significant because learners not only observe processes but also actively plan, adjust, and execute them. Production-oriented interaction is particularly necessary when placing objects, handling materials, determining routes, and defining work sequences, as this makes decisions visible and tangible throughout the entire process [11].

At the same time, interaction concepts must be designed in a way that supports learning processes without creating additional cognitive load. Intuitive operating concepts and clear feedback are therefore necessary to avoid overwhelming learners with complex user interfaces.

This is where multimodality becomes relevant in the control of cobots. It supports scenario execution by reducing operational effort and directing the learner’s focus toward process decisions [10]. Virtual learning environments offer distinct advantages here, as different forms of interaction can be integrated with minimal implementation effort while simultaneously enabling a high degree of interactional diversity.

In VR scenarios, multimodality can support automation decisions in a practical way: eye tracking facilitates the rapid targeting of objects or stations, gestures enable the spatial marking and placement of handover zones, gripping points, or material supply areas, and voice control supports discrete commands as well as parameter changes, for example start and stop commands, variant changes, or the display of KPIs [8].

Figure 2 illustrates this concept using the example of multimodal preconfiguration of cobot tasks in VR. What matters here is not so much whether robots are used, but rather how tasks are sensibly distributed—that is, who takes on what, why this makes sense, and at what point in the process integration is appropriate—all while considering constraints such as variant diversity, cycle time, quality requirements, susceptibility to failure, as well as safety and ergonomic aspects. This decision-making logic links technical feasibility with production goals and human factors and requires a comprehensive understanding of the process [11, 13].

Figure 2: Multimodal preconfiguration of cobot tasks (voice/eye tracking) in VR.
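
The division of labor between the modalities described above can be illustrated with a minimal Python sketch: gaze selects the object, a gesture marks the position, and a voice command names the action. All type and function names (`GazeEvent`, `GestureEvent`, `VoiceCommand`, `CobotTask`, `fuse`) and the two-second time window are illustrative assumptions; the article does not describe VIRAMM's actual data model or fusion logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical event types; not taken from VIRAMM or the VIROO platform.
@dataclass(frozen=True)
class GazeEvent:
    t: float       # timestamp in seconds
    target: str    # object or station the learner is looking at

@dataclass(frozen=True)
class GestureEvent:
    t: float
    position: Tuple[float, float, float]  # marked point in cell coordinates (m)

@dataclass(frozen=True)
class VoiceCommand:
    t: float
    verb: str      # discrete command, e.g. "handover" or "supply"

@dataclass(frozen=True)
class CobotTask:
    verb: str
    target: str
    position: Tuple[float, float, float]

def fuse(gaze: GazeEvent, gesture: GestureEvent, voice: VoiceCommand,
         window_s: float = 2.0) -> Optional[CobotTask]:
    """Combine the three inputs if they fall within one time window.

    Gaze answers "what", the gesture answers "where", and speech answers
    "which action". Returns None if the inputs are too far apart in time
    to plausibly belong to the same instruction.
    """
    times = (gaze.t, gesture.t, voice.t)
    if max(times) - min(times) > window_s:
        return None
    return CobotTask(verb=voice.verb, target=gaze.target,
                     position=gesture.position)

# Example input with invented values.
task = fuse(GazeEvent(10.0, "housing"),
            GestureEvent(10.6, (0.40, 0.15, 0.90)),
            VoiceCommand(11.2, "handover"))
print(task)
```

A fixed time window is only one simple way to associate inputs; it is used here because it keeps the resulting task deterministic and easy to inspect.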

For the comparison of variants, cycle time, throughput time, station-specific utilization, as well as waiting and idle times are specifically considered. These metrics were chosen because they directly highlight the effects of layout, task distribution, and handover synchronization within an assembly-oriented U-cell. Depending on the scenario, additional factors such as quality, ergonomics, user acceptance, as well as observations regarding team behavior, are included in the reflection. A particularly suitable reflection format is the comparison of multiple variants (e.g., with a focus on bottlenecks) as well as the comparison of virtual results with experiences gained from real learning factories to support knowledge transfer.
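
As a rough illustration of how these metrics relate to one another, the following Python sketch computes them for two variants of a three-station cell. It assumes a simplified paced line without buffers, in which the slowest station sets the cycle time; the function name `line_kpis`, the station labels, and all time values are invented for the example and are not VIRAMM results.

```python
from typing import Any, Dict

def line_kpis(station_times_s: Dict[str, float]) -> Dict[str, Any]:
    """KPIs for a paced line in which each station handles one unit per cycle.

    Simplifying assumptions: no buffers, no disruptions, constant task times.
    """
    # The slowest station (bottleneck) determines the cycle time.
    cycle_time = max(station_times_s.values())
    bottleneck = max(station_times_s, key=station_times_s.get)
    return {
        "cycle_time_s": cycle_time,
        "bottleneck": bottleneck,
        # Without buffers, a unit spends one full cycle at every station.
        "throughput_time_s": cycle_time * len(station_times_s),
        # Share of each cycle in which a station is actually working.
        "utilization": {s: t / cycle_time for s, t in station_times_s.items()},
        # Waiting or idle time per cycle at each station.
        "idle_s": {s: cycle_time - t for s, t in station_times_s.items()},
    }

# Invented task times in seconds: manual baseline vs. a cobot-assisted variant
# in which part of the work at station S2 is shifted to the robot.
manual = line_kpis({"S1": 40.0, "S2": 55.0, "S3": 35.0})
cobot = line_kpis({"S1": 40.0, "S2": 42.0, "S3": 38.0})

for name, kpis in (("manual", manual), ("cobot-assisted", cobot)):
    print(name, kpis["cycle_time_s"], kpis["bottleneck"],
          kpis["throughput_time_s"], kpis["idle_s"])
```

In this toy comparison, relieving the bottleneck station shortens the cycle time and reduces idle time at the other stations, which is the kind of effect the variant comparison is meant to make visible.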

VIRAMM research project

Although VR learning environments are already used for process visualization, layout planning, and initial robotics-related training applications, the combination of the following four elements has so far only been described to a limited extent: first, an assembly-oriented U-cell as a consistent reference environment; second, a two-stage scenario design consisting of a manual baseline variant and a robot-assisted integration scenario; third, an explicit decision matrix for justifying cobot integration; and fourth, a KPI-based comparison methodology for evaluating different variants [5].

The VIRAMM research project addresses this gap by initially establishing a purely manual assembly cell within a VR environment and then expanding it with defined cobot integration options. These variants can then be evaluated under standardized, comparable conditions. This paper focuses on the conceptual derivation, the scenario-based VR design, and the current project status.

The goal of the research project is to develop a VR simulation environment that supports competency development for the integration of collaborative robots into assembly-oriented pilot cells. The novelty of this approach lies not merely in transferring a learning factory into VR, but rather in combining a physical reference U-cell with a virtual variant space, a two-stage scenario design for manual and robot-assisted process execution, a consistent decision matrix for cobot integration, and a KPI-based comparison and reflection methodology.

The primary focus is placed on decisions regarding the distribution of tasks between humans and robots, the design of handover processes, material provisioning, and process robustness. These decisions are comparatively evaluated and reflected upon using transparent criteria. Multimodal interaction, particularly through speech and gestures, supports the programming of robot activities, handover positions, and process steps. For example, the robot can supply materials, hold components, transfer workpieces with defined gripping positions, or perform individual assembly tasks such as screwing. This makes task distribution and interfaces explicitly visible while simultaneously enabling faster testing of process variants.

VIRAMM consists of two educational scenarios. In the first scenario, “Virtual Assembly & Optimization of a U-Cell,” a real U-cell is recreated in VR and used as a reference for purely manual assembly processes. In this scenario, layout, paths, material positions, and cycle times are systematically optimized. Figure 3 illustrates this virtual reference environment.

The second scenario, “Multimodal Robot Programming for U-Cell Assembly,” focuses on the multimodal preconfiguration of robot tasks and handover processes so that the robot behaves deterministically and reproducibly during subsequent executions. In this scenario, users can experience how different task distributions between humans and cobots influence cycle time, throughput time, and waiting times, as these indicators clearly illustrate the impact of handover processes, synchronization requirements, and layout decisions on process performance.

A subsequent reflection phase compares the results with the benchmark data of the physically implemented U-cell to make the model quality and the benefits of virtual planning transparent. Additionally, team behavior and workflow are observed, as communication, coordination, and a shared understanding of cycle times influence process performance.

Figure 3: Virtual U-cell as a reference environment in VIRAMM.

Decision-making is structured through a standardized framework that addresses the following questions: Who takes on what, and why does this make sense? At which stage of the process should integration occur? Under which constraints does the process remain robust? [13] The rationale is not solely technical but also aligned with key production metrics, such as productivity, quality, flexibility, and ergonomics, while explicitly considering the division of labor between humans and robots.

Constraints such as product variety, typical disruptions, training assumptions, and necessary team coordination are incorporated to avoid purely idealized best-case solutions. The result is therefore not a single “correct” solution, but rather a well-founded decision that becomes comprehensible through the comparative analysis of cycle times, throughput times, capacity utilization, waiting and idle times, and through subsequent reflection.
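
One possible way to make such a decision traceable is a weighted scoring matrix over the production metrics named above. The Python sketch below is an assumption for illustration only: the article does not specify the form of the VIRAMM decision matrix, and the weights, the 1-to-5 rating scale, and the names `CRITERIA_WEIGHTS`, `weighted_score`, and `rank_options` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the four metrics named in the text (sum to 1.0).
CRITERIA_WEIGHTS: Dict[str, float] = {
    "productivity": 0.3,
    "quality": 0.3,
    "flexibility": 0.2,
    "ergonomics": 0.2,
}

def weighted_score(ratings: Dict[str, float],
                   weights: Dict[str, float] = CRITERIA_WEIGHTS) -> float:
    """Weighted sum of ratings (assumed scale: 1 = poor, 5 = very good)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[c] * ratings[c] for c in weights)

def rank_options(options: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (option, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(r)) for name, r in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented ratings for one subtask (screwing), for illustration only.
screwing = {
    "human": {"productivity": 3, "quality": 3, "flexibility": 5, "ergonomics": 2},
    "cobot": {"productivity": 4, "quality": 4, "flexibility": 2, "ergonomics": 5},
}

ranking = rank_options(screwing)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The score does not replace the reflection phase; it only records which criteria and weights led to a given allocation, so that learners can challenge them when constraints such as product variety or typical disruptions change.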

The scenario logic is transferable to industrial training and continuing education, as it addresses recurring patterns such as material flow, cycle time, work organization, and variant management. The scenarios can be adapted modularly to different target groups, while VR-based training enhances comparability between groups and execution runs, allowing more consistent evaluation of qualification measures.

Several results have already been achieved in the project. These include the didactic development of a manual reference U-cell and a corresponding robot-assisted scenario, the definition of an adaptive learning and feedback concept incorporating structured triggers and reflection phases, and the prototypical technical implementation of an XR-based U-cell. The implementation is based on the VIROO Enterprise XR platform, which serves as the technical foundation for the XR training scenarios within the Horizon Europe project MASTER XR.

In addition, the digital robot twin from the preceding OC1 project was integrated, and a multimodal interaction logic was prepared for the preconfiguration of robot activities using head gaze, gestures, and, in future stages, speech. The empirical evaluation of the learning outcomes and process effects will form the focus of the next project phase leading up to the project’s completion in June 2026.

This work was funded by Master XR Europe as part of the OC2 program, which is co-financed by the European Commission.

Bibliography

[1] Villani, V., Pini, F., Leali, F., Secchi, C., Survey on human–robot collaboration in industrial settings: Safety, intuitive interfaces and applications, Mechatronics 55 (2018)
[2] Haddadin, S., Croft, E., Erratum to: Physical Human–Robot Interaction, in: Springer Handbook of Robotics, Siciliano, B., Khatib, O. (eds.), Cham, 2016
[3] Andersson, S.K.L., Granlund, A., Bruch, J., Hedelind, M., Experienced Challenges When Implementing Collaborative Robot Applications in Assembly Operations, International Journal of Automation Technology 15 (2021) 5
[4] Rempel, W., Harkemper, L., Zoller, C.S., Analyse der Ausprägungen bestehender Lernfabriken – Virtuelle Realität als mögliche Antwort auf aktuelle Herausforderungen, Industrie 4.0 Management 38 (2022) 2
[5] Wolf, M., Herstätter, P., Rantschl, M., Ramsauer, C., Magana, A.J., Immersive learning factories for promoting experiential manufacturing education and STEM competency development, Computers & Education: X Reality 7 (2025)
[6] Zoller, C., Grzechca, B.A., Transformation von klassischen zu virtuellen Lernfabriken: Vergleichende Studie zu Lernerfolg und Lernmotivation in physischen und VR-gestützten Lernfabriken, Forschung und Innovation in der Hochschulbildung Nr. 24 (2025), Köln
[7] Romero, D., Stahre, J., Taisch, M., The Operator 4.0: Towards socially sustainable factories of the future, Computers & Industrial Engineering 139 (2020)
[8] Keshvarparast, A., Battini, D., Battaia, O., Pirayesh, A., Collaborative robots in manufacturing and assembly systems: literature review and future research agenda, Journal of Intelligent Manufacturing 35 (2024) 5
[9] Tisch, M., Metternich, J., Potentials and limits of learning factories in research, innovation transfer, education, and training, Procedia Manufacturing 9 (2017) Suppl C
[10] Tisch, M., Hertle, C., Abele, E., Metternich, J., Tenberg, R., Learning factory design: a competency-oriented approach integrating three design levels, International Journal of Computer Integrated Manufacturing 29 (2016)
[11] Radianti, J., Majchrzak, T. A., Fromm, J., Wohlgenannt, I., A systematic review of immersive virtual reality applications for higher education: Design elements, lessons learned, and research agenda, Computers & Education 147 (2020)
[12] Waldow, K., Kleinbeck, C., Fuhrmann, A., Roth, D., Investigating the Impact of Video Pass-Through Embodiment on Presence and Performance in Virtual Reality, IEEE Transactions on Visualization and Computer Graphics 31 (2025)
[13] Makransky, G., Petersen, G. B., The Cognitive Affective Model of Immersive Learning (CAMIL): a Theoretical Research-Based Model of Learning in Immersive Virtual Reality, Educational Psychology Review 33 (2021)
