Data Quality and Domain Expertise for Resilient AI Deployment

Integrating anomaly and label error detection in industry

Journal: Industry 4.0 Science
Issue: Volume 42, Edition 1, Pages 128-135
Open Access: https://doi.org/10.30844/I4SE.26.1.120

Abstract

AI implementation transforms work and worker-technology relationships in industrial quality control. This paper explores how approaches to data quality and model transparency support ethical AI deployment, fostering worker agency, trust, and sustainable work design in automatic surface inspection systems (ASIS). Recurring problems like data inefficiency, variable model confidence, and limited AI expertise point to key challenges of human-centered AI: user trust, agency and responsible data management. A solution co-developed with an ASIS supplier demonstrates that the challenges extend beyond the purely technical, underscoring the value of AI design that augments human capabilities. Technical solutions such as anomaly, label error, and domain drift detection are proposed to enhance data quality and model reliability. The insights emphasize the following generalizable strategies for resilient AI integration: understanding user-reported problems through a human-AI interaction lens, focusing on data quality transparency, providing intuitive reliability monitoring, and ensuring user-centric integration into existing workflows.

Article

AI offers significant opportunities in industrial quality control but also presents complex challenges along the human-AI interface, demanding a focus on applied ethics and human-centered design principles. In industrial settings, like those with automatic surface inspection systems (ASIS), machine learning is increasingly deployed for error detection tasks, such as steel rolling inspection. While this may promise increased efficiency and consistency, challenges concerning trust, usability, adaptability, and the role of human workers do arise as a result. Standard AI deployment often seems to neglect these dimensions, thus hindering adoption and raising ethical concerns. A human-centered approach to AI (HCAI) is therefore required.

This paper presents findings from a science-practice collaboration with an ASIS supplier. Core obstacles to the effective integration of ASIS into production lines were identified: data inefficiency (especially for rare errors), user distrust due to unclear model behavior and variable performance, limited user AI expertise, and information overload for operators. These practical challenges can be addressed by specific HCAI-aligned technical solutions focused on data quality, model transparency, and user empowerment. The co-developed solutions presented here (anomaly detection, label error detection, and drift monitoring concepts) can enhance worker agency, trust, and sustainable AI adoption in industrial surface inspection.

This paper’s key contributions are: (1) identifying practical ASIS deployment challenges as ethical HCAI deficits, (2) mapping these challenges onto HCAI solutions, thereby (3) demonstrating how AI ethics can provide a framework for solving practical human-AI collaboration challenges in industry, and (4) presenting technical solutions for data quality awareness and reliability management.

Background

Effective and ethical AI integration into workplaces requires an understanding of both AI reliability and human-centered work design principles. Applied AI ethics concerns relate to fairness, accountability, transparency, and impact on work. HCAI provides a framework to address these challenges along three key dimensions [1, 2]:

  • technology development is concerned with ensuring consistent and robust performance (trustworthiness/reliability) and making outputs understandable (explainability) 
  • employee development focuses on empowering rather than replacing people (human agency & augmentation) and ensuring tolerable work conditions (sustainable work design)
  • organizational development strives for ethical data governance by maintaining reliable organizational knowledge bases (responsible knowledge management/accountability) and integrating user domain knowledge, thereby enabling non-experts to manage AI quality. 

Neglecting these dimensions creates systems that are technically functional but practically unusable or untrustworthy for the end users [3].

On a technical level, AI performance depends on data quality. Common issues in industrial applications include label errors, outliers, and insufficient data for rare classes. Anomaly detection helps identify novel or rare events, which is critical for robust error detection. Label error detection improves dataset integrity and the reliability of model evaluation. Domain drift poses a major challenge to model performance: it requires monitoring to detect shifts in data distributions and a response to changing conditions such as lighting or material variations. Model confidence scores can be misleading, especially under drift, which necessitates drift-aware trust calibration.
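As an illustration of what monitoring for shifts in data distributions can look like, the following minimal sketch compares a reference window of one feature with a current window using the two-sample Kolmogorov-Smirnov statistic. It is not the drift monitoring concept developed in the project. The feature name, the sample values, and the alert threshold are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in sorted(set(ref + cur)):
        cdf_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        cdf_cur = bisect_right(cur, x) / len(cur)  # share of current values <= x
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Hypothetical image-brightness feature before and after a lighting change.
reference_brightness = [0.48, 0.50, 0.51, 0.52, 0.49, 0.50]
current_brightness = [0.60, 0.62, 0.61, 0.59, 0.63, 0.60]

DRIFT_THRESHOLD = 0.5  # hypothetical alert level, to be tuned per feature
drift_detected = ks_statistic(reference_brightness, current_brightness) > DRIFT_THRESHOLD
```

A statistic near 0 means the two windows are distributed alike, while a value near 1 means they barely overlap. In practice, the threshold would have to be calibrated against the normal variation of each monitored feature.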

Prior work on this research project employed a collaborative, interdisciplinary approach involving academic researchers (computer science, engineering, social sciences, information systems), an industrial ASIS provider, and ASIS quality engineers to identify challenges to human-centered AI design. That work utilized interdisciplinary workshops with stakeholders (system developers, project managers), end-user feedback, and structured potential and socio-technical workflow analyses [4].

User needs were assessed through semi-structured interviews with quality engineers and line operators, which were audio-recorded and transcribed. The transcripts were analyzed using systematic qualitative methods to derive design principles for ASIS interfaces and to characterize user roles through personas [5]. Through these methods, relevant workflows, pain points (e.g., commissioning time pressure), and HCAI gaps were identified.

Method: Translating HCAI challenges into technical solutions

The authors of the present study, who participated in the original workshops and had access to the interview transcripts, collaboratively analyzed the documented pain points through discussions with the ASIS supplier and stakeholders. Through these iterative exchanges, recurring themes emerged. These were mapped onto the HCAI framework dimensions [1, 2], allowing the translation of identified needs into concrete technical requirements.

Technical approaches were selected for simplicity, comprehensibility, implementability, and model-agnostic application potential. Technical prototypes were developed based on proprietary industrial steel band datasets provided by the ASIS supplier. The data comprised two labeled tabular error datasets, whose features were derived from error images, as well as steel band surface images for visual detection tasks. The prototypes were validated through internal testing with the ASIS supplier (using validation metrics and user tests) and customer feedback sessions [5, 6].

Case study: AI in Automatic Surface Inspection Systems (ASIS)

ASIS products are deployed in steel rolling to automatically assess steel band quality and flag possible low-quality bands. The system is first installed by a provider. Then, during commissioning, the AI model is adapted. It can also be retrained later for changing conditions or requirements. The typical workflow for the ASIS customer involves on-site image acquisition, manual error classification for training, real-time computer vision-based error detection, and AI classification [7]. There are three key users: line operators, who run production and require immediate feedback about current process quality; quality engineers, who monitor whether product quality meets requirements; and ASIS administrators, who manage and update data, maintain system results, and optimize performance when necessary.

Applying this methodology to AI in ASIS revealed specific HCAI challenges, presented in Figure 1.

Figure 1: Challenges for human-centered AI design.

To respond to these challenges, three technical solutions are proposed:

Solution 1: Anomaly and novel class detection on tabular data

Unsupervised anomaly detection methods were explored to address data scarcity for rare or novel errors using tabular data [8, 9]. Various algorithms were evaluated on the two industrial tabular datasets, including proximity-based, statistical, linear, and ensemble methods. The most promising approach identified was a combination of a proximity-based algorithm (Connectivity-based Outlier Factor (COF)) [10] and the entropy of predicted class probabilities. This method ranks the “unusualness” of detections relative to the model’s training data based on both sample feature space isolation and classifier confidence. 
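The idea of ranking detections by both feature space isolation and classifier uncertainty can be sketched as follows. This is a minimal illustration, not the project's implementation: the mean distance to the k nearest training samples is used as a simplified stand-in for COF, and averaging rank-normalized scores is an assumed combination rule. All function names and data values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def knn_isolation(sample, reference, k=3):
    """Mean Euclidean distance to the k nearest reference samples.
    Simplified stand-in for COF, which is not reimplemented here."""
    dists = sorted(math.dist(sample, ref) for ref in reference)
    return sum(dists[:k]) / k

def rank_normalize(scores):
    """Map scores to ranks in [0, 1] so that both score types become comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / max(len(scores) - 1, 1)
    return ranks

def combined_unusualness(samples, class_probs, reference, k=3):
    """Average of rank-normalized isolation and rank-normalized entropy (assumed rule)."""
    isolation = rank_normalize([knn_isolation(s, reference, k) for s in samples])
    uncertainty = rank_normalize([entropy(p) for p in class_probs])
    return [(a + b) / 2 for a, b in zip(isolation, uncertainty)]

# Toy data: training features, new detections, and the classifier's probabilities.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
samples = [(0.05, 0.02), (0.08, 0.09), (2.0, 2.0)]
class_probs = [(0.97, 0.02, 0.01), (0.90, 0.05, 0.05), (0.40, 0.35, 0.25)]

scores = combined_unusualness(samples, class_probs, reference)
```

In this toy example, the third detection lies far from the training data and receives an uncertain prediction, so it ranks as the most unusual sample.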

Anomalies were simulated by omitting samples of rare classes from the dataset (approx. 3% of the data). The mean area under the receiver operating characteristic curve (AUROC) was chosen as the metric for assessing ranking quality, i.e., the system’s ability to discriminate between normal and anomalous samples. The combined approach using COF and entropy-based scores yielded the best overall performance across the two tested datasets, with AUROCs of 80.36% and 83.14%.

With false positive rates of 37.58% and 35.77% at an 85% true positive rate on these imbalanced error datasets, the proposed method still produces a considerable number of false alarms, but it identifies anomalies and novelties fairly robustly. By identifying anomalies, the system supports data quality awareness and can guide active learning workflows, prompting users to check whether unusual samples are mislabeled or belong to novel classes. This not only enhances user agency but also contributes to responsible knowledge management by reducing the likelihood that flawed or incomplete data becomes embedded in organizational knowledge bases.
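The two ranking metrics reported in this section can be computed as in the following sketch. It uses toy scores and labels, not the project data. Function names are hypothetical, and the pairwise AUROC computation is chosen for readability rather than speed.

```python
import math

def auroc(scores, labels):
    """Probability that a random anomaly (label 1) is scored above a random normal (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fpr_at_tpr(scores, labels, target_tpr=0.85):
    """False positive rate at the highest threshold that reaches the target true positive rate."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    needed = max(1, math.ceil(target_tpr * len(pos)))  # anomalies that must be flagged
    threshold = pos[needed - 1]
    return sum(s >= threshold for s in neg) / len(neg)

# Toy anomaly scores: three anomalies (1) among seven normal samples (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"AUROC: {auroc(scores, labels):.3f}")
print(f"FPR at 85% TPR: {fpr_at_tpr(scores, labels):.3f}")
```

Because anomalies are rare, even a moderate false positive rate translates into many false alarms in absolute terms, which is why both numbers are worth reporting together.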

Solution 2: Visual anomaly detection on image data

Complementary to the tabular approach, visual anomaly detection was implemented directly on steel band surface images to improve error localization using zero-shot and semi-supervised few-shot learning, eliminating the need for extensive error examples. AnomalyDINO [11], a patch-based deep nearest-neighbor approach using features from the pre-trained DINOv2 vision transformer, was adapted for the purpose. This method compares visual patches from test images against a memory bank of ‘good’ surface patches to identify deviations. For ASIS, reference patches contain only error-free steel band images. 
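The memory bank comparison can be illustrated with the following sketch. It assumes that patch features have already been extracted. The three-dimensional vectors below are toy values, not DINOv2 features. Cosine distance and aggregation over the highest patch scores are assumptions made for illustration, and all names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def patch_scores(test_patches, memory_bank):
    """Anomaly score per patch: distance to its nearest 'good' reference patch."""
    return [min(cosine_distance(p, ref) for ref in memory_bank) for p in test_patches]

def image_score(scores, top_fraction=0.01):
    """Image-level score: mean of the highest patch scores (assumed aggregation)."""
    k = max(1, int(len(scores) * top_fraction))
    return sum(sorted(scores, reverse=True)[:k]) / k

# Toy features: memory bank from error-free surfaces, then two test images.
memory_bank = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.8, 0.2, 0.1)]
good_image = [(1.0, 0.05, 0.0), (0.85, 0.15, 0.05)]
defect_image = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # second patch deviates strongly

good_score = image_score(patch_scores(good_image, memory_bank))
defect_score = image_score(patch_scores(defect_image, memory_bank))
```

The per-patch scores also indicate where on the image the deviation occurs, which is what makes this family of methods useful for error localization.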

Internal evaluation on two steel band datasets demonstrated strong performance, with AUROC scores of 98.45% and 97.0-98.52%; performance improved as the number of reference shots increased from 16 to 100. Zero-shot analysis on a different proprietary dataset reached 99.13% AUROC through unsupervised anomaly detection and quantile optimization. However, the approach has difficulty detecting very small anomalies such as oil stains or scratches, which may not sufficiently elevate anomaly scores, and it could benefit from further hyperparameter search.

Solution 3: Label error detection 

An algorithm was developed to improve dataset integrity and trust in model evaluations by identifying samples in the training and validation data that are likely to be labeled incorrectly. The Area Under the Margin (AUM) [12] ranking method for gradient-boosted decision tree models was adapted for the purpose [6]. The method calculates a score for each sample by averaging, across all boosting steps of a single training run, the margin between the predicted probability of its assigned class and that of the next most likely class.
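The score computation described above can be sketched as follows. This is an illustrative reading of that description rather than the adapted algorithm from [6]. It assumes that the class probabilities after each boosting step are already available (some libraries expose staged predictions, e.g., scikit-learn's `staged_predict_proba`), and it interprets "next most likely class" as the most probable class other than the assigned one. All names and values are hypothetical.

```python
def margin(probs, assigned):
    """Probability of the assigned class minus the highest probability among the others."""
    others = [p for c, p in enumerate(probs) if c != assigned]
    return probs[assigned] - max(others)

def area_under_margin(prob_history, assigned):
    """Average margin over all boosting steps; prob_history holds one
    class-probability vector per step for a single sample."""
    return sum(margin(p, assigned) for p in prob_history) / len(prob_history)

# Toy histories over three boosting steps: sample id -> (probabilities, assigned label).
histories = {
    "clean_sample": ([(0.5, 0.3, 0.2), (0.7, 0.2, 0.1), (0.9, 0.05, 0.05)], 0),
    "suspect_sample": ([(0.5, 0.3, 0.2), (0.6, 0.3, 0.1), (0.7, 0.25, 0.05)], 1),
}

aum = {name: area_under_margin(h, label) for name, (h, label) in histories.items()}

# Lowest scores first: these samples would be presented for label review.
review_queue = sorted(aum, key=aum.get)
```

A persistently negative margin means the model keeps favoring a class other than the assigned one throughout training, which is the pattern expected for a mislabeled sample.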

AUM proved computationally efficient, requiring only one training run. On the two industrial tabular datasets, AUM achieved AUROC values of 97.1% and 96.2% for detecting 5% injected synthetic asymmetric label noise [13], comparable to more computationally expensive out-of-sample prediction methods [6]. In two real-world validation sessions with industry practitioners, 42% of the samples with the lowest AUM scores were confirmed as label errors, while others pointed to further data quality issues, validating the method’s practical utility. 

Presenting low-AUM samples to quality engineers thus enables targeted data label review and correction. This improves user agency and augmentation by allowing engineers to curate data more efficiently and with reduced manual burden, which in turn supports responsible knowledge management by reducing the risk that organizational knowledge bases are corrupted by mislabeled data. It also helps foster consensus on error classification criteria.

Figure 2: Integration of the solutions.

Integration outlook 

These solutions were integrated into existing ASIS software and workflows (Fig. 2). Anomaly and label error scores are displayed alongside error detections in the tabular views used by quality engineers, allowing sorting and filtering to prioritize review tasks. Anomaly localization can be overlaid on steel band images or used for error localization. To make model reliability information accessible and minimize operator information overload, a simple “traffic light” indicator could be used. It would synthesize information from multiple sources (e.g., anomaly score distributions and model confidence levels) into a glanceable AI reliability assessment, displayed in the operator’s real-time monitoring interface.
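One possible way to synthesize such a traffic light from two signals is sketched below. The rule, the thresholds, and the function name are hypothetical and serve only to make the concept concrete. A deployed indicator would need thresholds calibrated with the users of the system.

```python
from statistics import mean

def traffic_light(anomaly_scores, confidences,
                  anomaly_threshold=0.8,      # hypothetical: score counted as anomalous
                  max_anomaly_rate=0.10,      # hypothetical: tolerated share of anomalies
                  min_mean_confidence=0.70):  # hypothetical: expected mean confidence
    """Summarize recent detections into 'green', 'yellow', or 'red'."""
    anomaly_rate = mean(1.0 if s > anomaly_threshold else 0.0 for s in anomaly_scores)
    too_many_anomalies = anomaly_rate > max_anomaly_rate
    low_confidence = mean(confidences) < min_mean_confidence
    if too_many_anomalies and low_confidence:
        return "red"     # both signals indicate unreliable predictions
    if too_many_anomalies or low_confidence:
        return "yellow"  # one signal deviates: review recommended
    return "green"

# Toy window of recent detections.
status = traffic_light(
    anomaly_scores=[0.9, 0.95, 0.3, 0.1],
    confidences=[0.9, 0.95, 0.85, 0.9],
)
```

Here the share of anomalous detections is elevated while confidence remains high, so the indicator shows yellow. The detailed scores behind it remain available to quality engineers in the tabular views.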

Combining detailed scores for expert review with high-level indicators for operators supports informed decision-making across roles: the traffic light visualization reduces cognitive burden by lowering the volume and complexity of information that operators must process, which contributes to sustainable work design and shifts their focus to important events (augmentation). Meanwhile, detailed anomaly and label error scores enhance the trust of engineers and equip them for responsible knowledge management in data-driven decisions.

Discussion and conclusion

The co-developed prototypes, validated through lab simulations and real-world user feedback sessions, demonstrate that addressing data and model quality via an HCAI lens supports user agency and improves system resilience in industrial ASIS. User feedback from validation sessions confirmed the value of the label error detection solution. Ranking potential label errors clearly reduced the manual effort required for data quality assurance compared to non-targeted manual checks. This is demonstrated by the 42% identification rate based on label error ranking compared to regular label error prevalence rates (e.g., less than 5%) [6]. Data-efficient image-level anomaly detection proved effective at detecting surface defects on customer data.

Overall, technical solutions focused on data quality transparency are crucial for building trustworthy, resilient industrial AI. They bridge algorithmic capabilities and practical usability by empowering domain experts to understand, validate, and improve AI performance. In doing so, they directly enhance agency and augmentation (through more efficient workflows), trustworthiness (via transparent reliability monitoring), and responsible knowledge management (by improving data integrity). Strategies for ethical and effective HCAI are thereby shown to transfer beyond industrial ASIS, as outlined in Figure 3.

Figure 3: Transferable HCAI strategies.

These strategies are relevant to domains requiring dependable AI with robust human oversight, such as safety-critical applications where data scarcity and reliability are an issue. By focusing on human-centered principles and co-developing technical solutions for data quality and reliability monitoring, this science-practice collaboration demonstrates how AI systems can become transparent, trustworthy collaborators that augment human capabilities. 

This research and development project is funded by the German Federal Ministry of Research, Technology and Space (BMFTR) within the “The Future of Value Creation – Research on Production, Services and Work” program (02L19C200) and managed by the Project Management Agency Karlsruhe (PTKA). The authors are responsible for the content of this publication.
