AI offers significant opportunities in industrial quality control but also presents complex challenges along the human-AI interface, demanding a focus on applied ethics and human-centered design principles. In industrial settings, such as those with automatic surface inspection systems (ASIS), machine learning is increasingly deployed for error detection tasks like steel rolling inspection. While this promises increased efficiency and consistency, it also raises challenges concerning trust, usability, adaptability, and the role of human workers. Standard AI deployment often neglects these dimensions, which hinders adoption and raises ethical concerns. A human-centered approach to AI (HCAI) is therefore required.
This paper presents findings from a science-practice collaboration with an ASIS supplier. Core obstacles to the effective integration of ASIS into production lines were identified: data inefficiency (especially for rare errors), user distrust due to unclear model behavior and variable performance, limited user AI expertise, and information overload for operators. These practical challenges can be addressed by specific HCAI-aligned technical solutions focused on data quality, model transparency, and user empowerment. The co-developed solutions presented here (anomaly detection, label error detection, and drift monitoring concepts) can enhance worker agency, trust, and sustainable AI adoption in industrial surface inspection.
This paper’s key contributions are: (1) identifying practical ASIS deployment challenges as ethical HCAI deficits, (2) mapping these challenges onto HCAI solutions, thereby (3) demonstrating how AI ethics can provide a framework for solving practical human-AI collaboration challenges in industry, and (4) presenting technical solutions for data quality awareness and reliability management.
Background
Effective and ethical AI integration into workplaces requires an understanding of both AI reliability and human-centered work design principles. Applied AI ethics concerns relate to fairness, accountability, transparency, and impact on work. HCAI provides a framework to address these challenges along three key dimensions [1, 2]:
- technology development is concerned with ensuring consistent and robust performance (trustworthiness/reliability) and making outputs understandable (explainability)
- employee development focuses on empowering rather than replacing people (human agency & augmentation) and ensuring tolerable work conditions (sustainable work design)
- organizational development strives for ethical data governance by maintaining reliable organizational knowledge bases (responsible knowledge management/accountability) and integrating user domain knowledge, thereby enabling non-experts to manage AI quality.
Neglecting these dimensions creates systems that are technically functional but practically unusable or untrustworthy for the end users [3].
On a technical level, AI performance depends on data quality. Common issues in industrial applications include label errors, outliers, and insufficient data for rare classes. Anomaly detection helps identify novel or rare events, which is critical for robust error detection. Label error detection improves dataset integrity and the reliability of model evaluation. Domain drift poses a major challenge to model performance: monitoring is required to detect shifts in data distributions and to respond to changing conditions such as lighting or material variations. Model confidence scores can be misleading, especially under drift, which makes drift-aware trust calibration necessary.
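As an illustration of what monitoring for distribution shift can look like, the following minimal sketch compares a reference sample of one feature with a recent sample using the two-sample Kolmogorov-Smirnov statistic. The choice of this statistic, the function names, and the threshold of 0.3 are assumptions made for illustration and are not taken from the project.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drift_detected(reference, current, threshold=0.3):
    """Flag drift when the statistic exceeds a threshold (hypothetical value)."""
    return ks_statistic(reference, current) > threshold
```

In practice such a check would be run per feature on a sliding window of recent detections, and the threshold would need to be tuned to the sample sizes involved.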
Prior work on this research project employed a collaborative, interdisciplinary approach involving academic researchers (computer science, engineering, social sciences, information systems), an industrial ASIS provider, and ASIS quality engineers to identify challenges to human-centered AI design. That work utilized interdisciplinary workshops with stakeholders (system developers, project managers), end-user feedback, and structured potential and socio-technical workflow analyses [4].
User needs were assessed through semi-structured interviews with quality engineers and line operators, documented via audio-recorded transcriptions. Interview transcripts were analyzed using systematic qualitative methods to derive design principles for ASIS interfaces and characterize user roles through personas [5]. Through these methods, relevant workflows, pain points (e.g., commissioning time pressure), and HCAI gaps were identified.
Method: Translating HCAI challenges into technical solutions
The authors of the present study, who participated in the original workshops and had access to the interview transcripts, collaboratively analyzed the documented pain points through discussions with the ASIS supplier and stakeholders. Through these iterative exchanges, recurring themes emerged. These were mapped onto the HCAI framework dimensions [1, 2], allowing the translation of identified needs into concrete technical requirements.
Technical approaches were selected for simplicity, comprehensibility, implementability, and model-agnostic application potential. Technical prototypes were developed based on proprietary industrial steel band datasets provided by the ASIS supplier. The datasets consisted of two labeled tabular error datasets comprising features derived from error images and steel band surface images for visual detection tasks. The prototypes were validated through internal testing with the ASIS supplier (using validation metrics and user tests) and customer feedback sessions [5, 6].
Case study: AI in Automatic Surface Inspection Systems (ASIS)
ASIS products are deployed in steel rolling to automatically assess steel band quality and flag possible low-quality bands. The system is first installed by a provider. Then, during commissioning, the AI model is adapted. It can also be retrained later for changing conditions or requirements. The typical workflow for the ASIS customer involves on-site image acquisition, manual error classification for training, real-time computer vision-based error detection, and AI classification [7]. There are three key users: line operators, who run production and require immediate feedback about current process quality; quality engineers, who monitor whether product quality meets requirements; and ASIS administrators, who manage and update data, maintain system results, and optimize performance when necessary.
Applying this methodology to AI in ASIS revealed specific HCAI challenges, presented in Figure 1.

To respond to these challenges, three technical solutions are proposed:
Solution 1: Anomaly and novel class detection on tabular data
Unsupervised anomaly detection methods were explored to address data scarcity for rare or novel errors using tabular data [8, 9]. Various algorithms were evaluated on the two industrial tabular datasets, including proximity-based, statistical, linear, and ensemble methods. The most promising approach identified was a combination of a proximity-based algorithm (Connectivity-based Outlier Factor (COF)) [10] and the entropy of predicted class probabilities. This method ranks the “unusualness” of detections relative to the model’s training data based on both sample feature space isolation and classifier confidence.
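The idea of combining feature space isolation with classifier uncertainty can be sketched as follows. This is a simplified illustration under stated assumptions: the mean distance to the k nearest training samples stands in for COF, and the two scores are combined by a weighted average of their ranks. The combination rule, the function names, and the parameters `k` and `weight` are hypothetical and do not describe the project's implementation.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a class-probability vector, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def knn_distance(sample, train, k=3):
    """Mean Euclidean distance to the k nearest training samples (stand-in for COF)."""
    dists = sorted(math.dist(sample, t) for t in train)
    return sum(dists[:k]) / k


def rank_scale(values):
    """Map values to ranks in [0, 1] so that differently scaled scores are comparable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r / max(len(values) - 1, 1)
    return ranks


def unusualness(samples, probs, train, k=3, weight=0.5):
    """Weighted average of the proximity rank and the entropy rank per sample."""
    prox = rank_scale([knn_distance(s, train, k) for s in samples])
    ent = rank_scale([normalized_entropy(p) for p in probs])
    return [weight * a + (1 - weight) * b for a, b in zip(prox, ent)]
```

A detection that is both far from the training data and classified with high uncertainty ends up at the top of the ranking, which is the property the combined score is meant to provide.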
Anomalies were simulated by omitting samples of rare classes from the dataset (approx. 3% of the data). The mean area under the receiver operating characteristic curve (AUROC) was chosen as a metric for investigating the ranking quality and the system’s ability to discriminate between normal and anomalous samples. The combined approach using COF and entropy-based scores yielded the best overall performance across the two tested datasets with AUROCs of 80.36% and 83.14%.
With false positive rates of 37.58% and 35.77% at an 85% true positive rate on imbalanced error datasets, the proposed method still produces a substantial number of false alarms, but it identifies anomalies and novelties fairly robustly. By identifying anomalies, the system supports data quality awareness and can guide active learning workflows, prompting users to check whether unusual samples are mislabeled or belong to novel classes. This not only enhances user agency but also contributes to responsible knowledge management by reducing the likelihood that flawed or incomplete data becomes embedded in organizational knowledge bases.
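The two metrics reported above can be computed from anomaly scores and ground-truth labels as in the following sketch. It assumes that anomalies carry label 1 and normal samples label 0, and that higher scores mean "more anomalous". The function names are illustrative.

```python
import math


def auroc(scores, labels):
    """Probability that a random anomaly (label 1) outranks a random normal sample (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between an anomaly and a normal sample count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fpr_at_tpr(scores, labels, target_tpr=0.85):
    """False positive rate at the highest threshold whose TPR reaches target_tpr."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    needed = math.ceil(target_tpr * len(pos))  # anomalies that must be flagged
    threshold = pos[needed - 1]
    return sum(n >= threshold for n in neg) / len(neg)
```

Reading the two numbers together is useful on imbalanced data: a false positive rate of roughly one third applied to a large majority of normal samples can still translate into many alarms per true anomaly.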
Solution 2: Visual anomaly detection on image data
Complementary to the tabular approach, visual anomaly detection was implemented directly on steel band surface images to improve error localization using zero-shot and semi-supervised few-shot learning, eliminating the need for extensive error examples. AnomalyDINO [11], a patch-based deep nearest-neighbor approach using features from the pre-trained DINOv2 vision transformer, was adapted for the purpose. This method compares visual patches from test images against a memory bank of ‘good’ surface patches to identify deviations. For ASIS, reference patches contain only error-free steel band images.
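The patch comparison against a memory bank can be sketched as follows. Feature vectors are represented as plain lists here; in a real pipeline they would be patch embeddings from the pre-trained vision transformer. The use of cosine distance and the aggregation of the highest patch scores into an image score (with a hypothetical `top_fraction` parameter) are assumptions made for illustration, not a description of the adapted implementation.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def patch_scores(test_patches, memory_bank):
    """Distance of each test patch to its nearest 'good' reference patch."""
    return [min(cosine_distance(p, ref) for ref in memory_bank) for p in test_patches]


def image_score(test_patches, memory_bank, top_fraction=0.01):
    """Image-level score: mean of the highest-scoring fraction of patches."""
    scores = sorted(patch_scores(test_patches, memory_bank), reverse=True)
    n_top = max(1, int(len(scores) * top_fraction))
    return sum(scores[:n_top]) / n_top
```

The per-patch scores double as a coarse localization map, since each score belongs to a known image region.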
Internal evaluation on two steel band datasets demonstrated strong performance, achieving AUROC scores of 98.45% and 97.0% to 98.52%, with performance improving from 16 to 100 reference shots. Zero-shot analysis reached 99.13% AUROC through unsupervised anomaly detection and quantile optimization on a different proprietary dataset. However, the approach has difficulty detecting very small anomalies such as oil stains or scratches, which may not sufficiently elevate anomaly scores, and it could benefit from further hyperparameter search.
Solution 3: Label error detection
An algorithm was developed to improve dataset integrity and trust in model evaluations by identifying samples in the training and validation data that are likely to be labeled incorrectly. The Area Under the Margin (AUM) [12] ranking method was adapted to gradient-boosted decision tree models for this purpose [6]. The method calculates a score for each sample by averaging, across all boosting steps of a single training run, the margin between the predicted probability of its assigned class and that of the next most likely class.
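The score computation described above can be sketched as follows. The sketch assumes that the class probabilities of each sample have already been recorded after every boosting step (for example from a staged prediction interface of the boosting library); the function names and the data layout are illustrative.

```python
def margin(probs, assigned):
    """Probability of the assigned class minus the highest probability among the other classes."""
    others = [p for c, p in enumerate(probs) if c != assigned]
    return probs[assigned] - max(others)


def area_under_margin(prob_history, assigned):
    """Average margin over all boosting steps for one sample.

    prob_history holds one class-probability vector per boosting step.
    """
    return sum(margin(p, assigned) for p in prob_history) / len(prob_history)


def rank_suspects(histories, labels):
    """Return sample indices sorted by ascending AUM (most suspicious first) and the scores."""
    aum = [area_under_margin(h, y) for h, y in zip(histories, labels)]
    order = sorted(range(len(aum)), key=lambda i: aum[i])
    return order, aum
```

A sample whose assigned class is consistently outvoted by another class during training receives a negative score and appears at the top of the review list.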
AUM proved computationally efficient, requiring only one training run. On the two industrial tabular datasets, AUM achieved AUROC values of 97.1% and 96.2% for detecting 5% injected synthetic asymmetric label noise [13], comparable to more computationally expensive out-of-sample prediction methods [6]. In two real-world validation sessions with industry practitioners, 42% of the samples with the lowest AUM scores were confirmed as label errors, while others pointed to further data quality issues, validating the method’s practical utility.
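For readers unfamiliar with the evaluation setup, the following sketch shows one common way to inject asymmetric (class-dependent) label noise: a fixed share of labels is flipped, and each source class is flipped only to a designated target class. The flip mapping, the seed, and the function name are hypothetical; the exact noise model used in the evaluation is the one defined in [13].

```python
import random


def inject_asymmetric_noise(labels, flip_map, rate=0.05, seed=0):
    """Flip a share of labels according to a class-to-class mapping.

    flip_map maps a source class to the class it is mislabeled as.
    Returns the noisy labels and the sorted indices of the flipped samples.
    """
    rng = random.Random(seed)
    eligible = [i for i, y in enumerate(labels) if y in flip_map]
    n_flip = min(round(rate * len(labels)), len(eligible))
    flipped = rng.sample(eligible, n_flip)
    noisy = list(labels)
    for i in flipped:
        noisy[i] = flip_map[labels[i]]
    return noisy, sorted(flipped)
```

Because the flipped indices are known, a label error ranking can then be scored with AUROC by treating the flipped samples as positives.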
Presenting low-AUM samples to quality engineers thus enables targeted data label review and correction. This improves user agency and augmentation by allowing engineers to curate data more efficiently and with reduced manual burden, which in turn supports responsible knowledge management by reducing the risk that organizational knowledge bases are corrupted by mislabeled data. It also helps foster consensus on error classification criteria.

Integration outlook
These solutions were integrated into existing ASIS software and workflows (Fig. 2). Anomaly and label error scores are displayed alongside error detections in tabular views used by quality engineers, allowing sorting and filtering to prioritize review tasks. Anomaly localization can be overlaid on steel band images or used for error localization. To make model reliability information accessible and minimize operator information overload, a simple “traffic light” indicator could be shown in the operator’s real-time monitoring interface. It would synthesize information from multiple sources (e.g., anomaly score distributions and model confidence levels) into a glanceable AI reliability assessment.
Combining detailed scores for expert review with high-level indicators for operators supports informed decision-making across roles: the traffic light visualization reduces cognitive burden by lowering the volume and complexity of information that operators must process, which contributes to sustainable work design and shifts their focus to important events (augmentation). Meanwhile, detailed anomaly and label error scores enhance the trust of engineers and equip them for responsible knowledge management in data-driven decisions.
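A traffic light indicator of this kind could be derived as in the following sketch, which combines the share of recent detections with a high anomaly score and the mean model confidence. All thresholds and names are hypothetical placeholders; suitable values would have to be determined together with operators and quality engineers.

```python
def anomaly_rate(recent_scores, score_threshold):
    """Share of recent detections whose anomaly score exceeds a threshold."""
    return sum(s > score_threshold for s in recent_scores) / len(recent_scores)


def traffic_light(rate, mean_confidence,
                  rate_warn=0.05, rate_alarm=0.15,
                  conf_warn=0.8, conf_alarm=0.6):
    """Map two reliability signals to 'green', 'yellow', or 'red'.

    The worse of the two signals determines the colour (all thresholds are assumptions).
    """
    if rate >= rate_alarm or mean_confidence <= conf_alarm:
        return "red"
    if rate >= rate_warn or mean_confidence <= conf_warn:
        return "yellow"
    return "green"
```

Letting the worse signal decide keeps the indicator conservative: a single degraded source is enough to prompt the operator to involve a quality engineer.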
Discussion and conclusion
The co-developed prototypes, validated through lab simulations and real-world user feedback sessions, demonstrate that addressing data and model quality via an HCAI lens supports user agency and improves system resilience in industrial ASIS. User feedback from validation sessions confirmed the value of the label error detection solution. Ranking potential label errors clearly reduced the manual effort required for data quality assurance compared to non-targeted manual checks. This is demonstrated by the 42% identification rate based on label error ranking compared to regular label error prevalence rates (e.g., less than 5%) [6]. Data-efficient image-level anomaly detection proved effective at detecting surface defects on customer data.
Overall, technical solutions focused on data quality transparency are crucial for building trustworthy, resilient industrial AI. They bridge algorithmic capabilities and practical usability by empowering domain experts to understand, validate, and improve AI performance. In doing so, they directly enhance agency and augmentation (through more efficient workflows), trustworthiness (via transparent reliability monitoring), and responsible knowledge management (by improving data integrity). Strategies for ethical and effective HCAI are thereby shown to transfer beyond industrial ASIS, as outlined in Figure 3.

These strategies are relevant to domains requiring dependable AI with robust human oversight, such as safety-critical applications where data scarcity and reliability are an issue. By focusing on human-centered principles and co-developing technical solutions for data quality and reliability monitoring, this science-practice collaboration demonstrates how AI systems can become transparent, trustworthy collaborators that augment human capabilities.
This research and development project is funded by the German Federal Ministry of Research, Technology and Space (BMFTR) within the “The Future of Value Creation – Research on Production, Services and Work” program (02L19C200) and managed by the Project Management Agency Karlsruhe (PTKA). The authors are responsible for the content of this publication.
