The increasing complexity of modern production environments poses significant challenges for companies when planning and controlling their production [1, 2]. In this context, AI can offer novel perspectives: it can derive optimized strategies from iterative learning processes, adapt them to changing production conditions, and automate them [3, 4].
However, the successful application of AI in production planning and control requires a robust and scalable data infrastructure that must fulfill a wide range of requirements. Ensuring the availability and quality of real-time and historical data, as well as seamless integration into existing systems such as Manufacturing Execution Systems (MES) and Enterprise Resource Planning (ERP) systems, is paramount. In the following sections, these challenges will be examined in detail, drawing on current research.
The article aims to present a strategy for companies to automate production planning and control using AI and to provide a detailed description of the potential of automating production control using intelligent agents.
Production planning and control basics
Production planning and control is a comprehensive term encompassing all strategic and operational measures necessary to plan, control, and monitor manufacturing processes efficiently [5, 6]. Its overarching objective is to ensure on-time and cost-efficient production with an optimal use of resources (mainly machinery and personnel), thereby achieving high logistical performance at minimal cost [5]. Following Nyhuis et al. [5] and Schmidt et al. [6], production planning and control comprises the following elements:
- Planning: defining production programs, calculating material requirements, and allocating capacities.
- Control: coordinating production processes by checking availability, dispatching production orders, forming meaningful sequences in processing, and adapting capacities.
- Controlling: analyzing deviations and initiating corrective measures.
The Hanoverian Supply Chain Model (HaLiMo) is a theoretical framework that organizes the tasks and processes involved in production planning and control according to their chronological and logical sequence (see Fig. 1) [6]. This integrated approach ensures that production processes remain flexible and responsive, a paramount quality in dynamic markets.
![Figure 1: Hanoverian Supply Chain Model (following [6]).](https://industry-science.com/wp-content/uploads/2025/09/Schneider_I4S-25-5_Figure-1-1024x801.jpeg)
Artificial intelligence in the Hanoverian supply chain model
Manufacturing companies are among the primary beneficiaries of digital transformation [7]. The increasing availability of data presents a range of opportunities for enhancing efficiency, improving quality, and reducing production costs [8]. AI applications are already being used successfully in various areas to achieve these goals and are also the subject of intensive research. AI solutions are making a particularly strong impact on production planning and control (see [4, 9, 10]).
Within the HaLiMo framework, a range of use cases for AI applications are identified, with the potential to contribute to achieving holistic automation of production planning and control through a multi-agent system approach. In principle, each of the eleven primary production planning and control tasks (see Fig. 1) can be automated using an appropriate AI approach. The multi-agent system can then be implemented so that the individual AI solutions used to automate production planning and control are linked (central AutoPPC) according to their information and material flows (see Fig. 1).
The added value for both research and industry lies in a future-proof data infrastructure that provides a holistic basis for intelligent, adaptive, and automated production planning and control. Such an infrastructure incorporates both real-time and historical data and can be seamlessly integrated into existing systems. In the existing literature (see [9, 10]), AI is typically used to optimize individual production planning and control subtasks. There is little holistic linkage of the kind offered by a framework model such as HaLiMo, and providing this linkage is the key contribution of this article.
The subsequent discussion focuses on automating production control tasks using intelligent Reinforcement Learning (RL) agents. To achieve this, the fundamentals of RL are outlined, followed by a demonstration of how companies can develop intelligent agents to automate discrete production control tasks and integrate them comprehensively within their organization.
RL is a subfield of machine learning in which an agent learns to optimize its actions by interacting with an environment. The agent operates in a defined environment characterized by distinct states. By executing actions, the agent changes the current state and receives feedback as a reward. This reward indicates the extent to which the selected action contributes to achieving the long-term goal [11, 9]. The context and schematic structure are illustrated in Figure 2.
![Figure 2: Schematic structure of a reinforcement learning agent (following [11]).](https://industry-science.com/wp-content/uploads/2025/09/Schneider_I4S-25-5_Figure-2-1024x788.jpeg)
The objective is to devise a strategy that maximizes the cumulative reward over numerous interactions. The existing literature thoroughly outlines the fundamental concepts and methodologies of this approach [11]. Preliminary approaches already demonstrate RL’s potential to automate individual production planning and control tasks and to cope with complexity [12].
For instance, it has been demonstrated that RL agents can outperform established industry standards in time management [3].
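The interaction loop described above (state, action, reward, updated strategy) can be illustrated with a minimal sketch. It uses tabular Q-learning, one common RL method chosen here purely for illustration, on a toy dispatching problem. The environment `ToyDispatchEnv`, its two order queues, the arrival probability, and the reward weights are all assumptions made for this example and are not taken from the cited studies.

```python
import random


class ToyDispatchEnv:
    """Hypothetical environment: one machine serving an urgent and a regular queue."""

    MAX_QUEUE = 3  # assumed cap so the state space stays small

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = (1, 1)

    def reset(self):
        self.state = (1, 1)  # one waiting order per queue
        return self.state

    def step(self, action):
        queues = list(self.state)
        # Action 0 processes an urgent order, action 1 a regular one (if any waits).
        if queues[action] > 0:
            queues[action] -= 1
        # New orders arrive at random (assumed probability 0.3 per queue and step).
        for i in range(2):
            if self.rng.random() < 0.3:
                queues[i] = min(queues[i] + 1, self.MAX_QUEUE)
        self.state = tuple(queues)
        # Reward: waiting orders are penalized, urgent ones three times as heavily.
        reward = -(3 * queues[0] + queues[1])
        return self.state, reward


def greedy_action(q, state):
    """Pick the action with the highest estimated long-term reward."""
    return max((0, 1), key=lambda a: q.get((state, a), 0.0))


def train(episodes=300, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = ToyDispatchEnv(seed)
    q = {}  # (state, action) -> estimated cumulative discounted reward
    for _ in range(episodes):
        state = env.reset()
        for _ in range(steps):
            explore = rng.random() < epsilon
            action = rng.randrange(2) if explore else greedy_action(q, state)
            next_state, reward = env.step(action)
            best_next = max(q.get((next_state, a), 0.0) for a in (0, 1))
            old = q.get((state, action), 0.0)
            # Move the estimate toward reward plus discounted future value.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q
```

After training, `greedy_action(train(), state)` returns the dispatching decision the agent currently prefers for a given pair of queue lengths. A real production control task would have a far larger state space and would typically require function approximation rather than a lookup table.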
Data infrastructure requirements
The performance of RL models depends on the quality and variety of the underlying data and information. In production environments, this data can be categorized into three main types:
- Real-time data: Production plants continuously generate data streams reflecting the current state of machines, material flow, and production progress. This is a consequence of integrating sensors, IoT devices, and MES [13].
- Historical data: Extensive archives of past production processes are necessary to train and validate RL models. Historical data primarily includes production logs with information on planned values, processing times, and capacities [11, 14].
- Context and metadata: Beyond numerical data, information regarding production plans, job priorities, resource availability, and other organizational parameters is crucial. This contextual data, often derived from ERP systems, provides the framework for interpreting production data [15].
The performance of an RL agent is closely linked to the quality of the available data. Incomplete or inconsistent data sets can lead to misinterpretation and faulty learning processes. Therefore, quality assurance processes must be implemented that check the data, and the decisions the RL agent derives from it, for plausibility and completeness [9, 11]. Real-time data in particular calls for streaming technologies and data pipelines that can process large data volumes quickly [16]. The diversity of data formats from disparate sources poses a further challenge, which data lakes and ETL (Extract, Transform, Load) processes address by enabling efficient aggregation and pre-processing [3, 11].
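A check for completeness and plausibility might look like the following sketch, applied here to incoming production log records before they reach the agent. The field names, the record layout, and the upper bound `MAX_PLAUSIBLE_MINUTES` are hypothetical; a real pipeline would take them from the company's own MES and ERP schemas.

```python
REQUIRED_FIELDS = ("order_id", "machine_id", "planned_minutes", "actual_minutes")
MAX_PLAUSIBLE_MINUTES = 24 * 60  # assumed bound: no single operation exceeds one day


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one production log record (empty if clean)."""
    problems = []
    # Completeness: every required field must be present and filled.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing:{field}")
    # Plausibility: durations must be numeric, positive, and below the bound.
    for field in ("planned_minutes", "actual_minutes"):
        value = record.get(field)
        if value is None:
            continue  # already reported as missing
        is_number = isinstance(value, (int, float))
        if not is_number or not (0 < value <= MAX_PLAUSIBLE_MINUTES):
            problems.append(f"implausible:{field}")
    return problems


def split_clean(records: list[dict]):
    """Separate usable records from rejected ones, keeping the reasons."""
    clean, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the rejection reasons alongside each record makes it possible to trace recurring data problems back to their source system instead of silently discarding them.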
Integration of the RL agent into existing systems
Integrating the RL agent into the existing IT landscape is a prerequisite for realizing synergies between traditional systems and modern algorithms. The primary focus in this regard is on MES and ERP systems. A key responsibility of the MES is the real-time control and monitoring of production processes [17]. The RL agent must therefore be able to access the data provided by the MES directly [11, 17].
In addition, the agent must be able to provide feedback to the system through control commands. ERP systems are designed to manage company-wide data that extends beyond the confines of the production process [17]. To ensure optimal support for the RL agent, it is imperative to integrate and contextualize data from this system. This often necessitates the development of specialized interfaces that facilitate bidirectional communication between the systems, fostering seamless integration and effective collaboration [18].
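The bidirectional communication described above can be sketched as a thin interface layer between the agent and the two systems. Everything in this example is hypothetical: the gateway names, their methods, and the in-memory stand-ins do not correspond to any real MES or ERP product API, and the priority rule is only a placeholder for a trained RL policy.

```python
from typing import Protocol


class MesGateway(Protocol):
    """Assumed interface: read machine states, write dispatch commands."""

    def read_machine_states(self) -> dict[str, str]: ...
    def send_dispatch_command(self, machine_id: str, order_id: str) -> None: ...


class ErpGateway(Protocol):
    """Assumed interface: read contextual data such as order priorities."""

    def read_order_priorities(self) -> dict[str, int]: ...


class InMemoryMes:
    """Stand-in for a real MES connection, useful for tests and simulation."""

    def __init__(self, machine_states: dict[str, str]):
        self.machine_states = dict(machine_states)
        self.sent_commands: list[tuple[str, str]] = []

    def read_machine_states(self) -> dict[str, str]:
        return dict(self.machine_states)

    def send_dispatch_command(self, machine_id: str, order_id: str) -> None:
        self.sent_commands.append((machine_id, order_id))
        self.machine_states[machine_id] = "busy"


class InMemoryErp:
    """Stand-in for a real ERP connection."""

    def __init__(self, priorities: dict[str, int]):
        self.priorities = dict(priorities)

    def read_order_priorities(self) -> dict[str, int]:
        return dict(self.priorities)


def dispatch_once(mes: MesGateway, erp: ErpGateway) -> list[tuple[str, str]]:
    """One control cycle: read from MES and ERP, decide, write back to the MES."""
    states = mes.read_machine_states()
    idle = sorted(m for m, s in states.items() if s == "idle")
    # Placeholder policy: highest priority first. A trained agent would decide here.
    ranked = sorted(erp.read_order_priorities().items(), key=lambda kv: (-kv[1], kv[0]))
    commands = []
    for machine_id, (order_id, _priority) in zip(idle, ranked):
        mes.send_dispatch_command(machine_id, order_id)
        commands.append((machine_id, order_id))
    return commands
```

Coding the agent against such interfaces rather than against a concrete system means the same decision logic can run unchanged against a simulation during training and against the production systems later.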
Implementation roadmap—from concept to practice
The successful introduction of an RL agent in practical production planning and control requires a systematic approach that considers both technical and organizational aspects. The following delineates a detailed and practical roadmap that companies can use to guide them from conceptual planning to training, integration, and continuous optimization. The approach is grounded in the widely recognized Cross Industry Standard Process for Data Mining, a standardized procedure for conducting data mining projects [19]. The schematic structure and procedure are illustrated in Figure 3.

Before developing a company-specific RL agent, companies must take a thorough inventory of their existing IT and production systems. This includes documenting the available data sources (for example, real-time and contextual data) and the interfaces used by the MES and ERP systems. Furthermore, quantifiable key performance indicators (KPIs) must be defined against which the effectiveness of the RL agent can be measured [9].
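As an illustration of what quantifiable KPIs could look like, the following sketch computes two common logistical measures from completed orders. The choice of schedule adherence and mean throughput time, as well as the `CompletedOrder` record and its fields, are assumptions for this example; which KPIs are appropriate depends on the company's own objectives.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CompletedOrder:
    """Hypothetical record; times are hours since an arbitrary reference point."""

    order_id: str
    release_hour: float
    due_hour: float
    completion_hour: float


def schedule_adherence(orders: list[CompletedOrder]) -> float:
    """Share of orders completed on or before their due date."""
    if not orders:
        raise ValueError("at least one completed order is required")
    on_time = sum(1 for o in orders if o.completion_hour <= o.due_hour)
    return on_time / len(orders)


def mean_throughput_time(orders: list[CompletedOrder]) -> float:
    """Average time between order release and completion, in hours."""
    if not orders:
        raise ValueError("at least one completed order is required")
    return mean(o.completion_hour - o.release_hour for o in orders)
```

Computing the same KPIs for the existing control logic before the agent is introduced gives a baseline against which later results can be compared.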
The subsequent stage in the process is consolidating the requisite data sources. Ensuring the quality and consistency of the data is paramount and should be facilitated through automated testing and data cleansing processes [11, 14]. Integrating the RL agent into production is contingent upon creating a digital twin or simulation model of the production system. This facilitates risk-free testing and training of the agent under simulated production conditions. Concurrently, a simulation platform should be developed that maps various scenarios, including extreme or rare production situations. This establishes a realistic environment where the RL agent can learn iteratively and refine its decision-making strategies [4, 11].
Initial training in the simulation begins at this stage. The RL agent is first exposed to historical and simulated real-time data so that it can develop fundamental decision strategies, and the model is improved through systematic evaluation and refinement. Once preliminary training has been completed successfully, real-time data is integrated gradually. The transition to a productive environment is emulated in incremental assessments, which also harden the model against unexpected situations. Connecting the agent to the higher-level system in this way enables it to implement decisions based on real-time feedback while simultaneously feeding relevant data back to that system [17].
Before comprehensive implementation, the RL agent should be deployed in a pilot phase in a controlled area. This pilot phase validates the system’s functionality under real-world conditions while concurrently monitoring and assessing security, data protection, and performance aspects in actual operational settings. After integration, continuous monitoring is imperative. This entails observing the performance of the RL agent, its decisions, and the repercussions of these decisions on production processes.
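Continuous monitoring of this kind could, for example, compare a rolling average of a KPI against the baseline established before the agent was introduced. The sketch below is one simple way to do this; the class `KpiMonitor`, the window size, and the tolerance are assumed values for illustration, and a higher KPI value is assumed to be better.

```python
from collections import deque


class KpiMonitor:
    """Flags when the rolling mean of a KPI drops too far below a baseline."""

    def __init__(self, baseline: float, tolerance: float, window: int):
        self.baseline = baseline
        self.tolerance = tolerance
        self.values = deque(maxlen=window)  # keeps only the latest observations

    def record(self, value: float) -> bool:
        """Store one observation; return True if intervention seems warranted."""
        self.values.append(value)
        if len(self.values) < self.values.maxlen:
            return False  # not enough observations for a stable average yet
        rolling_mean = sum(self.values) / len(self.values)
        return rolling_mean < self.baseline - self.tolerance
```

For instance, with `KpiMonitor(baseline=0.9, tolerance=0.05, window=3)`, a single poor reading does not trigger an alert, but a sustained drop in the rolling average does. What happens on an alert, such as reverting to the previous control logic or retraining the agent, is an organizational decision outside the scope of this sketch.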
Actual ‘realization’ can be initiated after completing the six steps above. In this context, realization is the practical implementation of the production planning and control tasks, which the RL agent subsequently automates. It is imperative to note that the schematic sequence of the seven steps, as illustrated in Figure 3, should not be interpreted as a one-time process. Instead, it is a process that must be repeatedly executed and adapted to new conditions.
The system should be designed to learn continuously from new data and feedback. Feedback loops between production, IT, and the developers of the RL agent allow the model to be adjusted and optimized on an ongoing basis. Implementing an RL agent changes work processes and requires new skills.
Consequently, companies should allocate resources to training and continuing education programs at an early stage. Such programs help employees build competence in using the new technology and address the concerns and reservations that may arise when new methods are adopted. This fosters not only acceptance but also long-term success.
This roadmap provides companies with a structured approach to successfully transitioning from the conceptual planning stage to the productive use of an RL agent in production planning and control. Companies can minimize risks and achieve sustainable competitive advantages by systematically considering technical, organizational, and security-related aspects.
Challenges and prospects
Implementing a holistic data infrastructure for RL in production planning and control poses numerous challenges, including the fact that relevant data is often stored in isolated systems that do not communicate effectively. To overcome these silos, it is necessary to implement not only technical but also organizational measures to achieve company-wide data integration.
In addition to data integration, the scalability and performance of the RL agent also present challenges. Production environments are characterized by high data volumes and varying loads. The data infrastructure must therefore be designed to scale flexibly and handle peak loads. Distributed database systems and cloud solutions designed for high availability and performance can help meet this challenge.
A further challenge is the need for interdisciplinary collaboration to successfully implement and integrate such projects into a company’s production process. The success of RL projects in production planning and control is contingent upon close collaboration between data scientists, IT specialists, production logisticians, and security officers.
It is only through an interdisciplinary approach that the complex challenges of data integration and security can be overcome. Implementing an RL agent should be regarded as a continuous process rather than a one-time undertaking, as it involves the ongoing allocation of resources. The data infrastructure must be flexible to accommodate future requirements, technological advancements, and evolving production processes. Consequently, regular evaluations and updates are imperative.
In conclusion, implementing RL in production planning and control requires a robust and future-proof data infrastructure. Essential prerequisites include high-quality real-time and historical data and seamless integration into existing MES and ERP systems. By leveraging modern technologies such as digital twins and advanced data preprocessing, companies can sustainably enhance the performance of their production systems and secure long-term competitive advantages. Interdisciplinary collaboration and continuous development of the data infrastructure are pivotal to success in an increasingly digitized and networked industrial environment.
HaLiMo is proposed as a suitable method for achieving this objective in a structured manner because of its holistic character and its consideration of the interactions, information flows, and material flows in production logistics. A long-term vision is the automation of production planning and control in its entirety, realized by automating and integrating all production planning and control tasks.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – SFB 1153 – 252662854.
The original German version of this article can be accessed via DOI: 10.30844/I4SD.25.5.86
Bibliography
[1] Mrugalska, B.; Wyrwicka, M.K.: Towards Lean Production in Industry 4.0. Procedia Engineering 182 (2017), pp. 466-473.
[2] Mütze, A.; Lucht, T.; Nyhuis, P.: Logistics-Oriented Production Configuration Using the Example of MRO Service Providers. IEEE Access 10 (2022), pp. 20328-20344.
[3] Altenmüller, T.; Stüker, T.; Waschneck, B.; Kuhnle, A.; Lanza, G.: Reinforcement learning for an intelligent and autonomous production control of complex job-shops under time constraints. Production Engineering 14 (2020) 3, pp. 319-328.
[4] Panzer, M.; Bender, B.: Deep reinforcement learning in production systems: a systematic literature review. International Journal of Production Research 60 (2022) 13, pp. 4316-4341.
[5] Nyhuis, P.; Wiendahl, H.-P.: Logistische Kennlinien: Grundlagen, Werkzeuge und Anwendungen, Springer, Berlin, 2012.
[6] Schmidt, M.; Nyhuis, P.: Produktionsplanung und -steuerung im Hannoveraner Lieferkettenmodell: Innerbetrieblicher Abgleich logistischer Zielgrößen. Springer Vieweg, Berlin, 2021, p. 217.
[7] Sui, X.; Jiao, S.; Wang, Y.; Wang, H.: Digital transformation and manufacturing company competitiveness. Finance Research Letters 59 (2024), p. 104683.
[8] Zhang, Q.; Li, S.; Li, Z.; Xing, Y.; Yang, Z.; Dai, Y.: CHARM: A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability. IEEE Transactions on Cloud Computing 3 (2015) 3, pp. 372-386.
[9] Wang, Y.-C.; Usher, J. M.: Application of reinforcement learning for agent-based production scheduling. Engineering Applications of Artificial Intelligence 18 (2005) 1, pp. 73-82.
[10] Hiller, T.; Demke, T. M.; Nyhuis, P.: Throughput Time Predictions Along the Order Fulfilment Process. IEEE Access 12 (2024), pp. 9705-9718.
[11] Brunton, S. L.; Kutz, J. N.: Data-Driven Science and Engineering. Cambridge University Press, 2019.
[12] Stricker, N.; Kuhnle, A.; Sturm, R.; Friess, S.: Reinforcement learning for adaptive order dispatching in the semiconductor industry. CIRP Annals 67 (2018) 1, pp. 511-514.
[13] Peschke, F.; Eckardt, C.: Flexible Produktion durch Digitalisierung: Entwicklung von Use Cases. Hanser, München, 2019, p. 249.
[14] Kaelbling, L. P.; Littman, M. L.; Moore, A. W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4 (1996), pp. 237-285.
[15] Pistorius, J.: Industrie 4.0 – Schlüsseltechnologien für die Produktion: Grundlagen, Potenziale und Anwendungen. Springer, Berlin, 2020, p. 190.
[16] Jordan, M. I.; Mitchell, T. M.: Machine learning: Trends, perspectives, and prospects. Science 349 (2015) 6245, pp. 255-260.
[17] Berić, D.; Stefanović, D.; Lalić, B.; Ćosić, I.: The Implementation of ERP and MES Systems as a Support to Industrial Management Systems. International Journal of Industrial Engineering and Management 9 (2018) 2, pp. 77-86.
[18] Tao, F.; Qi, Q.; Liu, A.; Kusiak, A.: Data-driven smart manufacturing. Journal of Manufacturing Systems 48 (2018), pp. 157-169.
[19] Wirth, R.; Hipp, J.: CRISP-DM: Towards a standard process model for data mining. Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, 1 (2000), pp. 29-39.
