In industrial settings, most data is initially unstructured. Converting this data into a structured format and storing it in a data lake is crucial for two reasons. First, structured data is easier to analyze and is compatible with a wide range of data analysis tools, enabling applications such as process optimization, quality improvement, and predictive maintenance. Second, storing data in a data lake centralizes data management, which improves efficiency and reduces the cost of building and maintaining complex data pipelines.
Data Generation
Equipment and systems in industrial environments generate massive amounts of data. This data typically originates from the following sources:
- Control PCs: Responsible for process control and monitoring.
- PLC (Programmable Logic Controller): Communicates with field devices such as sensors and actuators to collect data and control processes.
Data Collection
The generated data is centralized for efficient management:
- CIM (Computer Integrated Manufacturing) PC: Collects and integrates data from Control PCs and PLCs to enhance process efficiency.
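The integration step can be illustrated with a minimal sketch. It assumes a hypothetical `Reading` record and made-up source labels, tags, and values; none of these names come from a real CIM product, and an actual CIM PC would receive data over equipment-specific protocols rather than from in-memory lists.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Reading:
    """One measurement in a common schema (hypothetical field names)."""
    source: str        # where the reading came from, e.g. "control_pc" or "plc"
    equipment_id: str  # which machine produced it
    timestamp: float   # seconds since some reference time
    tag: str           # name of the measured quantity
    value: float


def integrate(*streams: Iterable[Reading]) -> List[Reading]:
    """Merge readings from several sources into one time-ordered list."""
    merged = [reading for stream in streams for reading in stream]
    return sorted(merged, key=lambda r: (r.timestamp, r.equipment_id, r.tag))


# Illustrative, made-up data from two kinds of sources.
control_pc = [Reading("control_pc", "EQ01", 10.0, "recipe_step", 3.0)]
plc = [
    Reading("plc", "EQ01", 9.5, "chamber_temp_c", 181.2),
    Reading("plc", "EQ01", 10.5, "chamber_temp_c", 181.9),
]

unified = integrate(control_pc, plc)
for reading in unified:
    print(reading)
```

Putting every source into one schema before merging is the design choice that matters here: downstream systems can then treat Control PC and PLC data uniformly.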
Control Systems, Data Collection, FDC System
Data passes through several systems for analysis and processing:
- Control Systems: Monitor process states and alarms, and manage work instructions and scheduling.
- Data Collection System: Gathers data from various sources.
- FDC (Fault Detection and Classification) System: Analyzes data to detect and classify anomalies.
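To make "detect and classify" concrete, here is a deliberately simple sketch based on control limits derived from a baseline (mean plus or minus `k` standard deviations). This is an assumed, illustrative technique; production FDC systems generally use richer models, and the function names, labels, and sample values below are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def control_limits(baseline: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) limits as mean -/+ k sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def classify(value: float, lower: float, upper: float) -> str:
    """Detect an out-of-limit value and classify its direction."""
    if value > upper:
        return "high_fault"
    if value < lower:
        return "low_fault"
    return "normal"


# Made-up baseline readings taken while the process was known to be healthy.
baseline = [180.0, 181.0, 179.0, 180.5, 179.5]
lower, upper = control_limits(baseline)

new_values = [180.2, 190.0, 170.0]
labels = [classify(v, lower, upper) for v in new_values]
print(labels)
```

Detection here is the limit check itself, and classification is the label attached to each violation; a real system would typically classify by fault type or root cause rather than by direction alone.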
ETL Process
Finally, data is stored in the data lake through the ETL (Extract, Transform, Load) process:
- Extraction: Data is extracted from the FDC system.
- Transformation: Data is cleaned, standardized, and transformed if necessary.
- Loading: The transformed data is stored in the data lake.
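The three steps above can be sketched end to end. This is a minimal illustration under stated assumptions: `extract` returns hard-coded, made-up records in place of a real FDC export, the cleaning rules in `transform` are examples only, and a local temporary directory stands in for the data lake.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def extract() -> List[Dict]:
    """Stand-in for pulling records from an FDC system (data is made up)."""
    return [
        {"equipment_id": "EQ01", "tag": " Chamber_Temp ", "value": "181.2",
         "ts": "2024-01-01T00:00:00"},
        {"equipment_id": "EQ01", "tag": "Chamber_Temp", "value": None,
         "ts": "2024-01-01T00:00:01"},
        {"equipment_id": "eq02", "tag": "PRESSURE", "value": "2.5",
         "ts": "2024-01-01T00:00:02"},
    ]


def transform(records: List[Dict]) -> List[Dict]:
    """Clean and standardize: drop missing values, normalize names and types."""
    cleaned = []
    for rec in records:
        if rec["value"] is None:
            continue  # cleaning: skip records with no measurement
        cleaned.append({
            "equipment_id": rec["equipment_id"].upper(),  # standardize IDs
            "tag": rec["tag"].strip().lower(),            # standardize tag names
            "value": float(rec["value"]),                 # convert text to number
            "ts": rec["ts"],
        })
    return cleaned


def load(records: List[Dict], lake_dir: Path) -> Path:
    """Write records as JSON Lines into a directory acting as the lake."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    out_path = lake_dir / "fdc_readings.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out_path


lake_dir = Path(tempfile.mkdtemp()) / "lake"
out_path = load(transform(extract()), lake_dir)
print(out_path.read_text(encoding="utf-8"))
```

Keeping the three stages as separate functions means each one can be tested or replaced independently, for example by swapping the local directory for object storage.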
Data Lake
A data lake has the following characteristics:
- Stores and manages large volumes of structured and unstructured data.
- Can store raw data directly.
- Data stored in the data lake can be transferred to data warehouses or other analysis tools for further processing and analysis.
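One common way to keep raw data findable for later processing is to organize it under partitioned paths. The sketch below assumes a hypothetical layout keyed by equipment and date; the directory names and the `partition_path` helper are illustrative, not a convention the source prescribes.

```python
from datetime import datetime
from pathlib import PurePosixPath


def partition_path(root: str, equipment_id: str, ts_iso: str,
                   filename: str) -> PurePosixPath:
    """Build a raw-zone path partitioned by equipment and calendar date."""
    day = datetime.fromisoformat(ts_iso).date().isoformat()
    return (
        PurePosixPath(root)
        / "raw"
        / f"equipment_id={equipment_id}"
        / f"date={day}"
        / filename
    )


# Made-up example values.
path = partition_path("lake", "EQ01", "2024-01-01T08:30:00", "readings.jsonl")
print(path)
```

With a layout like this, a downstream job that loads data into a warehouse or analysis tool can select only the equipment and dates it needs instead of scanning the whole lake.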
Conclusion
Systematically collecting and managing the vast amount of data generated in industrial environments is crucial for maximizing manufacturing efficiency and enhancing competitiveness. Proper data storage and management in data lakes enable real-time analysis and decision support, contributing to overall operational efficiency.