The Importance of Data Integrity in the Lifecycle of a Construction Project

Data integrity is the cornerstone of any successful project, particularly in complex, multi-stakeholder environments like the construction of semiconductor fabrication facilities. The failure of the NHS’s test-and-trace system back in October 2020 serves as a cautionary tale about the pitfalls of inadequate data management. We can all agree Excel has its strengths, but relying solely on it for large-scale projects carries significant risks. In this article, we explore why maintaining data integrity throughout the project lifecycle is crucial.

Background: The UK’s NHS Test-and-Trace Excel Failure

Anyone remember that thing called Covid-19? You know... the masks, the obsession with baking bread, Zoom parties on a Friday night...

The Excel nerds among you may recall the “major technical error” in the UK’s test-and-trace system that led to over 15,000 positive COVID-19 cases being left out of the reported figures and thousands of their close contacts not being alerted in time.

The problem boiled down to the way Excel was used to manage the large quantity of data being collected, specifically the outdated XLS format. Public Health England (PHE) pulled CSV data from commercial labs into this old format, and once a file outgrew the format's row limit, the extra records were truncated and lost.

The Role of Excel in Data Management

Love it or hate it, Excel is widely used. It's accessible and flexible, allowing users to perform a wide range of data analysis tasks without extensive training. It offers features like data visualization, pivot tables, and complex formulas that make it a valuable tool for many business operations. However, Excel has limitations that can become critical issues in large-scale, data-intensive projects like the Test-and-Trace programme:

  • Row and Column Limits: Older versions of Excel (pre-2007, using the XLS format) have a limit of 65,536 rows and 256 columns per worksheet. The newer XLSX format extends these limits to 1,048,576 rows and 16,384 columns, but even that may not be enough in the data-rich world we live in.
  • Manual Data Handling: Excel often requires manual data entry and manipulation, which increases the risk of human error. But you knew that already!
  • Scalability Issues: Excel is not designed for handling extremely large datasets efficiently. As the dataset grows, performance issues can arise, leading to slow processing and potential crashes.
  • Data Integrity Risks: Ensuring data accuracy and consistency is challenging when relying on manual processes and error-prone software.
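
To make the row-limit point concrete, here is a minimal Python sketch of a guard that refuses to load a CSV file that would not fit on a single worksheet, rather than quietly dropping the overflow. The function names and the sample data are made up for illustration; only the two limits come from the figures above.

```python
import csv
import io

# Worksheet row limits quoted in the list above.
XLS_MAX_ROWS = 65_536
XLSX_MAX_ROWS = 1_048_576


def count_csv_rows(handle):
    """Count the records in a CSV stream, header row included."""
    return sum(1 for _ in csv.reader(handle))


def rows_lost(row_count, max_rows):
    """Return how many rows would not fit on a single worksheet."""
    return max(0, row_count - max_rows)


def check_fits(handle, max_rows=XLS_MAX_ROWS):
    """Raise an error instead of silently truncating oversized data."""
    total = count_csv_rows(handle)
    lost = rows_lost(total, max_rows)
    if lost:
        raise ValueError(
            f"{total} rows exceed the {max_rows}-row limit; {lost} would be dropped"
        )
    return total


# Hypothetical lab export: one header row plus 70,000 result rows.
sample = io.StringIO(
    "id,result\n" + "".join(f"{i},positive\n" for i in range(70_000))
)
print(rows_lost(count_csv_rows(sample), XLS_MAX_ROWS))
```

The design choice worth copying is that the check fails loudly. A process that stops with an error gets noticed the same day; one that drops rows without saying so can go unnoticed for much longer.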

The Importance of Data Integrity in Semiconductor Fab Facility Projects

The construction of a semiconductor fab facility involves hundreds of companies and thousands of contractors including engineers, suppliers, and project managers. Each of these parties generates and relies on vast amounts of data, making data integrity essential for the project's success. Here are key reasons why maintaining data integrity is crucial throughout the project lifecycle:

  • Accurate Decision-Making: Reliable data ensures that decisions are based on accurate and up-to-date information. Inaccurate data can lead to costly mistakes and project delays.
  • Project Coordination: Effective collaboration among stakeholders requires consistent and reliable data. Miscommunication or data discrepancies can disrupt project timelines and lead to conflicts.
  • Resource Management: Accurate data helps in the efficient allocation of resources, ensuring that materials, labor, and equipment are available when needed.
  • Optimization and Efficiency: It's crucial to take unstructured data sets from multiple sources and standardize them so that you can optimize processes using machine learning algorithms.

Let's look at one small use case: the assignment of points of connection (POCs) for tooling in the clean room. Connecting tools to facility services requires the tracking of hundreds of thousands of connections, and a lot of the time this is done via spreadsheets. When constructing in the field, it’s common to find that the originally designed POC isn’t right and a new one needs to be found.

In teams that do collaborate well together (not always the case!), the field team would request a new POC from a central team, who would then update the source of truth (Excel) with the new connection. With multiple teams all accessing the one sheet at different times, we have seen firsthand the Excel file get saved over and whole streams of design-change data lost forever.
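
One way to stop a late save from wiping out someone else's change is to put a version check on every update and keep a history table, a pattern often called optimistic locking. The sketch below uses Python's built-in sqlite3 module. The table names, tool IDs and POC IDs are hypothetical, and a real multi-user setup would sit on a shared database server rather than an in-memory one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poc_assignment (
        tool_id TEXT PRIMARY KEY,
        poc_id  TEXT NOT NULL,
        version INTEGER NOT NULL
    );
    CREATE TABLE poc_history (
        tool_id    TEXT NOT NULL,
        old_poc    TEXT NOT NULL,
        new_poc    TEXT NOT NULL,
        changed_by TEXT NOT NULL
    );
""")
# Hypothetical starting point: one tool with its originally designed POC.
conn.execute("INSERT INTO poc_assignment VALUES ('TOOL-001', 'POC-A17', 1)")
conn.commit()


def reassign_poc(conn, tool_id, new_poc, expected_version, user):
    """Apply a change only if the row is still at the version the user read."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT poc_id FROM poc_assignment WHERE tool_id = ? AND version = ?",
            (tool_id, expected_version),
        ).fetchone()
        if row is None:
            return False  # stale read: the caller must reload and retry
        updated = conn.execute(
            "UPDATE poc_assignment SET poc_id = ?, version = version + 1 "
            "WHERE tool_id = ? AND version = ?",
            (new_poc, tool_id, expected_version),
        )
        if updated.rowcount != 1:
            return False
        # Keep the old value so no design change is ever overwritten for good.
        conn.execute(
            "INSERT INTO poc_history VALUES (?, ?, ?, ?)",
            (tool_id, row[0], new_poc, user),
        )
    return True


# Two teams both read version 1, then each tries to save its own change.
first = reassign_poc(conn, "TOOL-001", "POC-B02", 1, "field-team")
second = reassign_poc(conn, "TOOL-001", "POC-C09", 1, "central-team")
print(first, second)
```

The second save is rejected instead of overwriting the first, and the history table still holds the original POC, so nothing is lost even when a change does go through.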

Lessons Learned and Recommendations

The NHS test-and-trace failure highlights the dangers of relying on outdated and manual data management processes. For large projects like semiconductor fab facility construction, it is crucial to adopt more automated and robust data management solutions. Here are some recommendations:

  • Adopt a Centralized Database: Use a robust database system that can handle large volumes of data and provides real-time access to all stakeholders. Relational (SQL) databases such as Oracle, or cloud-based databases, offer superior data integrity and scalability.
  • Implement Automated Data Integration: Automate the data integration process to reduce the risk of human error. ETL (Extract, Transform, Load) pipelines can automate the extraction of data from various sources, transforming it into the required format and loading it into a centralized database.
  • Utilize Advanced Data Analytics Tools: Leverage advanced data analytics and business intelligence tools that can handle large datasets and provide meaningful insights. Tools like Tableau, Power BI, and SAS offer powerful data visualization and analysis capabilities.
  • Regular Data Audits and Validation: Conduct regular data audits and validation checks to ensure data accuracy and consistency. Implement automated validation rules to detect and correct errors early.
  • Train Staff on Data Management Best Practices: Ensure that all project stakeholders are trained in data management best practices and understand the importance of data integrity. Regular training and updates can help mitigate human errors.
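
As a rough illustration of how automated integration and validation rules fit together, here is a small extract-transform-load sketch in Python. The column names, the sample rows and the two validation rules are assumptions made for the example, not a description of any particular project's setup.

```python
import csv
import io
import sqlite3

# Hypothetical export from one contractor's spreadsheet.
RAW_CSV = """connection_id,tool_id,poc_id
C-0001,TOOL-001,POC-A17
C-0002,TOOL-002,
C-0001,TOOL-001,POC-B02
 c-0003 ,TOOL-003,POC-C09
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise values, then split rows into valid and rejected."""
    valid, rejected, seen = [], [], set()
    for row in rows:
        clean = {key: (value or "").strip() for key, value in row.items()}
        clean["connection_id"] = clean["connection_id"].upper()
        if not clean["connection_id"] or not clean["poc_id"]:
            rejected.append((clean, "missing connection_id or poc_id"))
        elif clean["connection_id"] in seen:
            rejected.append((clean, "duplicate connection_id"))
        else:
            seen.add(clean["connection_id"])
            valid.append(clean)
    return valid, rejected


def load(conn, rows):
    """Load: write the validated rows into the central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS connections "
        "(connection_id TEXT PRIMARY KEY, tool_id TEXT NOT NULL, poc_id TEXT NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO connections VALUES (:connection_id, :tool_id, :poc_id)",
            rows,
        )


conn = sqlite3.connect(":memory:")
valid, rejected = transform(extract(RAW_CSV))
load(conn, valid)
for row, reason in rejected:
    print(f"Rejected {row['connection_id']}: {reason}")
```

Rejected rows are reported along with a reason rather than discarded, so someone can chase the source and fix it. That is the audit trail a shared spreadsheet rarely gives you.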

Conclusion

Excel has its place in data management due to its flexibility and ease of use. However, for large, complex projects involving multiple stakeholders, such as the construction of semiconductor fab facilities, relying solely on Excel can be a recipe for disaster. And yet a lot of companies do just that!

The UK’s test-and-trace system failure serves as a stark reminder of the importance of maintaining data integrity through robust, automated processes. By adopting advanced data management solutions and ensuring all stakeholders are aligned on best practices, your project has a much better chance of being delivered on time and within budget.