How to use data governance for AI/ML systems

Your organization can use data governance for AI/ML to build the foundation for innovative data-driven tools.

Data governance ensures that data is available, consistent, usable, reliable, and secure. It’s a concept that organizations struggle with, and the stakes are raised when big data and systems like artificial intelligence and machine learning enter the picture. Organizations quickly realize that AI/ML systems work differently from traditional fixed-record systems.

With AI/ML, the goal is not to return a value or state for a single transaction. Rather, an AI/ML system sifts through petabytes of data looking for answers to a query or an algorithm that might even seem a bit open-ended. The data is processed in parallel, with threads of data fed to the processor simultaneously. Because such large amounts of data are processed simultaneously and asynchronously, IT can cull the data in advance to speed up processing.

This data can come from many different internal and external sources. Each source has its own way of collecting, curating, and storing data, and may or may not conform to your own organization’s governance standards. Then there are the recommendations of the AI itself. Do you trust them? These are just a few of the questions companies and their auditors face as they focus on data governance for AI/ML and look for tools that can help them.

Make sure your data is consistent and accurate

If you are integrating data from internal and external transactional systems, the data must be standardized so that it can be exchanged and combined with data from other sources. Application programming interfaces (APIs), which are pre-built into many systems so that they can exchange data with other systems, facilitate this. If no APIs are available, you can use extract, transform, and load (ETL) tools, which convert data from one system into a format that another system can read.
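
As a minimal sketch of the transform step in such a flow, the Python below maps records from two sources into one shared format. The source names, field names, date formats, and target schema are all hypothetical, chosen only for illustration; a real pipeline would follow your organization's own governance standards.

```python
from datetime import datetime, timezone

# Hypothetical field mappings: source field name -> standardized field name.
FIELD_MAPS = {
    "crm": {"CustomerID": "customer_id", "Amt": "amount", "TxnDate": "timestamp"},
    "vendor_feed": {"cust": "customer_id", "total": "amount", "ts": "timestamp"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {
    "crm": "%m/%d/%Y",
    "vendor_feed": "%Y-%m-%dT%H:%M:%S",
}


def standardize(record: dict, source: str) -> dict:
    """Rename fields and normalize types so records from any source can be combined."""
    mapping = FIELD_MAPS[source]
    out = {target: record[field] for field, target in mapping.items()}
    out["customer_id"] = str(out["customer_id"]).strip()
    out["amount"] = round(float(out["amount"]), 2)
    parsed = datetime.strptime(out["timestamp"], DATE_FORMATS[source])
    # Assumption for this sketch: both sources report times in UTC.
    out["timestamp"] = parsed.replace(tzinfo=timezone.utc).isoformat()
    out["source"] = source  # keep lineage so each record can be traced back
    return out


crm_row = {"CustomerID": " 1042 ", "Amt": "19.990", "TxnDate": "03/15/2024"}
feed_row = {"cust": 1042, "total": 5, "ts": "2024-03-16T08:30:00"}
combined = [standardize(crm_row, "crm"), standardize(feed_row, "vendor_feed")]
```

Both rows end up with the same keys and value types, so they can be combined. Recording the source on every row is one simple way to keep the data's lineage visible to auditors.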

If you add unstructured data, such as photo, video, and sound objects, there are object linking tools that can link and relate these objects to each other. A good example of an object linker is a geographic information system (GIS), which combines photos, schematics, and other types of data to provide complete geographic context for a particular environment.

Confirm that your data is usable

We often think of usable data as data that users can access, but it’s more than that. If the data you retain has lost its value because it is out of date, it should be purged. End users and IT need to agree on when data should be purged, and that agreement takes the form of data retention policies.
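
A retention policy of this kind can be expressed as a simple age check. The sketch below is illustrative only: the data categories, retention windows, and record layout are hypothetical stand-ins for whatever end users and IT actually agree on.

```python
from datetime import date, timedelta

# Hypothetical retention windows, in days, agreed between end users and IT.
RETENTION_DAYS = {"sensor_reading": 365, "clickstream": 90}


def split_by_retention(records, today):
    """Return (keep, purge) lists based on each record's age and category."""
    keep, purge = [], []
    for rec in records:
        limit = timedelta(days=RETENTION_DAYS[rec["category"]])
        if today - rec["collected_on"] > limit:
            purge.append(rec)  # older than the agreed window
        else:
            keep.append(rec)
    return keep, purge


records = [
    {"id": 1, "category": "clickstream", "collected_on": date(2024, 1, 1)},
    {"id": 2, "category": "clickstream", "collected_on": date(2024, 5, 1)},
    {"id": 3, "category": "sensor_reading", "collected_on": date(2023, 9, 1)},
]
keep, purge = split_by_retention(records, today=date(2024, 6, 1))
```

Passing `today` in explicitly, rather than reading the system clock inside the function, makes the policy easy to test and to re-run for an audit.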

There are also other times when AI/ML data needs to be purged. This happens when the data model for an AI system is changed and the existing data no longer fits the model.
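
One way to find data that no longer fits a changed model is to check each record against the features the revised model expects. The following is a sketch under stated assumptions: the feature names and types are hypothetical, and a production check would usually rely on a schema validation tool rather than hand-written rules.

```python
# Hypothetical feature set expected by the revised model.
REQUIRED_FEATURES = {"age_band": str, "region": str, "monthly_spend": float}


def fits_model(record: dict) -> bool:
    """True if the record has every required feature with the expected type."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in REQUIRED_FEATURES.items()
    )


training_rows = [
    {"age_band": "25-34", "region": "EMEA", "monthly_spend": 42.5},
    {"age_band": "35-44", "region": "APAC"},  # missing a feature
    {"age_band": "45-54", "region": "AMER", "monthly_spend": "n/a"},  # wrong type
]
usable = [row for row in training_rows if fits_model(row)]
to_purge = [row for row in training_rows if not fits_model(row)]
```

Rows in `to_purge` are candidates for removal under the written purge procedure, not records to delete automatically.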

In an AI/ML governance audit, examiners will expect to see written policies and procedures for both types of data purges. They will also verify that your data purge practices meet industry standards. There are many data purge tools and utilities on the market.

Make sure your data is trustworthy

Circumstances change: an AI/ML system that once worked well can start to lose effectiveness. How do you know this? By regularly checking AI/ML results against past performance and against what’s happening in the world around you. If the accuracy of your AI/ML system is drifting away from you, you need to fix it.

Amazon’s hiring model is a great example. Amazon’s artificial intelligence system concluded that it was better to hire male candidates because it had analyzed past hiring practices, in which the majority of hires had been men. What the model failed to adjust for going forward was a larger number of highly qualified female candidates. The AI/ML system had drifted away from the truth and had instead started seeding hiring bias into the process. From a regulatory standpoint, the AI was out of compliance.

Amazon eventually phased out the system, but companies can avoid these mistakes by regularly monitoring system performance and comparing it both to past performance and to what’s happening in the outside world. If the AI/ML model is out of sync, it can be adjusted.

There are AI/ML tools that data scientists use to measure model drift, but the most direct way for business professionals to check for drift is to compare AI/ML system performance to historical performance. For example, if you suddenly find that weather forecasts are 30% less accurate, it’s time to check the data and algorithms that your AI/ML system runs on.
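
The comparison against historical performance can be sketched in a few lines of Python. The baseline accuracy, the alert threshold, and the sample forecasts below are hypothetical values picked to mirror the weather example; this is a simple accuracy comparison, not one of the statistical drift tests that data scientists use.

```python
def accuracy(predictions, actuals) -> float:
    """Share of predictions that matched what actually happened."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(p == a for p, a in zip(predictions, actuals))
    return hits / len(actuals)


def relative_accuracy_drop(baseline: float, recent: float) -> float:
    """Fraction by which recent accuracy has fallen relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (baseline - recent) / baseline


# Hypothetical numbers: historical accuracy and the latest period's forecasts.
BASELINE_ACCURACY = 0.90
ALERT_THRESHOLD = 0.30  # flag a relative drop of 30% or more

recent_predictions = ["rain", "sun", "sun", "rain", "sun", "sun", "rain", "sun", "sun", "sun"]
recent_actuals = ["rain", "rain", "sun", "sun", "sun", "rain", "rain", "rain", "sun", "sun"]

recent_accuracy = accuracy(recent_predictions, recent_actuals)
drop = relative_accuracy_drop(BASELINE_ACCURACY, recent_accuracy)
needs_review = drop >= ALERT_THRESHOLD
```

Here six of the ten recent forecasts were correct, so accuracy falls from 0.90 to 0.60, a relative drop of about one third, and `needs_review` is set. That flag is the cue to check the data and algorithms behind the system.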
