Nowadays, companies aim to increase their profits by building artificial intelligence solutions on top of their data. However, because they often do not have a well-designed data strategy in place, many of them do not succeed in this mission. A data strategy rests on four main pillars: Value, Collection, Architecture, and Governance. I described the data strategy in How to Create a Data Strategy for Your Organization. I have seen the same mistakes repeated across companies trying to create data strategies. In this article, I describe best practices for building a data strategy and share lessons I learned from some real-world scenarios.
Value — Does it really create business value?
A large organization must find a business objective that an AI solution can serve well. For example, a large grocery chain can analyze data from its customer loyalty program, supply chain, store transactions, and store traffic. The question is which business objective can be dramatically improved by an AI solution. It is common for such organizations to start building an AI solution whose cost-benefit analysis does not add up. A failure may then discourage the organization from pursuing other promising paths. So, it is highly recommended to run a cost-benefit or complexity-expectation analysis before targeting a business objective with artificial intelligence. To read more about complexity-expectation analysis, you can take a look at the article below.
You must run a cost-benefit or complexity-expectation analysis before targeting a business objective with AI.
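To make the cost-benefit idea concrete, here is a minimal Python sketch of an expected-value comparison between candidate objectives. It assumes each objective can be summarized by an estimated annual benefit, a one-off build cost, a yearly running cost, and a subjective probability of success. The `Candidate` class, the three-year horizon, and all figures are hypothetical illustrations; this is a simple cost-benefit calculation, not the complexity-expectation method, and it ignores discounting and risk beyond a single probability.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical business objective that an AI solution might serve."""
    name: str
    annual_benefit: float       # estimated yearly value if the project succeeds
    build_cost: float           # one-off cost to build the solution
    annual_run_cost: float      # yearly cost to operate and maintain it
    success_probability: float  # rough, subjective chance of success (0..1)


def expected_net_value(c: Candidate, years: int = 3) -> float:
    """Expected benefit minus total cost over `years` (no discounting)."""
    expected_benefit = c.success_probability * c.annual_benefit * years
    total_cost = c.build_cost + c.annual_run_cost * years
    return expected_benefit - total_cost


def rank_candidates(candidates: list[Candidate], years: int = 3) -> list[tuple[str, float]]:
    """Keep objectives with positive expected net value, best first."""
    scored = [(c.name, expected_net_value(c, years)) for c in candidates]
    positive = [s for s in scored if s[1] > 0]
    return sorted(positive, key=lambda s: s[1], reverse=True)


# Hypothetical figures for the grocery-chain example; not real estimates.
candidates = [
    Candidate("demand forecasting", 2_000_000, 1_500_000, 300_000, 0.6),
    Candidate("personalized coupons", 800_000, 1_200_000, 400_000, 0.5),
]
print(rank_candidates(candidates))
```

With these made-up numbers, only demand forecasting shows a positive expected net value over three years, so the coupon project is filtered out before any AI work begins. Making the success probability explicit forces the team to state how confident it really is.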
Collection — Is the collected data relevant to the problem it aims to solve?
It is widely known that an AI solution does not work unless both a vast amount of data and a variety of data types exist. However, the relevance of the data to the problem is often neglected. People who are not AI experts expect magic from an AI solution: they think that feeding it a vast amount of data will be enough. Wrong! Domain experts can readily judge whether data is relevant to a business objective. They understand the physics behind the problem and can easily determine which data needs to be collected. However, their opinions are sometimes overlooked. To read more about other challenges in large-scale data collection, you can read the article below.
AI will disappoint you without diverse, vast, and relevant data. However, the relevance of the data to the problem is often neglected.
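The article relies on domain experts to judge relevance. As a complement, not a replacement, a quick statistical screen can flag fields that show little relationship with the target so experts can review them. The sketch below uses Pearson correlation, which is a technique swapped in here for illustration and only detects linear relationships; the field names, the target, and the 0.2 threshold are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information about the other
    return cov / math.sqrt(var_x * var_y)


def screen_fields(fields: dict[str, list[float]], target: list[float],
                  threshold: float = 0.2) -> dict[str, float]:
    """Return fields whose |correlation| with the target is below `threshold`.

    These are candidates to discuss with domain experts, not to drop
    automatically: a weak linear correlation can hide a nonlinear link.
    """
    weak = {}
    for name, values in fields.items():
        r = pearson(values, target)
        if abs(r) < threshold:
            weak[name] = r
    return weak


# Hypothetical weekly data for one store; field names are illustrative only.
weekly_sales = [1, 2, 3, 4, 5]
fields = {
    "store_traffic": [10, 20, 30, 40, 50],
    "register_number": [2, 4, 1, 4, 2],
}
print(screen_fields(fields, weekly_sales))
```

Here `register_number` is flagged for review while `store_traffic` is not. The screen only narrows the conversation; the final call on relevance still belongs to people who understand the problem.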
Architecture — How efficiently are datasets designed to be used, scaled, and maintained?
As AI projects shift from single-type data to multi-modal data, data architecture becomes more important. In addition, datasets have grown exponentially in size and usage in recent years, and this growth causes scalability issues that the data architecture must address. Last but not least, the data architecture significantly affects performance and the pace of development. Therefore, if you do not build a high-performing data architecture, you will face technical challenges at scale. Once, I was consulting for a large chip manufacturer to build an AI solution that detects the failure rates of its chips. They gave me a large number of datasets that not only held duplicated data but each contained only one or two useful fields. Needless to say, those datasets were also updated frequently. Their poor data architecture forced me to spend a lot of time creating a clean, reliable dataset that could be easily monitored and updated.
You must analyze how the data architecture affects the project; otherwise, technical challenges will soon arise.
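The chip-manufacturer story describes a common cleanup task: many overlapping datasets, each contributing one or two useful fields, merged into a single table keyed by an identifier. The following is a minimal sketch of that consolidation step, assuming every record carries a shared key; `chip_id`, `temp`, `failed`, and `lot` are hypothetical field names, not the manufacturer's actual schema. Conflicting values raise an error instead of being silently overwritten, and `missing_fields` offers a simple monitoring hook for when the sources are updated.

```python
from typing import Iterable

Record = dict[str, object]


def consolidate(datasets: Iterable[list[Record]], key: str,
                useful_fields: set[str]) -> dict[object, Record]:
    """Merge overlapping datasets into one table keyed by `key`.

    Only `useful_fields` are kept. Values repeated across datasets collapse
    into one entry; contradictory values raise so they can be investigated.
    """
    merged: dict[object, Record] = {}
    for rows in datasets:
        for row in rows:
            k = row[key]
            target = merged.setdefault(k, {key: k})
            for col in useful_fields.intersection(row):
                value = row[col]
                if col in target and target[col] != value:
                    raise ValueError(f"conflicting {col!r} for {key}={k!r}: "
                                     f"{target[col]!r} vs {value!r}")
                target[col] = value
    return merged


def missing_fields(merged: dict[object, Record],
                   useful_fields: set[str]) -> dict[object, list[str]]:
    """Report which useful fields each merged record still lacks."""
    report = {}
    for k, rec in merged.items():
        gaps = sorted(useful_fields - set(rec))
        if gaps:
            report[k] = gaps
    return report


# Hypothetical source datasets; each holds only part of the useful data.
ds_a = [{"chip_id": "C1", "lot": "L7", "temp": 71.5},
        {"chip_id": "C2", "lot": "L7", "temp": 69.0},
        {"chip_id": "C3", "lot": "L8", "temp": 70.2}]
ds_b = [{"chip_id": "C1", "lot": "L7", "failed": False},
        {"chip_id": "C2", "lot": "L7", "failed": True}]
ds_c = [{"chip_id": "C1", "lot": "L7", "temp": 71.5}]  # duplicate of ds_a data

useful = {"temp", "failed"}
merged = consolidate([ds_a, ds_b, ds_c], "chip_id", useful)
print(merged)
print(missing_fields(merged, useful))
```

Rebuilding the merged table from the sources on each update, and checking the `missing_fields` report, is one simple way to keep the consolidated dataset monitorable as the inputs change.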
Governance — Is the required data accessible to various teams?
In large organizations, teams often work in silos, so collecting and sharing data across teams is challenging. For example, the database team is in charge of building the infrastructure for data storage and retrieval, while the AI team is in charge of analyzing and processing the data. Meanwhile, data collection happens in the field, where neither the AI team nor the database team is present. These teams interact little and are not aware of the challenges on each other's side. For instance, the AI team may ask for important missing data to improve the model's performance, but the data team does not cooperate in collecting it. Or a department may own a valuable dataset but not share it with other teams for technical or political reasons. This is where the problem occurs, BOOM! A great data strategy must give various teams access to the data they need and the ability to collect new data when needed.
You must create a plan for the various teams, such as data collection, database, and AI, to work together efficiently.
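One way to surface the governance gaps described above is a small data catalog that records who owns each dataset and which teams may read it, then compares that against what each team says it needs. The sketch below is a hypothetical illustration, not a specific catalog product: the `Dataset` class, the team names, and the dataset names are assumptions. It distinguishes datasets that exist but are not shared from datasets that were never collected, the two failure modes the section mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A hypothetical catalog entry for one dataset."""
    name: str
    owner: str                                      # team responsible for it
    readers: set[str] = field(default_factory=set)  # teams granted read access


def blocked_needs(catalog: dict[str, Dataset],
                  needs: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    """For each team, list the datasets it needs but cannot use, with a reason.

    "not collected": the dataset does not exist in the catalog at all.
    "not shared": it exists, but the team is neither owner nor reader.
    """
    blocked: dict[str, list[tuple[str, str]]] = {}
    for team, wanted in needs.items():
        gaps = []
        for name in sorted(wanted):
            ds = catalog.get(name)
            if ds is None:
                gaps.append((name, "not collected"))
            elif team != ds.owner and team not in ds.readers:
                gaps.append((name, "not shared"))
        if gaps:
            blocked[team] = gaps
    return blocked


# Hypothetical catalog and requirements for the grocery-chain example.
catalog = {
    "transactions": Dataset("transactions", owner="database", readers={"ai"}),
    "loyalty": Dataset("loyalty", owner="marketing"),
}
needs = {"ai": {"transactions", "loyalty", "store_traffic"}}
print(blocked_needs(catalog, needs))
```

A report like this turns vague friction between teams into a concrete list of access requests and collection tasks that the plan can assign to someone.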