Will Generative AI Replace the Need for Data Analysts?

Artwork Created by Author Using Midjourney

No. But it will redefine the data analyst role.

Since ChatGPT’s release in November 2022, speculation has grown over whether the role of the data analyst could eventually be replaced by generative AI (ChatGPT, Bard, and Bing Chat are among the large language models included in this classification). Much of this speculation is fueled by the ability of these large language models (LLMs) to write code.

As someone who has been in the data analysis field for the majority of my professional career, understanding the impact of generative AI in our field is something that has definitely piqued my interest. Giving in to curiosity, I have since spent a fair amount of time assessing the current capabilities of generative AI within the context of data analysis.

In this article, I summarize and share my findings with you as I believe generative AI will have a significant role in data analysis work going forward. Furthermore, I believe that it is imperative for the data analyst community to understand the profound impact it will have on not only their field but the business landscape as a whole.

Where We Stand Today

At this point, we know that generative AI can write SQL, Python, and R code. We can also assume the efficiency of the code it produces will only get better over time with continuous fine-tuning. But that’s just the start.

At the end of March 2023, OpenAI released a ChatGPT plugin called Code Interpreter. If you are one of the few who currently have access to the Alpha version, you can upload data files into it and have it invoke Python to perform regression analysis and descriptive analysis, look for patterns in your data, and even create visualizations. All without having to write or even know a line of Python code! Esteemed Wharton School professor Ethan Mollick has a nice write-up on this.
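To make that concrete, here is a minimal sketch of the kind of Python such a tool might run behind the scenes once a file is uploaded: a descriptive summary followed by a simple linear regression. This is my own illustration, not code produced by Code Interpreter. The sample data, the column names (ad_spend, revenue), and the helper functions are all hypothetical, and I have stuck to the standard library so it runs anywhere; a real session would more likely lean on libraries such as pandas.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """ad_spend,revenue
10,25
20,45
30,65
40,85
"""


def load_columns(csv_text, x_name, y_name):
    """Read two numeric columns from CSV text into lists of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    xs, ys = [], []
    for row in reader:
        xs.append(float(row[x_name]))
        ys.append(float(row[y_name]))
    return xs, ys


def describe(values):
    """Basic descriptive statistics for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs, ys = load_columns(SAMPLE_CSV, "ad_spend", "revenue")
summary = describe(ys)
slope, intercept = fit_line(xs, ys)

print(summary)
print(f"revenue ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The point is not the code itself but that none of it has to be written by the person asking the question.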

So there you have it. The ability to load, analyze, and present data without writing a stitch of code. Game over, yes? Not so fast.

As incredibly impressive as these capabilities are, there are some significant limitations to Code Interpreter that are indicative of the challenges generative AI would face in taking over the data analysis industry.

First, it requires the upload of ONE table. One two-dimensional CSV file (currently limited to 100 MB). The size limitation aside, imagine being tasked with building one table with all of your company’s data…

I could probably stop there, but let’s go on.

With your one table in hand, you now have to get approval to push that one table, with ALL of your company’s data, outside of your company’s firewall and into an LLM that your company has no control over…

We can probably stop there.

The current alternative (more on this later) to the above would be for your company to build its own LLM. While this is theoretically possible, the complexities of training and fine-tuning the model, the expertise required, and the enormous costs of doing so make it cost-effective for only an extremely short list of companies.

But for the sake of understanding, let’s take a step back and imagine your company is on that list.

