Can GitHub Copilot Make You a Better Data Scientist?

Ready to Fly with your copilot? Photo by Rafael Cosquiere from Pexels

The Experiment:

In this experiment, we will use the famous Iris dataset. The goal is to have Copilot perform exploratory data analysis and train a k-NN model using only its suggested code. The rules are as follows:

  1. Use only the suggested code; fix only typos and data-specific issues.
  2. Every action must be preceded by a clear comment (command) to Copilot.
  3. Only the top 3 code suggestions will be considered.

At the time of writing, Python notebooks in VS Code are relatively unstable with Copilot, so I will use Streamlit as my platform. Streamlit provides a web application with Jupyter notebook-like real-time code updates, which is handy for exploring a data science project. For more information on Streamlit, you can read my article here.

Import the library packages:

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px

💡 Data Loading

# load iris.csv into dataframe
Code Suggestion by Copilot. Image by Author
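The screenshot of the suggestion is not reproduced here. As a rough sketch, a load step for this prompt might look like the following. The file name iris.csv comes from the prompt; the helper name load_iris_csv and the Kaggle-style column names are assumptions made for illustration, not Copilot's actual output.

```python
import pandas as pd

# Assumed Kaggle-style column names; the real iris.csv may use different ones.
EXPECTED_COLUMNS = [
    "SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species",
]


def load_iris_csv(path="iris.csv"):
    """Read the CSV into a DataFrame and list any expected columns that are missing."""
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    return df, missing
```

Checking the column names right after loading is useful here because the later prompts only work when they use the exact names found in the file.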

💡 EDA with Copilot

# print the dataframe column names and shape

I was impressed that Copilot automatically picked up Streamlit's printing mechanism, st.write() instead of print(), given that Streamlit is a relatively new Python package.
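A minimal sketch of what such a step could look like is shown below. The helper names summarize and show_summary are hypothetical; only st.write() is taken from the text above.

```python
import pandas as pd


def summarize(df):
    """Return the column names and the (rows, columns) shape of a DataFrame."""
    return list(df.columns), df.shape


def show_summary(df):
    """Render the summary in a Streamlit app."""
    # Imported lazily so summarize() can be used without Streamlit installed.
    import streamlit as st

    columns, shape = summarize(df)
    # st.write() renders the value on the app page; print() would only
    # write to the terminal that runs the app.
    st.write(columns)
    st.write(shape)
```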

Next, I try with:

# create a scatter plot of the petal length and petal width using plotly express

And this is what I got. It looks like Copilot is not clever enough to pick up the column names inside the data frame 😂:

Next, I tried again with the exact column names and got exactly the graph I wanted:

# create a scatter plot of the petalLengthCm with SepalLengthCm using plotly express
Plotly graph created by Copilot. Image by Author
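The difference between the two prompts comes down to column names: Plotly Express looks up x and y by the exact names in the data frame. The sketch below makes that check explicit before plotting. The function name make_scatter, the "Species" colour column and the capitalised column names are assumptions about the CSV, not the suggested code itself.

```python
import pandas as pd


def make_scatter(df, x="PetalLengthCm", y="SepalLengthCm", color="Species"):
    """Build a Plotly Express scatter plot after checking that the columns exist."""
    missing = [c for c in (x, y, color) if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in data frame: {missing}")
    # Imported lazily so the column check above works without Plotly installed.
    import plotly.express as px

    return px.scatter(df, x=x, y=y, color=color)
```

In a Streamlit app the returned figure could then be displayed with st.plotly_chart(make_scatter(df)).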

💡 Modeling with scikit-learn:

Next, to create the training and test datasets, I wrote this:

# splitting the data into training and testing sets (80:20)

and this is the suggestion I got back:

Impressive! Copilot even knows which column is my target class and writes the full code for me; all I need to do is select the suggestion!

The full suggested code is as below:
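The screenshot is not reproduced here. A minimal sketch of an 80:20 split of this kind follows. It uses scikit-learn's bundled Iris data as a stand-in for iris.csv; the renamed columns, the "Species" target name, the fixed random_state and the stratify argument are all assumptions, not part of the original suggestion.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for iris.csv: the bundled Iris data with assumed Kaggle-style names.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
df["Species"] = iris.target_names[iris.target.to_numpy()]

# Features are every column except the target class.
X = df.drop(columns="Species")
y = df["Species"]

# test_size=0.2 gives the 80:20 split; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```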

Next, I try my luck with this command:

# check for optimal K value using testing set

Beyond my expectations, Copilot returned this code:

That’s a lot of coding time saved; Copilot even tries to plot a chart at the end. The chart didn’t work out, so I had to modify the code a bit on my end, using the list it had created. But it still saved me a lot of time that I would otherwise have spent searching Stack Overflow for code.
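As a hedged sketch of the idea (not the code Copilot returned), a search over K can fit one k-NN model per candidate value and collect the test-set accuracy in a list. The range of 1 to 25 and the bundled Iris data are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

k_values = list(range(1, 26))  # assumed search range
scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data.
    scores.append(knn.score(X_test, y_test))

# Smallest K that reaches the highest test accuracy.
best_k = k_values[scores.index(max(scores))]
```

The scores list is what a chart would be drawn from; in Streamlit, one option is st.line_chart(pd.DataFrame({"accuracy": scores}, index=k_values)). Picking K on the test set follows the prompt above; cross-validation on the training set is the more common choice.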

Out of curiosity, I asked Copilot, “What is the optimal K value?”

Copilot returned the answer without needing to plot the graph 😲😲

This inspired my next command:

# create a classifier using the optimal K value

Then I just pressed Enter and accepted the suggested comments and code to proceed. Here is the resulting code:

Note that I only typed 1 command; the rest was suggested by Copilot.

Result of suggested code

Of the 5 suggested snippets, 3 worked perfectly and 2, metrics.f1_score and metrics.precision_score, did not.
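The failing calls are not shown, so the cause can only be guessed at. One plausible explanation, assuming both functions were called with their default arguments, is that scikit-learn's f1_score and precision_score default to average="binary", which raises a ValueError on a three-class target such as Iris. The sketch below reproduces that behaviour and shows one way around it; k=5 is a placeholder, since the optimal K found above is not stated.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)  # placeholder K, not the article's value
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# These metrics work on multiclass targets without extra arguments.
accuracy = metrics.accuracy_score(y_test, y_pred)
cm = metrics.confusion_matrix(y_test, y_pred)

# The default average="binary" is rejected for a three-class target.
try:
    metrics.f1_score(y_test, y_pred)
    default_f1_error = None
except ValueError as err:
    default_f1_error = str(err)

# Passing an explicit averaging strategy makes both metrics usable.
f1 = metrics.f1_score(y_test, y_pred, average="macro")
precision = metrics.precision_score(y_test, y_pred, average="macro")
```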

That’s the end of my simple code test with Copilot. I have published the suggested code on GitHub; feel free to take a look.