In this experiment, we will use the famous Iris dataset. The goal is to have Copilot perform exploratory data analysis and train a k-NN model using only its suggested code. The rules are as follows:
- Use only the suggested code; fix only typos and data-specific issues.
- Every action should be accompanied by a clear comment.
- Only the top 3 code suggestions will be considered.
At the time of writing, the Python notebook experience in VS Code is relatively unstable with Copilot, so I will use Streamlit as my platform. Streamlit provides a Jupyter-notebook-like web application that updates in real time as the code changes, which helps when exploring a data science project. For more information on Streamlit, you can read my article here.
Import the library packages:
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
💡 Data Loading
# load iris.csv into dataframe
💡 EDA with Copilot
# print the dataframe column names and shape
I was impressed that Copilot automatically picked up Streamlit's printing mechanism, which uses st.write() instead of print(), given that Streamlit is a relatively new Python package.
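For anyone following along, the two prompts above (loading the CSV, then printing the column names and shape) amount to something like the sketch below. It is not Copilot's verbatim output. The inline sample rows stand in for iris.csv, and the column names follow the Kaggle version of the dataset, which is an assumption based on the names used later in this article.

```python
import io

import pandas as pd

# Stand-in for iris.csv: a few rows in the Kaggle column layout (an assumption).
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
"""

# With the real file this would be: df = pd.read_csv("iris.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

columns = list(df.columns)
shape = df.shape

# Inside a Streamlit app, st.write(columns) and st.write(shape) take the
# place of print(); print() is used here so the sketch runs on its own.
print(columns)
print(shape)
```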
Next, I try with:
# create a scatter plot of the petal length and petal width using plotly express
And this is what I get. It looks like Copilot is not clever enough to understand the context inside the dataframe 😂:
Next, I try again with the exact column names, and I get exactly the graph I wanted:
# create a scatter plot of the petalLengthCm with SepalLengthCm using plotly express
💡 Modeling with scikit-learn
Next, to create the training and test datasets, I write this:
# splitting the data into training and testing sets (80:20)
and these are the suggestions I get back:
Impressive! Copilot even knows which column is my target class and writes the full code for me. All I need to do is select a suggestion!
The full suggested code is as below:
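An 80:20 split of this kind can be sketched roughly as follows. This is an illustration rather than Copilot's verbatim suggestion: it uses scikit-learn's bundled copy of Iris (load_iris) instead of iris.csv so that it runs on its own, and random_state and stratify are choices made for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled Iris data as a stand-in for iris.csv (an assumption for this sketch).
iris = load_iris(as_frame=True)
X = iris.data    # four feature columns
y = iris.target  # species encoded as 0, 1, 2

# test_size=0.2 gives the 80:20 split; stratify keeps class proportions equal
# in both parts, and random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```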
Next, I try my luck with this command:
# check for optimal K value using testing set
And beyond my expectations, Copilot returns this code:
That’s a ton of coding time saved; Copilot even tries to plot a chart at the end. The chart didn’t work out, so I had to modify the code a bit on my end using the list that was created. But it still saved me a lot of time I would otherwise spend searching Stack Overflow for code.
Out of curiosity, I asked Copilot, “What is the optimal K value?”
Copilot returns the answer without the need to plot the graph 😲😲
This inspired my next command:
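The usual shape of such a search is a loop that fits one k-NN model per candidate K and records its test accuracy in a list. The sketch below shows that pattern under stated assumptions: the K range of 1 to 25, the bundled load_iris data, and the split parameters are all illustrative choices, not what Copilot generated.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled Iris data as a stand-in for iris.csv (an assumption for this sketch).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

k_values = list(range(1, 26))  # candidate K values (arbitrary range)
scores = []                    # test accuracy for each K, in the same order

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))

# First K that reaches the highest recorded accuracy.
best_k = k_values[scores.index(max(scores))]
print(best_k, max(scores))

# In Streamlit, the list can be charted with something like
# st.line_chart(pd.DataFrame({"accuracy": scores}, index=k_values)).
```

Because K is picked on the test set here, as in the prompt, the best accuracy is an optimistic estimate; cross-validation on the training set is the more common way to choose K.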
# create a classifier using the optimal K value
Then I just press Enter and accept the suggested comments and code to proceed. Here is the resulting code:
Note that I typed only one command; the rest was suggested by Copilot.
Out of the 5 suggested snippets, 3 work perfectly and 2, metrics.f1_score and metrics.precision_score, do not.
That’s the end of my simple code test with Copilot. I have published the suggested code on GitHub; feel free to take a look.
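One likely reason for those two failures, though the exact error is not shown here, is that f1_score and precision_score default to average="binary", which scikit-learn rejects for a three-class target such as Iris. The sketch below reproduces that behavior and shows the averaged variants working. The value k = 5 is a placeholder rather than the optimal K found above, and the bundled load_iris data and average="macro" are choices made for the sketch.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled Iris data as a stand-in for iris.csv (an assumption for this sketch).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

k = 5  # placeholder value, not the optimal K from the article
knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = metrics.accuracy_score(y_test, y_pred)

# The default average="binary" raises ValueError on a multiclass target.
try:
    metrics.f1_score(y_test, y_pred)
    default_failed = False
except ValueError:
    default_failed = True

# Passing an explicit averaging strategy makes both metrics work.
f1_macro = metrics.f1_score(y_test, y_pred, average="macro")
precision_macro = metrics.precision_score(y_test, y_pred, average="macro")

print(accuracy, f1_macro, precision_macro, default_failed)
```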