What is the MNIST dataset?
MNIST is a dataset of 70,000 images of digits handwritten by high school students and employees of the US Census Bureau. Every image is labelled with the digit it represents. MNIST is often called the "hello world" of machine learning.
There are 70,000 images and each image has 784 (28*28) features. Each image is 28*28 pixels, and each feature simply represents one pixel’s intensity, from 0 (white) to 255 (black).
Importing the libraries and dataset needed for this classification
We import several classification algorithms: logistic regression, decision tree, support vector machine, gradient boosting, AdaBoost and, last but not least, the random forest classifier. We’ll experiment with all of these models and pick the one that gives the best accuracy.
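The original import cell is not shown, so the following is only a minimal sketch of what it might look like. The class names are real scikit-learn estimators, but the `models` dictionary, its keys, and the hyperparameters (`max_iter`, `random_state`) are assumptions made for illustration.

```python
# Sketch of the imports described above (assumed, not the author's exact cell).
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

# Hypothetical registry so every candidate can be trained in one loop later.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```

Keeping the candidates in a dictionary makes it easy to fit and score each one with the same few lines once the data is ready.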
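Now that the data has been imported, inspecting it shows that each sample is a 1-dimensional array of 64 values, and each value is the intensity of one pixel of the image. Note that 64 values per image (8*8) matches the small digits dataset bundled with scikit-learn rather than the full 28*28 MNIST. In that bundled dataset the intensities run from 0 to 16 rather than 0 to 255, so it is worth checking the actual range of the data you loaded.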
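A quick inspection along these lines can be sketched as follows. The loading call is an assumption: the text does not say how the data was loaded, and `load_digits` is used here only because it yields the 64-value samples described.

```python
# Assumption: scikit-learn's bundled digits dataset stands in for the data
# the article loaded; the original loading code is not shown.
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

print("data shape:", X.shape)        # (number of samples, 64)
print("one sample shape:", X[0].shape)
print("value range:", X.min(), "to", X.max())
print("first labels:", y[:10])
```

Printing the minimum and maximum is the simplest way to confirm the intensity range before choosing how to rescale.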
Let’s inspect the data
First we need to convert each sample from a 1-dimensional array of 64 values into a 2-dimensional 8*8 array. Because the images are grayscale, there is no need to add a channel dimension.
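The reshape can be sketched with NumPy as below. It again assumes the data comes from `load_digits`; the variable name `images` is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits

# Assumption: the flat 64-value samples come from scikit-learn's digits dataset.
X = load_digits().data

# -1 lets NumPy infer the number of samples; each row becomes an 8x8 grid.
images = X.reshape(-1, 8, 8)

print(X.shape, "->", images.shape)
```

No extra axis is added, so each image stays a plain 2-dimensional grayscale array.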
Here I have plotted 12 of them with matplotlib for visualization.
Here we can see how they look: images of the digits [0, 1, 2, 3], each with its corresponding target value printed as the title.
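A plot like the one described could be produced roughly as follows. The 3*4 grid, the colormap, and the output file name are assumptions, and the `Agg` backend is selected only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show() interactively
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

# Assumption: scikit-learn's digits dataset stands in for the article's data.
digits = load_digits()
images = digits.data.reshape(-1, 8, 8)

# 3 rows x 4 columns = 12 images, each titled with its target value.
fig, axes = plt.subplots(3, 4, figsize=(8, 6))
for ax, image, label in zip(axes.ravel(), images, digits.target):
    ax.imshow(image, cmap="gray_r")
    ax.set_title(str(label))
    ax.axis("off")

fig.tight_layout()
fig.savefig("digits_preview.png")  # hypothetical output file name
```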
The target values (labels) range from 0 to 9, for a total of 10 classes.
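One way to confirm the set of labels is to count the unique target values, as in this small sketch (again assuming the `load_digits` data).

```python
import numpy as np
from sklearn.datasets import load_digits

# Assumption: labels come from scikit-learn's digits dataset.
y = load_digits().target

# Unique labels and how many samples carry each one.
classes, counts = np.unique(y, return_counts=True)
print("classes:", classes)
print("number of classes:", len(classes))
print("samples per class:", dict(zip(classes.tolist(), counts.tolist())))
```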
A wise man once said that before feeding raw data into any classifier, it is always best to do some normalization or preprocessing, because the distribution of each sample varies. The sklearn library makes this easy: it takes just one or two lines of code.
Here we have rescaled our data from the 0 to 255 range into the 0 to 1 range. After that we visualize the data again to check that nothing got messed up. Nothing seems wrong, so we are good to go.
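The text does not name the scikit-learn tool that was used, so this sketch assumes `MinMaxScaler`, which rescales every feature (pixel position) to the 0 to 1 range. The data is once more assumed to come from `load_digits`.

```python
from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler

# Assumption: digits dataset as a stand-in; MinMaxScaler is one possible choice.
X = load_digits().data

scaler = MinMaxScaler()            # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)

print("before:", X.min(), "to", X.max())
print("after: ", X_scaled.min(), "to", X_scaled.max())
```

Because `MinMaxScaler` works column by column, pixels that are constant across the whole dataset simply map to 0, and the shape of the data is unchanged.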
Before we define our model, we need to make sure the data is split into two parts: 1) a train set and 2) a test set held out for validation. Again, sklearn comes to the rescue.
The data is now split into two parts, a train set and a test set. We’ll train our model on the train set and measure accuracy on the test set.
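The split, followed by training on one part and scoring on the other, can be sketched as below. The 80/20 ratio, the `random_state` values, the stratified split, and the choice of a random forest as the example model are all assumptions, since the original code is not shown.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: digits dataset as a stand-in for the article's data.
X, y = load_digits(return_X_y=True)

# Hold out 20% of the samples; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print("train:", X_train.shape, "test:", X_test.shape)

# Fit on the train set only, then measure accuracy on the unseen test set.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("test accuracy:", round(accuracy, 3))
```

The same fit-and-score lines can be repeated for each candidate classifier to compare them on identical data.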