A beginner’s approach to DL
Basics of DL and a short code
I’m a beginner in the field of DL and, as the saying goes, before getting knee-deep in any field you should study the basics first. So I started learning DL by taking one of the most popular courses on Coursera, by DeepLearning.AI.
This is my second week of learning, and by now I have a fair theoretical idea of logistic regression and of the methods used to minimize the cost function. I’m still new to the Python programming part. I’m trying to devote at least half an hour every day to this course (which sounds too little, but the interest keeps me going!)
First of all, I tried to understand the most basic unit of ML or DL: what exactly is a neural network? How can it be used for everything from detecting an object in an image to doing amazing things with artificial intelligence?
Simply put, traditional programming works like this: we put the algorithm (the rules) and the inputs into a black box, and what comes out is the answer. But hand-written rules can’t always solve a problem (for example, deciding whether a person is running or playing on the basis of their heart rate), and that is where Machine Learning comes in to save the day. With ML tools, we put the inputs and the answers into the black box, and we get the rules as the output. These rules can then be used to predict answers for new inputs.
The black box that helps us learn the rules, or the “relationship”, between the inputs and the desired answers is called a neural network.
Neural networks are basically connected layers of “neurons” (we try to teach a machine to imitate our brain) that learn such relations. With methods like logistic regression, the neurons take a guess at what the relation might be and then check how wrong the guess was by calculating a loss function. Based on that loss, they make a better guess the next time, and this process goes on.
A representation of a neural network looks like this:
Logistic regression is one of the mechanisms used for binary classification.
Let us take an example where we have inputs ‘x’ and outputs ‘y’, where y is binary (either 0 or 1). For example, if x represents a picture, then y tells whether a cat is present (y=1) or absent (y=0) in that picture. We have to use logistic regression and make our model learn the relation between them.
In an image, each pixel has an RGB value, that is, three numbers. If the image has dimensions a×b, then all the RGB values are stacked into one column, which will be a column matrix of size n=a*b*3 (or n=a*b for a single-channel grayscale image). Here, the column matrix is denoted by ‘x’, and ‘y’ denotes the image’s label.
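As a small illustration of this stacking, here is a sketch in NumPy. The 4×3 image size and the random pixel values are made up for the example. Note that with three colour channels the column ends up with a·b·3 entries.

```python
import numpy as np

# Hypothetical tiny image: height a=4, width b=3, with 3 colour channels (R, G, B).
a, b = 4, 3
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(a, b, 3), dtype=np.uint8)

# Stack every R, G and B value into a single column vector x.
x = image.reshape(-1, 1)

n = a * b * 3
print(x.shape)  # (36, 1)
```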
Let y’ be the probability that y=1 given x, which would be written as y’=P(y=1|x). The parameters used in logistic regression are W (the weights) and B (the bias).
First of all, we want our model to estimate what the ‘y’ corresponding to an ‘x’ might be. For this, we perform forward propagation, in which we compute the prediction y’ and then the loss function, which measures the error between y’ and y.
The linear part of the prediction is z=Wᵗx+B, where x∈ℝⁿ, W∈ℝⁿ (n×1 matrices) and Wᵗ is the transpose of matrix W. But this value can be any real number and doesn’t necessarily lie in [0,1], as we expect y’ to. Thus we use the sigmoid function (denoted by σ) to bring its value within the range.
So finally, y’=σ(Wᵗx+B)=σ(z), where σ(z)=eᶻ/(1+eᶻ)=1/(1+e⁻ᶻ), such that y’ lies strictly between 0 and 1. We sometimes write y’ as ‘a’ so that it looks nice and less terrifying.
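To make the formula concrete, here is a minimal sketch of the forward computation for a single input. The weights, bias and input values are invented for illustration (n = 3 features).

```python
import numpy as np

def sigmoid(z):
    """Logistic function; 1/(1+e^-z) is algebraically the same as e^z/(1+e^z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example with n = 3 features.
W = np.array([[0.5], [-1.0], [2.0]])   # weights, shape (n, 1)
B = 0.1                                # bias, a scalar
x = np.array([[1.0], [2.0], [0.5]])    # one input, shape (n, 1)

z = (W.T @ x).item() + B               # linear part, can be any real number
a = sigmoid(z)                         # prediction y', squeezed into (0, 1)
print(z, a)
```

Here z works out to about −0.4, and the sigmoid turns it into a probability of roughly 0.40.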
Now, let’s assume we have ‘m’ samples over which we have to train our model, and we aim for a≈y for each sample. To achieve this, we have to minimize the loss function, which for logistic regression is the cross-entropy loss:
L(a,y) = −[y·log(a) + (1−y)·log(1−a)]
Over an entire epoch, that is, one cycle of going through all the images, we try to minimize the cost function, which is the average of the losses of all m samples, given as:
J(W,B) = (1/m)·Σⱼ L(a⁽ʲ⁾, y⁽ʲ⁾), with the sum running over j=1,…,m
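In code, the loss and the cost might be computed as in the sketch below. It assumes the cross-entropy loss that is normally used with logistic regression. The function names `loss` and `cost` and the sample predictions are made up for the example.

```python
import numpy as np

def loss(a, y):
    """Cross-entropy loss for one sample: -(y*log(a) + (1-y)*log(1-a))."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def cost(A, Y):
    """Cost J: the average of the per-sample losses over all m samples."""
    return float(np.mean(loss(A, Y)))

# Hypothetical predictions a and true labels y for m = 4 samples.
A = np.array([0.9, 0.2, 0.7, 0.4])
Y = np.array([1, 0, 1, 0])

print(loss(0.9, 1))   # small loss: confident and correct
print(loss(0.1, 1))   # large loss: confident and wrong
print(cost(A, Y))
```

A prediction close to the true label gives a loss near zero, while a confident wrong prediction is punished heavily, which is exactly what pushes a towards y.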
All that I’ve written above was just for one epoch’s forward propagation, in which we compute the predictions and the resulting losses. Next, we have to perform backpropagation, in which we calculate the derivatives of the loss wrt the parameters and use them to update the values of the weights W and the bias B.
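We start off by calculating the derivatives. We calculate ‘da’ as the partial derivative of L(a,y) wrt a, i.e., da=∂L/∂a, and similarly we have dz=∂L/∂z. For the usual cross-entropy loss of logistic regression, the equations we obtain for the required derivatives for the jᵗʰ sample are:
da = −y/a + (1−y)/(1−a)
dz = a − y
dwₐ = xₐ·dz
dB = dz
where the subscript ‘a’ refers to the aᵗʰ feature of the sample (not to be confused with the prediction a).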
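One way to convince yourself of the derivative dz = a − y is to compare it with a numerical estimate. The sketch below assumes the cross-entropy loss −[y·log(a) + (1−y)·log(1−a)]. The values of z and y and the helper name `loss_from_z` are arbitrary choices for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z(z, y):
    """Cross-entropy loss written as a function of z, via a = sigmoid(z)."""
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.3, 1.0                      # arbitrary example values
a = sigmoid(z)

da = -y / a + (1 - y) / (1 - a)      # dL/da
dz_chain = da * a * (1 - a)          # chain rule: dL/da * da/dz
dz_closed = a - y                    # the simplified form

# Central finite difference as an independent check.
eps = 1e-6
dz_numeric = (loss_from_z(z + eps, y) - loss_from_z(z - eps, y)) / (2 * eps)
print(dz_chain, dz_closed, dz_numeric)
```

All three numbers agree up to rounding, which is why the short form a − y can be used directly in backpropagation.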
We looked at the derivatives required for the jᵗʰ sample. Now, for one epoch, we need to sum them up over all m samples and find their average. Then, we use these averages to update the values of our weights and bias, using the expressions:
wₐ := wₐ − α·dwₐ
B := B − α·dB
where wₐ is the weight corresponding to the aᵗʰ feature, dwₐ and dB are the averaged derivatives, and α is the learning rate of our model.
Now, gradient descent is the process in which our model repeatedly updates the parameters in this way, so that we move towards a minimum of our cost function. For logistic regression the cost function is convex, so that minimum is the global one.
All of the above explanations for forward propagation and backprop result in just one step of gradient descent. Let’s see a small piece of an algorithm (more like rough code in Python) that will implement all of the above for 1 epoch.
import numpy as np
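A minimal sketch of one such step in NumPy could look like the following. It assumes the cross-entropy loss and a vectorized layout with one column per sample. The function name `gradient_descent_step`, the toy data and the learning rate of 0.1 are made up for illustration and are not taken from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(W, B, X, Y, alpha):
    """One forward pass, one backward pass and one parameter update.

    X has shape (n, m): one column per sample. Y has shape (1, m).
    W has shape (n, 1) and B is a scalar.
    """
    m = X.shape[1]

    # Forward propagation: predictions and cost.
    Z = W.T @ X + B                     # shape (1, m)
    A = sigmoid(Z)
    J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backpropagation: gradients averaged over the m samples.
    dZ = A - Y                          # shape (1, m)
    dW = (X @ dZ.T) / m                 # shape (n, 1)
    dB = np.sum(dZ) / m

    # Update step with learning rate alpha.
    W = W - alpha * dW
    B = B - alpha * dB
    return W, B, float(J)

# Hypothetical toy data: n = 2 features, m = 4 samples.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([[0, 0, 1, 1]])
W = np.zeros((2, 1))
B = 0.0

W, B, J0 = gradient_descent_step(W, B, X, Y, alpha=0.1)
_, _, J1 = gradient_descent_step(W, B, X, Y, alpha=0.1)
print(J0, J1)   # the cost after one update is lower than before it
```

Working on whole matrices instead of looping over samples and features is a design choice: it keeps the code short and lets NumPy do the arithmetic efficiently.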
Many such steps of gradient descent will help us get close to the minimum value of the cost function, and when we achieve that, our model is ready to make predictions that are close to the expected results!
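To see the effect of repeating the step many times, here is a small training loop on invented one-feature data. The data, the learning rate of 0.5 and the 2000 iterations are arbitrary choices for this sketch, and the cross-entropy cost is again assumed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: one feature, label is 1 when the feature is large.
X = np.array([[0.5, 1.0, 1.5, 3.0, 3.5, 4.0]])   # shape (n=1, m=6)
Y = np.array([[0, 0, 0, 1, 1, 1]])

W, B, alpha = np.zeros((1, 1)), 0.0, 0.5
costs = []
for step in range(2000):
    # Forward pass and cost for the current parameters.
    A = sigmoid(W.T @ X + B)
    costs.append(float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))))
    # Backward pass and parameter update.
    dZ = A - Y
    W = W - alpha * (X @ dZ.T) / X.shape[1]
    B = B - alpha * float(np.mean(dZ))

# Turn the final probabilities into 0/1 predictions.
predictions = (sigmoid(W.T @ X + B) > 0.5).astype(int)
print(costs[0], costs[-1])
print(predictions)
```

The cost starts at log(2) ≈ 0.69, the value for a model that always answers 0.5, and shrinks as the steps accumulate.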
Example of a Neural Network Model
To show an application of these ideas, I’ll create and train a neural network using TensorFlow. The code will import images and their labels from fashion_mnist (a dataset of 70,000 small grayscale images of shoes, handbags, and other fashion items), normalize them, and train the model accordingly.
P.S.: You’ll see some noob python code below!
Firstly, we import TensorFlow and the Fashion MNIST data, which is available in the Keras datasets module.
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
Then we load the images and their labels:
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
The pixel values of the images lie between 0 and 255 (Fashion MNIST images are grayscale, so each pixel is a single intensity rather than an RGB triple). To normalize them and bring the range down to 0 to 1, we divide the pixel values of all images by 255.
training_images = training_images / 255.0
test_images = test_images / 255.0
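What the division does can be seen on a tiny made-up example. The 2×2 array below is a hypothetical stand-in for one image, not real data from the dataset.

```python
import numpy as np

# Hypothetical 2x2 grayscale image with raw pixel intensities in 0..255.
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by 255.0 converts to floating point and rescales into [0, 1].
scaled = raw / 255.0
print(scaled.dtype)
print(scaled)
```

Writing 255.0 rather than relying on integer arithmetic makes it explicit that the result is a floating-point array.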
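Now, it’s time to design our model. The first layer of our model will flatten the images, i.e., will convert each 28×28 image into an ‘n×1’ matrix with n=784. Then, the second layer, containing 128 neurons, will use ReLU as an activation function (which outputs zero when its input x is negative, and x otherwise). And finally, the third layer will have 10 neurons (which is the number of labels in the Fashion MNIST dataset); a softmax activation is assumed here for it, so that its outputs are class probabilities, as the loss used below expects.
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'), tf.keras.layers.Dense(10, activation='softmax')])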
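To get a feel for what these layers do to the data, here is a rough NumPy sketch of the shapes involved. It is not how Keras implements the layers: the random weights merely stand in for trained ones, and all variable names are invented for the example.

```python
import numpy as np

def relu(v):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0, v)

rng = np.random.default_rng(0)

# One hypothetical 28x28 image with pixel values already scaled to [0, 1].
image = rng.random((28, 28))

# Layer 1: flatten the image into a vector of 28*28 = 784 values.
x = image.reshape(-1)

# Layer 2: 128 neurons with ReLU (random weights stand in for trained ones).
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
h = relu(x @ W1 + b1)

# Layer 3: 10 neurons, one score per clothing class.
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)
scores = h @ W2 + b2

print(relu(np.array([-2.0, 0.0, 3.5])))   # negatives become 0
print(x.shape, h.shape, scores.shape)
```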
Now, we write code that compiles the model, choosing the optimizer (Adam) and the loss function to minimize, and then fits the model so that it predicts a label close to the expected one for each training image. We can train it for, say, 5 epochs.
model.compile(optimizer = tf.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
Now that we’ve trained our model, we have to test it. The following code does the work:
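In Keras, this step is usually a single call such as model.evaluate(test_images, test_labels), which returns the loss and the accuracy on the unseen test set. As a rough sketch of what that accuracy number means, here is the calculation done by hand in NumPy. The probabilities and labels below are invented for illustration, with 3 classes instead of 10.

```python
import numpy as np

# Hypothetical model outputs for 4 test images and 3 classes:
# each row holds the predicted probability of every class.
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.3, 0.4, 0.3],
                          [0.2, 0.2, 0.6]])
true_labels = np.array([0, 1, 2, 2])

# The predicted label is the class with the highest probability.
predicted_labels = np.argmax(probabilities, axis=1)

# Accuracy is the fraction of images whose predicted label is correct.
accuracy = float(np.mean(predicted_labels == true_labels))
print(predicted_labels, accuracy)   # [0 1 1 2] 0.75
```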
And having executed this, we’ve successfully learned how to train and test a basic neural network!