A Beginner’s Guide to Linear Algebra for Deep Learning


The nuts and bolts of any deep learning algorithm

Photo by Ashkan Forouzani on Unsplash

Everyone is talking about deep learning and its many uses. More and more people are diving into it: building models, training them for hours, and achieving high accuracy. However, when it comes to tackling a new problem or applying for a job, people expect you to know the nitty-gritty of the subject.

Even though the field's popularity is soaring, recent surveys show that many data science positions remain unfilled because candidates lack the fundamental knowledge and suitable skill sets.

Linear algebra, along with statistics and probability, is one of the pillars of any machine learning or deep learning algorithm. Of course, you do not need to know linear algebra before starting your machine learning journey. However, you cannot develop a deep understanding of machine learning, or apply it well, without it.

Here is a comprehensive beginner's guide to linear algebra to get you started.

What is Linear Algebra?

Linear algebra is the branch of mathematics concerned with vectors, coordinates, lines, and planes in spaces of any number of dimensions, and with the operations we perform on them. Do not worry if this does not make sense to you right now; it will become clear very soon.

If you know algebra, you know that it primarily deals with scalars (single numbers). Linear algebra extends this to higher-dimensional objects such as vectors and matrices, so we can think of it as an extended version of algebra.

Different data structures — Image by HADRIENJ

The core data structures behind deep-learning algorithms are scalars, vectors, matrices and tensors. So let’s dive into a detailed explanation of each of them.


Scalars

A scalar is a single number, or a 0th-order tensor.

Examples include temperature, distance, speed, and mass.

All of these quantities have a magnitude but no direction, other than the fact that they may be positive or negative.

We have dealt with scalars all our lives; almost every everyday calculation uses scalar numbers. Python has several built-in scalar types, such as int, float, complex, bytes, and str (Unicode strings).

Let's have a look at some operations that we can perform on scalars:

a = 2  # Scalar 1
b = 5 # Scalar 2
print(a + b) # 7 # Addition
print(a - b) # -3 # Subtraction
print(a * b) # 10 # Multiplication
print(a / b) # 0.4 # Division
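
To connect scalars with the idea of a 0th-order tensor, here is a small sketch. It borrows NumPy, which this guide only introduces later for matrices, and the variable names are illustrative: a NumPy array built from a single number has no axes at all.

```python
import numpy as np

# A plain Python scalar
temperature = 21.5
print(type(temperature).__name__)  # float

# The same value as a NumPy 0-d array, i.e. a 0th-order tensor
t = np.array(temperature)
print(t.ndim)   # 0: no axes at all
print(t.shape)  # (): an empty shape tuple
```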


Vectors

A vector is a list of numbers, or a 1st-order tensor. There are two ways to interpret what this means.

One way to think of a vector is as a point in space. The list of numbers gives the coordinates of that point: each number is one component of the vector, measured along one dimension.

Vector representation as a list of numbers

Another way to think of a vector is as a magnitude and a direction. Seen this way, a vector is an arrow pointing from the origin to the endpoint given by the list of numbers.

Vector representation as a point

An example of a vector is:

a = [4, 3]

This vector has only two elements, but in real applications a vector can have any number of dimensions. The dimensionality of a vector is the number of elements it contains.

The length of a vector is referred to as “magnitude”. It is the distance from the origin to the endpoint. We can calculate the magnitude of a vector using the Pythagorean theorem (x² + y² = z²).

The magnitude of the vector a
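
A minimal sketch of the magnitude calculation in plain Python; `magnitude` is a hypothetical helper name, not a library function. It applies the Pythagorean formula, generalised to any number of components by summing all the squares:

```python
import math

def magnitude(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

a = [4, 3]
print(magnitude(a))  # 5.0, since sqrt(4**2 + 3**2) = sqrt(25)

# The same formula works for any number of dimensions
print(magnitude([1, 2, 2]))  # 3.0
```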

Vector Addition and Subtraction

Vectors can be added and subtracted. Geometrically, adding two vectors means placing them end to end (the tail of the second at the head of the first) while keeping each one's length and direction.

Let’s take an example. Please do not feel overwhelmed by the diagram; I promise it will feel effortless when you read the process.

Vector Addition

a = [4, 3], b = [1, 2]

c = a + b

c = [4, 3] + [1, 2]

c = [4+1, 3+2]

c = [5, 5]

This is how vector addition works. Vector subtraction is done the same way, element by element.
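
A minimal plain-Python sketch of element-wise addition and subtraction, using the vectors from the example above. The helper names `vector_add` and `vector_subtract` are illustrative only. Note that `a + b` on plain Python lists concatenates them instead of adding them, which is one reason numerical code usually relies on NumPy arrays.

```python
def vector_add(u, v):
    # Element-wise addition; both vectors need the same number of elements
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(u, v)]

def vector_subtract(u, v):
    # Element-wise subtraction, same dimension rule as addition
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [x - y for x, y in zip(u, v)]

a = [4, 3]
b = [1, 2]
print(vector_add(a, b))       # [5, 5]
print(vector_subtract(a, b))  # [3, 1]
print(a + b)                  # [4, 3, 1, 2]: list concatenation, not vector addition
```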

Vector Multiplication: Dot Products

Next, we move on to vector multiplication, which is at the core of many machine learning and deep learning algorithms.

There are two common ways to multiply vectors: the dot product (also called the scalar product) and the cross product.

The dot product of two vectors

The dot product produces a scalar value from two vectors.

The cross product of two vectors

The cross product produces a vector from two vectors. It is defined for three-dimensional vectors, and the result is perpendicular to both inputs.
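
For completeness, a sketch of the cross product for three-dimensional vectors using the standard component formula; `cross_product` is a hypothetical helper, and the rest of this guide does not depend on it:

```python
def cross_product(u, v):
    """Cross product of two 3-dimensional vectors (standard component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return [u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1]

x_axis = [1, 0, 0]
y_axis = [0, 1, 0]
print(cross_product(x_axis, y_axis))  # [0, 0, 1], perpendicular to both inputs
```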

Here we will focus mainly on the dot product, since it is the one that matters most for deep learning.

The dot product is calculated by multiplying each element of the first vector by the corresponding element of the second vector. For example:

The dot product of two vectors

We first multiply 4 * 1 for the first dimension, then move on to the second dimension and do the same. If we had more than two dimensions, we would multiply each remaining pair in the same way. Finally, we add up all the products. Notice that the result is a single scalar value.
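
The same steps in plain Python, assuming the vectors a = [4, 3] and b = [1, 2] from the addition example; `dot` is an illustrative helper name. Taking the dot product of a vector with itself gives its squared magnitude, which ties back to the Pythagorean calculation earlier:

```python
def dot(u, v):
    # Multiply corresponding elements, then sum the products
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(u, v))

a = [4, 3]
b = [1, 2]
print(dot(a, b))  # 10, i.e. 4*1 + 3*2
print(dot(a, a))  # 25, the squared magnitude of a (5**2)
```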


Matrices

Like a vector, a matrix is a collection of numbers. The difference is that a vector is like a list, whereas a matrix is like a table.

An m x n matrix has m rows and n columns, and therefore contains m * n elements.

m*n matrix

At this stage, we are working with multi-dimensional arrays (2D in the case of a matrix), so we use the NumPy package for all our matrix operations: it is fast, efficient, and comes with many built-in functions.
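
A minimal NumPy sketch of the m x n idea, using an arbitrary 2 x 3 matrix: `shape` reports (rows, columns) and `size` reports the total number of elements.

```python
import numpy as np

# A 2 x 3 matrix: 2 rows, 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)  # (2, 3)
print(m.size)   # 6 elements, i.e. 2 * 3
print(m.ndim)   # 2 axes: rows and columns
```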

Matrix Addition

In most cases, matrices and other higher-dimensional data structures follow the same rules as vectors.

With matrices, we also add element-wise to build the resulting matrix, just as we did with vectors. Keep in mind that both matrices must have the same shape, that is, the same number of rows and columns. For example, you cannot add a 2 x 3 matrix to a 3 x 3 matrix.

Process of matrix addition

Here is the implementation of matrix addition:

Example of matrix addition
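
A minimal NumPy sketch of matrix addition with two arbitrary 2 x 2 matrices, followed by the shape-mismatch case described above. With mismatched shapes such as 2 x 3 and 3 x 3, NumPy raises a `ValueError`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise: each entry of A is added to the entry in the same position of B
C = A + B
print(C)
# [[ 6  8]
#  [10 12]]

# Mismatched shapes: a 2 x 3 matrix cannot be added to a 3 x 3 matrix
try:
    np.ones((2, 3)) + np.ones((3, 3))
except ValueError as err:
    print("ValueError:", err)
```

NumPy also supports broadcasting, which lets some differently shaped arrays be combined (for example, adding a single row to every row of a matrix). That is a NumPy convenience rather than matrix addition in the strict sense described here.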

Matrix Multiplication

Matrix multiplication is a little more complicated, since multiple elements of the first matrix interact with multiple elements of the second matrix.

Matrix multiplication is tedious to do by hand, and even for computers it becomes expensive as matrices grow: the straightforward method for multiplying two n x n matrices needs n³ scalar multiplications.

Here is a simple example of multiplying a pair of 2 x 2 matrices:

Process for matrix multiplication

To make this easier to follow, here is a visualisation:

Visualisation of matrix multiplication — Image by HADRIENJ

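Each entry of the result is the dot product of a row of the first matrix with a column of the second: the first row of the first matrix is multiplied (dot product) with the first column of the second matrix to give the top-left entry, and in general row i and column j give the entry in row i, column j. We repeat this for every row-column pair until the result matrix is filled.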

One constraint of matrix multiplication is that the number of columns of the first matrix must equal the number of rows of the second matrix. Otherwise, the product is not defined.

Multiplying an (m x n) matrix by an (n x p) matrix gives a matrix of dimensions (m x p).

Matrix multiplication
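
To make the row-by-column rule concrete, here is a deliberately naive sketch on nested lists; `naive_matmul` is a hypothetical helper written for illustration, and in practice you would use NumPy's `@` operator, which the sketch compares against. The example matrices are arbitrary:

```python
import numpy as np

def naive_matmul(A, B):
    """Multiply nested-list matrices: entry (i, j) = row i of A dot column j of B."""
    n = len(B)  # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])  # number of columns of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(naive_matmul(A, B))  # [[19, 22], [43, 50]]

# NumPy's @ operator computes the same product
print(np.array(A) @ np.array(B))

# Shape rule: (2 x 3) @ (3 x 4) -> (2 x 4)
print((np.ones((2, 3)) @ np.ones((3, 4))).shape)  # (2, 4)
```

The three nested loops (over i, j, and k) are exactly the n³ multiplications mentioned earlier for n x n matrices.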

Matrix Transpose

You will often need to transpose a matrix when building a model from scratch.

The transpose of a matrix is a flipped version of the original: we switch the rows with the columns, so the first row becomes the first column, and so on. An m x n matrix therefore becomes an n x m matrix.

Transpose of a Matrix — Image by HADRIENJ
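
A short NumPy sketch of transposing, with a plain-Python equivalent using `zip`. The last lines show one typical reason to transpose: a 2 x 3 matrix cannot be multiplied by itself, but it can be multiplied by its transpose.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# Rows become columns: a 2 x 3 matrix becomes 3 x 2
print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]
print(A.T.shape)  # (3, 2)

# Without NumPy, zip(*rows) performs the same flip on nested lists
print([list(col) for col in zip(*A.tolist())])  # [[1, 4], [2, 5], [3, 6]]

# A @ A is undefined for a 2 x 3 matrix, but A @ A.T is (2 x 3) @ (3 x 2) -> 2 x 2
print((A @ A.T).shape)  # (2, 2)
```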


Tensors

Welcome to n-dimensional space. We are now moving past 2D tables and will deal with any number of dimensions. Every data structure we have discussed so far can be described as a tensor: a scalar is a 0th-order tensor, a vector is a first-order tensor, and a matrix is a second-order tensor.

3 Dimensional Tensor — Image by Math3Ma

The beauty of linear algebra is that the rules we applied to vectors and matrices, such as element-wise addition, carry over to higher-order tensors.

We can use libraries like TensorFlow, NumPy, or PyTorch to do our operations on tensors.
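
A minimal NumPy sketch of a 3rd-order tensor (TensorFlow and PyTorch offer comparable operations, which are not shown here). Element-wise addition works exactly as before, and the `@` operator treats a 3-D array as a stack of matrices, multiplying them pair by pair while following the same (m x n) by (n x p) shape rule:

```python
import numpy as np

# A 3rd-order tensor: a stack of 2 matrices, each 3 x 4
T = np.arange(24).reshape(2, 3, 4)
print(T.ndim)   # 3
print(T.shape)  # (2, 3, 4)

# Element-wise addition works exactly as for vectors and matrices
U = T + T
print(U[1, 2, 3])  # 46, i.e. 23 + 23

# @ multiplies the stacked matrices pair by pair: (2, 3, 4) @ (2, 4, 5) -> (2, 3, 5)
W = np.ones((2, 4, 5))
print((T @ W).shape)  # (2, 3, 5)
```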

I hope this article was helpful to you, and good luck in your data science journey.

