Linear algebra essentials for data scientists, with implementations


Suffice it to say that when it comes to any technical field, we can take Albert Einstein’s advice:

If you can’t explain it simply, you don’t understand it well enough.

I think that to reach that level as data scientists, we need a decent understanding of the mathematics behind the models and methods we use to make sense of data. Having already covered the concepts of probability involved in data science on this blog, I would like to focus here on the second pillar: linear algebra.

As you know, whether you use pandas, NumPy, or something else, most of our data comes in tabular form, and the statistical inferences drawn from it are stored as matrices or arrays (vectors). As a common practice, we may calculate many parameters from the data, but understanding the mathematics behind them helps us better appreciate their significance. So let’s start with the basics of linear algebra and its implementation in NumPy.

Vector

In programming terms, vectors are simply one-dimensional arrays. So a vector of size n is a list of n components of the same type. Since we usually store numbers in them, we define them as 1-D arrays for ease of calculation, though a plain list can also be used.

Note: For all further code snippets, we assume that NumPy has been imported as np (command: import numpy as np)

When we want to convert a defined list to an array
Syntax :
np.array(list)
Argument :
list : a 1-D list of values
When we want to define an array with all elements as zero
Syntax : numpy.zeros(shape, dtype=float, order='C')
Arguments :
shape : int or tuple of ints
dtype : data-type, optional
order : {‘C’, ‘F’}, optional, default: ‘C’(row-major)
When we want to define an array with all elements as fill_value
Syntax : numpy.full(shape, fill_value, dtype=None, order='C')
Arguments :
shape : int or tuple of ints
fill_value : scalar or array_like
dtype : data-type, optional
order : {‘C’, ‘F’}, optional, default: ‘C’(row-major)
When we want to define an array with all elements as one
Syntax : numpy.ones(shape, dtype=None, order='C')
Arguments :
shape : int or tuple of ints
dtype : data-type, optional
order : {‘C’, ‘F’}, optional, default: ‘C’(row-major)
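Here is a quick sketch of these constructors in action for a small vector:

>>> np.array([1, 2, 3])   # from a list
array([1, 2, 3])
>>> np.zeros(3)           # all zeros
array([0., 0., 0.])
>>> np.full(3, 7)         # all elements equal to fill_value 7
array([7, 7, 7])
>>> np.ones(3)            # all ones
array([1., 1., 1.])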

These are the basic ways of defining a vector in NumPy; there are many more, but these are the most frequently used.

As a convention, we represent a vector as a column matrix, so it will be of dimension n x 1. (We’ll see this in detail in the next section.)
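In NumPy, a plain 1-D array has no inherent row or column orientation; when the column shape matters, we can make it explicit with reshape:

>>> v = np.array([1, 2, 3])
>>> v.reshape(-1, 1)   # explicit column vector of dimension 3 x 1
array([[1],
       [2],
       [3]])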

Linear dependence of Vectors

This concept will come in handy in many places, so it’s worth learning now. Take n vectors a₁, a₂, …, aₙ. We call these n vectors linearly dependent if there exist n scalars b₁, b₂, …, bₙ, not all zero, such that b₁ a₁ + b₂ a₂ + … + bₙ aₙ = 0. If this equation is satisfied only when all bᵢ are zero, then the n vectors are linearly independent.
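One practical way to test this in NumPy (a sketch, not the only method) is to stack the vectors as rows of a matrix and compare np.linalg.matrix_rank with n: the vectors are linearly independent exactly when the rank equals n.

>>> vectors = np.array([[1, 0, 0],
...                     [0, 1, 0],
...                     [1, 1, 0]])   # third vector = first + second
>>> np.linalg.matrix_rank(vectors)   # rank 2 < 3 vectors, so they are dependent
2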

Matrix

What if I said that the board of snakes and ladders is actually just a fun matrix? If a child can play with one, let’s not be scared of them! A matrix is an ordered rectangular arrangement of numbers; in simpler words, it is a grid with a number (or similar entity) in each cell. So we can say that vectors are just matrices with dimension 1 x n or n x 1. In general, an n x m matrix is a two-dimensional array with n rows and m columns. We can define it in the same way as we did for vectors, but let’s see the difference between the two using an example.

#vector of size 5, filled with zeros
>>> np.zeros(5)
array([0., 0., 0., 0., 0.])
#matrix with dimension (2, 3), filled with zeros
>>> np.zeros((2, 3))
array([[0., 0., 0.],
       [0., 0., 0.]])

For more understanding of NumPy, check here.

To access any element of the matrix, we represent it using a unique index. An element at the intersection of the nᵗʰ row and mᵗʰ column has index (n-1, m-1). In some literature, the index is written (n, m), but we will use the former, since that is how NumPy indexes arrays.

Indexes for a 3×3 matrix (Image by author)
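For example, with NumPy’s 0-based indexing:

>>> A = np.array([[1, 2, 3],
...               [4, 5, 6],
...               [7, 8, 9]])
>>> A[0, 0]   # element in the 1st row, 1st column
1
>>> A[1, 2]   # element in the 2nd row, 3rd column
6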

Different types of matrices:

  • Square Matrix: A matrix with the same number of rows and columns, i.e., of dimension n x n.
  • Diagonal Matrix: It has all non-diagonal elements equal to 0, i.e., all elements with index (i,j) such that i is not equal to j, have value zero.

Note: All elements of a matrix with the index of the form (i, i) are diagonal elements of that matrix

  • Upper triangular matrix: Square matrix with all the elements below diagonal equal to 0.
  • Lower triangular matrix: Square matrix with all the elements above the diagonal equal to 0.
  • Scalar matrix: A diagonal matrix with all the diagonal elements equal to the same scalar k.
  • Identity matrix: Square matrix with all the diagonal elements equal to 1 and all the non-diagonal elements equal to 0. This matrix is commonly represented as I.
  • Column matrix: A matrix with only one column.
  • Row matrix: A matrix with only one row.

Let’s dive into some basic terminology regarding matrices, and I’ll follow these with their code snippets wherever necessary.

  • Order of a matrix: the order of a matrix is rows × columns; a matrix of dimension n x m has order n x m and contains nm elements in total.
  • Trace of a matrix = Sum of all the diagonal elements of a square matrix.
>>> np.trace(np.ones((3, 3)))
3.0
  • Transpose of a matrix: A matrix formed by turning all the rows of a given matrix into columns and vice-versa. In simpler words, it is the flipped version of a matrix.
Transpose of a matrix (Image by author)
>>> x = np.arange(4).reshape((2, 2))
>>> x
array([[0, 1],
       [2, 3]])
>>> np.transpose(x)
array([[0, 2],
       [1, 3]])
  • Determinant of a square matrix: It is a scalar value that depends on the entries of that square matrix. It is calculated using different formulas based on the size of that square matrix. It is represented as |A|.
Determinant of 2×2 matrix. Image by author
Determinant of 3×3 matrix. Image by author
>>> a = np.array([[1, 2], [3, 4]])
>>> np.linalg.det(a)
-2.0
Explanation: det = (1 × 4) − (2 × 3) = −2
  • Scalar multiplication of matrix: When we multiply a scalar to a matrix, each matrix element gets multiplied by that scalar.
>>> np.array([1, 2, 3]) * 2
array([2, 4, 6])
  • Addition of Matrix: When we add two matrices, A and B, the element on an index (i, j) for the resultant matrix is equal in value to the sum of elements at (i, j) in A and B, Cᵢⱼ = Aᵢⱼ + Bᵢⱼ. Two arrays can be added using the ‘+’ operator, given their dimensions are equal.
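For example, adding two matrices of equal dimensions element-wise:

>>> np.array([[1, 2], [3, 4]]) + np.array([[5, 6], [7, 8]])
array([[ 6,  8],
       [10, 12]])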
  • Matrix Multiplication: Matrix multiplication is an operation that takes two matrices as input and produces a single matrix by taking dot products of the rows of the first matrix with the columns of the second. A necessary condition for multiplying two matrices is that the number of columns of the first matrix equals the number of rows of the second. A couple of things to keep in mind for matrices A, B, and C:
  1. Here if A has dimension (n,m), then B has dimension (m,p)
  2. If D = A x B, then the dimension of D is (n,p). The formula for each element of D is dᵢⱼ=aᵢ₁ b₁ⱼ+aᵢ₂ b₂ⱼ+…+aᵢₘ bₘⱼ
  3. A x B ≠ B x A in general (Matrix multiplication is not commutative in nature)
  4. If A x B is possible, it does not mean that B x A is possible.
  5. A x (B x C) = (A x B) x C (Matrix multiplication is associative in nature)
  6. A x (B + C) = (A x B) + (A x C) (Matrix multiplication is distributive in nature)
  7. A x I = A and I x A = A (Where I is the dimension appropriate identity matrix)
>>> a = np.array([[1, 0],
...               [0, 1]])
>>> b = np.array([[4, 1],
...               [2, 2]])
>>> np.matmul(a, b)
array([[4, 1],
       [2, 2]])
  • Minor: A minor is the determinant of the square matrix formed by deleting one row and one column from a larger square matrix. The minor corresponding to index (i, j), written Mᵢⱼ, is the determinant of the matrix that remains after we delete the iᵗʰ row and jᵗʰ column.
  • Cofactor: Cofactor can be derived from the minor by multiplying the minor with the sign of the position. So for index (i,j), Cᵢⱼ = (−1)ⁱ⁺ʲ Mᵢⱼ
  • Adjoint: The adjoint is simply the transpose of the cofactor matrix. We represent it as (adj A). An important property of adj A is
A (adj A) = (adj A) A = |A| I

Let’s see an example of the last three properties.

Example of above calculations. Image by author
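NumPy has no built-in routine for minors or cofactors, so here is a minimal sketch (the helper name minor is ours) that computes them with np.delete and np.linalg.det, and then verifies the adj A property:

>>> def minor(A, i, j):
...     # determinant after deleting row i and column j
...     return np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))
...
>>> A = np.array([[1., 2.], [3., 4.]])
>>> cofactors = np.array([[(-1) ** (i + j) * minor(A, i, j)
...                        for j in range(2)] for i in range(2)])
>>> adj = cofactors.T        # adjoint = transpose of the cofactor matrix
>>> adj
array([[ 4., -2.],
       [-3.,  1.]])
>>> A @ adj                  # equals |A| x I, since |A| = -2
array([[-2.,  0.],
       [ 0., -2.]])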
  • Inverse of a matrix: Let the inverse of matrix A be matrix B. This implies that A x B = I. We can also represent this as A x A⁻¹ = A⁻¹ x A = I, where A⁻¹ is the inverse of A.
>>> a = np.array([[1, 2], [3, 4]])
>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

A formula for calculating the inverse of a matrix follows from the property of adj A we saw above:

A⁻¹ = (adj A) / |A|

Note: Since |A| appears in the denominator, the inverse of a matrix exists only if the determinant of that matrix is not equal to zero.

Eigenvalues and Eigenvectors

This concept provides a lot of intuition for many ML algorithms, so I encourage you to understand it in depth. The eigenvalues of a matrix A are the scalar values λ that satisfy the characteristic equation of the matrix.

  1. Ax = λx, where x is a nonzero vector
  2. |A − λI| = 0, where I is the identity matrix.

Solving equation 2 gives us the eigenvalues of A, and substituting each of these values into equation 1 gives the corresponding vectors x, known as eigenvectors.

import numpy as np

A = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])  # 'A' avoids shadowing the built-in input()
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)
# [1. 2. 3.]
print(eigenvectors)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

There are many complex concepts corresponding to eigenvectors and eigenvalues especially related to whether each of the values and vectors is unique and linearly dependent or not. We shall not cover them all here but let’s look into one important concept.

Diagonalizability

An n x n matrix A is said to be diagonalizable if it can be written as

A = P D P⁻¹

Here D is a diagonal n x n matrix with the eigenvalues of A as its entries, and P is a nonsingular n x n matrix whose columns are the eigenvectors corresponding to the eigenvalues in D.

The diagonalization theorem states that an n x n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors. Using NumPy we can find the eigenvectors and check whether they are linearly independent, as shown below. In the last section, I mention where these concepts come into play in ML.
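As a numerical sketch (the example matrix is ours): np.linalg.eig returns the entries of D and the columns of P, and reconstructing P D P⁻¹ should recover A up to floating-point error:

>>> A = np.array([[4., 1.], [2., 3.]])
>>> eigvals, P = np.linalg.eig(A)
>>> D = np.diag(eigvals)
>>> np.allclose(P @ D @ np.linalg.inv(P), A)   # A = P D P⁻¹ holds
True
>>> np.linalg.matrix_rank(P)   # 2 linearly independent eigenvectors
2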

Some special types of matrices

Let’s look at some special kinds of matrices that are commonly referenced in linear algebra.

Singular Matrix

Matrix A is a singular matrix if its determinant is equal to zero.

|A| = 0

The inverse of a singular matrix does not exist, as we saw above.
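For example, trying to invert a singular matrix in NumPy raises an error:

>>> s = np.array([[1., 2.], [2., 4.]])   # second row = 2 x first row
>>> np.linalg.det(s)
0.0
>>> np.linalg.inv(s)
Traceback (most recent call last):
  ...
numpy.linalg.LinAlgError: Singular matrix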

Symmetric Matrix

Matrix A is symmetric if A = Aᵗ. This implies that for every i, j, aᵢⱼ = aⱼᵢ, where aᵢⱼ is the element of matrix A at index (i,j).

Example of a symmetric matrix (Image by author)

Skew-symmetric Matrix

Matrix A is skew-symmetric if A = -Aᵗ. This implies that for every i, j, aᵢⱼ = -aⱼᵢ where aᵢⱼ is the element of matrix A at index (i,j). This implies that the diagonal elements are all zero.

Example of a skew-symmetric matrix (Image by author)

Orthogonal Matrix

Matrix A is an orthogonal matrix if A Aᵗ = I. The determinant of an orthogonal matrix is either +1 or −1, and all of its eigenvalues have absolute value 1 (they may be complex; a rotation matrix, for instance, has eigenvalues ±i).
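These definitions are easy to check numerically with np.allclose (the example matrices are ours):

>>> S = np.array([[1., 2.], [2., 3.]])
>>> np.allclose(S, S.T)   # symmetric: A = Aᵗ
True
>>> K = np.array([[0., 2.], [-2., 0.]])
>>> np.allclose(K, -K.T)   # skew-symmetric: A = -Aᵗ
True
>>> Q = np.array([[0., 1.], [-1., 0.]])   # a 90-degree rotation
>>> np.allclose(Q @ Q.T, np.eye(2))   # orthogonal: A Aᵗ = I
True
>>> np.linalg.det(Q)   # determinant is +1
1.0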

Idempotent Matrix

Matrix A is said to be idempotent if A² = A. The determinant of an idempotent matrix is either 0 or 1. All the eigenvalues of an idempotent matrix are either 0 or 1.

Nilpotent Matrix

Matrix A is said to be nilpotent if Aᵐ = 0 for some positive integer m.

Involutory Matrix

Matrix A is said to be involutory if A² = I.
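A quick numerical check of the last three definitions (again, the example matrices are ours):

>>> P = np.array([[1., 0.], [0., 0.]])   # projection onto the x-axis
>>> np.allclose(P @ P, P)   # idempotent: A² = A
True
>>> N = np.array([[0., 1.], [0., 0.]])
>>> np.allclose(N @ N, np.zeros((2, 2)))   # nilpotent: A² = 0
True
>>> V = np.array([[0., 1.], [1., 0.]])   # swap matrix
>>> np.allclose(V @ V, np.eye(2))   # involutory: A² = I
True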

Linear Algebra in Machine Learning

Earlier I mentioned that linear algebra helps you understand the concepts of machine learning and deep learning better. Since we have gone through many concepts, I would like to point out a few concepts/topics where you would see a direct application of linear algebra.

  • Datasets — The data is represented with the help of a matrix
  • Linear regression — This problem is solved using the matrix factorization method.
  • Principal Component Analysis — This method also uses matrix factorization and the eigenvalue/eigenvector concepts we covered above.
  • Backpropagation — This is based on matrix multiplication

The above list is not exhaustive; there are many other topics which could be added to it. But it should be enough to show you the importance of linear algebra.

Conclusion

Now, please take a moment to congratulate yourself on making it to the end. I hope this article provides you with sufficient information about linear algebra. To read more about different topics in data science, follow us on Medium.
