Lasso (l1) and Ridge (l2) Regularization Techniques


Techniques used to reduce over-fitting


Introduction to Ridge and Lasso Regression

What is the need for Ridge and Lasso Regression?

When we fit a linear model with the best-fitted line and then move to the testing phase, increased variance can make the model over-fit, so it will not generalize well to future data or provide appropriate accuracy. Ridge and lasso regression were introduced to reduce this over-fitting. Both are powerful techniques, differing only slightly, for building models that are efficient and computationally fit while keeping over-fitting in check.


Regularization adds extra information (a constraint) to the learning problem to prevent over-fitting. Linear regression is a well-known standard method for regression that assumes a linear relationship between the input variables and the target variable. Regularization is an add-on to linear regression that introduces penalties into the loss function during training, which is why the result is referred to as regularized linear regression.

In other words, it is a method or technique used to reduce over-fitting so that we can make our model prediction appropriately.

Regularization gives us two techniques: L1 (lasso regression) and L2 (ridge regression).

Ridge Regression (L2)

Ridge regression is the regularized version of linear regression. It fits the data while keeping the weights small, so the training process is better behaved.

It performs L2 regularization, meaning it adds a penalty equivalent to the square of the magnitude of the coefficients, as given by the formula below.

The equation for ridge regression is:

Loss = Σᵢ (yᵢ − β₀ − β₁x₁ᵢ − … − βₖxₖᵢ)² + λ Σⱼ βⱼ²

where y is the target variable, x1, x2 … xk are the predictor variables, and β1 … βk are their coefficients (slopes).

λ · (slope)² is the penalty term, where lambda controls the degree of deflection.

Ridge deflects the simple regression fitted line by restricting (shrinking) the coefficients of the predictor variables, but it never makes them exactly zero.
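As a sketch of how the L2 penalty enters the fit (this derivation is standard, not taken from the article): for ridge regression without an intercept, the penalized least-squares problem has the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy. A minimal check against scikit-learn, on synthetic data with illustrative values:

```python
import numpy as np
from sklearn.linear_model import Ridge

# synthetic data: 50 samples, 3 features, known coefficients plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 1.0
# closed form: beta = (X^T X + lam * I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# sklearn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(w_closed)
print(ridge.coef_)  # matches the closed-form solution
```

Note that `fit_intercept=False` is needed for the two to match exactly, since the closed form above does not model an intercept.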

Consider the popular salary–experience data set; the plot shows the ridge regression fit using lambda = 100.

Note: as alpha (lambda) increases, the magnitude of the coefficients shrinks towards 0, but they never become exactly 0.
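This shrink-but-never-zero behaviour is easy to verify directly. A small sketch on synthetic data (the feature values and alphas are illustrative, not from the article):

```python
import numpy as np
from sklearn.linear_model import Ridge

# synthetic data: 4 informative features with known coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, -1.5, 2.0, 0.5]) + rng.normal(scale=0.2, size=100)

norms = []
for alpha in [0.1, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    # every coefficient stays nonzero, no matter how large alpha gets
    assert np.all(coef != 0)

# the overall magnitude of the coefficients shrinks as alpha grows
print(norms)
```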

Advantages of Ridge Regression

  • It reduces the complexity of a model that has a large number of coefficients.
  • It reduces computational expense.
  • It works well in the presence of highly correlated features.


A caveat for model interpretability: ridge shrinks the coefficients very close to zero but never exactly to zero, so no feature is ever fully removed from the model.

Lasso Regression (L1)

Lasso regression is a shrinkage version of linear regression in which the coefficient estimates are shrunk towards a central point (typically zero). It is used for models that show high levels of multicollinearity. Lasso regression automates part of model selection, i.e. feature selection.

Lasso stands for “least absolute shrinkage and selection operator”.

The equation for lasso regression is:

Loss = Σᵢ (yᵢ − β₀ − β₁x₁ᵢ − … − βₖxₖᵢ)² + λ Σⱼ |βⱼ|

where y is the target variable, x1, x2 … xk are the predictor variables, and β1 … βk are their coefficients (slopes).

λ · |slope| is the penalty term, where lambda controls the degree of deflection.

Lasso regression can also eliminate variables entirely by driving their coefficients to zero, thus removing variables that have high covariance with other predictor variables. Lasso regression differs from ridge regression only in the penalty term.
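This exact-zero behaviour, in contrast to ridge, can be demonstrated on synthetic data where only some features matter (the coefficients and alpha below are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

# synthetic data: only the first two of five features are informative
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

coef = Lasso(alpha=0.5).fit(X, y).coef_
print(np.round(coef, 3))
# the three irrelevant coefficients are driven exactly to zero,
# while the informative ones survive (shrunk towards zero)
```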

Again using the salary–experience data set, the plot shows the lasso regression fit using lambda = 10000.

Advantages of Lasso Regression

  • It reduces model complexity, i.e. over-fitting.
  • It also acts as feature selection by driving the coefficients of less useful (often highly correlated) features to exactly zero.
  • Hence the computational cost of the resulting model is also reduced.


Disadvantages of Lasso Regression

(i) It cannot perform group selection: among a group of highly correlated features it tends to pick one arbitrarily and zero out the rest.

(ii) Even small values of alpha can produce significant sparsity, which may discard useful features.
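The group-selection limitation can be illustrated with two perfectly correlated (duplicated) features: ridge splits the weight evenly between the twins, while lasso has no unique way to split it and may keep only one. A small sketch on synthetic data (all values illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 1))
X = np.hstack([x, x])  # two identical, perfectly correlated features
y = 2.0 * x[:, 0] + rng.normal(scale=0.1, size=200)

ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_
lasso_coef = Lasso(alpha=0.1).fit(X, y).coef_

print(ridge_coef)  # weight shared (almost) equally between the twin features
print(lasso_coef)  # lasso's split is arbitrary; it may keep only one of them
```

The ridge result is forced by symmetry (the L2 penalty has a unique minimizer that treats identical columns identically); the lasso objective is indifferent to how the weight is divided, so the solver's answer depends on implementation details.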

The plot above compares all three regressions (linear, ridge, and lasso).

Below is the code for both regressions, implemented in Google Colab on the Boston housing data set.

import pandas as pd

BHNames = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis',
           'rad', 'tax', 'ptratio', 'black', 'lstat', 'medv']
url = ''  # the data set URL was left blank in the original
data = pd.read_csv(url, delim_whitespace=True, names=BHNames)
print(data.head(20))

from sklearn.model_selection import train_test_split
X = data.drop('medv', axis=1)
print('X shape = ', X.shape)
# output: X shape = (506, 13)
Y = data['medv']
print('Y shape = ', Y.shape)
# output: Y shape = (506,)

from sklearn import linear_model
import matplotlib.pyplot as plt
names = data.drop('medv', axis=1).columns

The lasso regression code starts from this section:

lasso = linear_model.Lasso(alpha=0.2)
lasso_coef =, Y).coef_
plt.plot(range(len(names)), lasso_coef)
plt.xticks(range(len(names)), names, rotation=60)
plt.ylabel('coefficient')

The ridge regression code starts from here:

# For Ridge
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_std = scaler.fit_transform(X)

from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.2)
ridge_coef =, Y).coef_
print(ridge_coef)
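To see the difference between the two penalties side by side (the housing-data URL above is blank, so this sketch uses synthetic data; the coefficients and alphas are illustrative), we can count how many coefficients each model drives exactly to zero as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# synthetic data: 3 informative features out of 10
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 10))
true_beta = np.array([4.0, -3.0, 2.0, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_beta + rng.normal(scale=0.1, size=150)

results = {}
for alpha in [0.01, 0.1, 1.0]:
    lasso_coef = Lasso(alpha=alpha).fit(X, y).coef_
    ridge_coef = Ridge(alpha=alpha).fit(X, y).coef_
    # count coefficients that are exactly zero under each penalty
    results[alpha] = (int(np.sum(lasso_coef == 0)), int(np.sum(ridge_coef == 0)))
    print(f'alpha={alpha}: lasso zeroed {results[alpha][0]}, '
          f'ridge zeroed {results[alpha][1]}')
```

As alpha grows, lasso zeroes out more and more of the seven irrelevant coefficients, while ridge never produces an exact zero, which matches the contrast drawn throughout the article.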


This article described ridge and lasso regression, two powerful regularization techniques that help our models make appropriate predictions.

I hope you liked the article. Reach me on my LinkedIn and Twitter.

Recommended Articles

1. NLP — Zero to Hero with Python
2. Python Data Structures Data-types and Objects
3. Exception Handling Concepts in Python
4. Why LSTM more useful than RNN in Deep Learning?
5. Neural Networks: The Rise of Recurrent Neural Networks
6. Fully Explained Linear Regression with Python
7. Fully Explained Logistic Regression with Python
8. Differences Between concat(), merge() and join() with Python
9. Data Wrangling With Python — Part 1
10. Confusion Matrix in Machine Learning

