Why Not Use CNN to Extract Features?

How to find unexpected patterns in your data

latent space graph example (image by author)

There is beauty in the unexpected.

Just when you think you have everything figured out, something new pops up and throws you for a loop. The same can be said for data analysis. When you are looking at your data set, trying to find patterns and trends, sometimes you will come across something that doesn’t quite make sense. This is where anomaly detection comes in.

Anomaly detection is the process of identifying unusual patterns in your data. These unusual patterns can be anything that doesn’t fit the normal trend or behavior, and they can be caused by a variety of things such as errors in data collection, outliers, or even malicious activity. Anomaly detection is important because it can help you find problems with your data that you wouldn’t be able to find otherwise.

There are a variety of methods for anomaly detection, but in this blog post, we will focus on one particular method: manifold learning. Manifold learning is a technique for finding low-dimensional representations of high-dimensional data.


An auto-encoder is a type of artificial neural network divided into two main elements: the encoder network and the decoder network.

Each part performs the following:

  1. The encoder network: maps a high-dimensional input into a low-dimensional space called the latent space.
  2. The decoder network: maps points in the latent space back to a reconstruction of the input pictures.

The auto-encoder falls into the category of unsupervised learning techniques, as the data do not need labels. The encoder reduces the dimensionality of the input data, the decoder reproduces the input from the latent space, and the two networks are jointly optimized to reduce the difference between the input and output data.

The encoder and decoder networks can be designed to serve specific tasks. In the case of pictures, we typically use convolutional neural networks (CNNs), trained to minimize the mean squared error (MSE) between the input X and its reconstructed output X’, i.e.

MSE(X, X’) = (1/n) Σᵢ ‖Xᵢ − X’ᵢ‖² — the mean squared error between the input X and its reconstructed output X’
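As a toy NumPy sketch of this reconstruction objective, with random arrays standing in for real images and a trained decoder's output:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 8))                          # 4 flattened "images", 8 features each
X_rec = X + 0.1 * rng.standard_normal((4, 8))   # stand-in for the decoder's output

# Mean squared error between the input and its reconstruction
mse = np.mean((X - X_rec) ** 2)
```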

Popular use cases of the autoencoders are:

  • Dimensionality reduction
  • Image compression
  • Data denoising
  • Anomaly detection

For the latter, classical methods focus on spotting anomalies by looking at the difference between the input and its reconstructed version. The assumption is that the auto-encoder performs well when the input is similar to the training dataset but produces high reconstruction errors around anomalies. To use this method, we train the auto-encoder with anomaly-free data and look at the difference between the input and output of the auto-encoder.
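A minimal sketch of this thresholding step, using synthetic reconstruction errors in place of a trained model's output (the 3-sigma cut-off is an illustrative choice, not prescribed by the method):

```python
import numpy as np

rng = np.random.default_rng(1)
# Per-sample reconstruction errors from a (hypothetical) trained auto-encoder:
# 95 well-reconstructed samples, 5 poorly reconstructed anomalies
errors = np.concatenate([rng.normal(0.05, 0.01, 95),
                         rng.normal(0.30, 0.05, 5)])

# Flag samples whose error exceeds mean + 3 standard deviations
threshold = errors.mean() + 3 * errors.std()
anomalies = np.flatnonzero(errors > threshold)
```

Only the samples drawn from the high-error group end up above the threshold, which is the behavior the method relies on.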

Another possibility is to ensure that the model learns a meaningful representation of the latent space and spots anomalies directly in this lower-dimensional space. This is where the Laplacian Auto-Encoder comes in.

But first, we have to build a K-Nearest-Neighbor Graph in order to train the Laplacian Auto-Encoder.

K-Nearest-Neighbor Graph

The k-nearest-neighbor graph (k-NNG) is a graph in which each node is connected to its k nearest neighbors. For example, in the graph below, each node is connected to its three closest points.

k-NN graph sketch (image by author)

The Euclidean norm is probably the most intuitive measure of closeness, as it gives the shortest distance between two points. Other popular distance metrics such as Minkowski, Manhattan, Hamming, or cosine can be chosen depending on the application.
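For illustration, a small NumPy sketch that builds a k-NN graph with the Euclidean norm (k = 3, random 2-D points):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((10, 2))   # 10 points in 2-D

# Pairwise Euclidean distances
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))

# For each point, the indices of its k nearest neighbors
# (column 0 of the argsort is the point itself, so we skip it)
k = 3
neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]

# Adjacency matrix of the (directed) k-NN graph
A = np.zeros((10, 10), dtype=int)
rows = np.repeat(np.arange(10), k)
A[rows, neighbors.ravel()] = 1
```

Note that the graph is directed: i being among j's nearest neighbors does not imply the converse.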

For high-dimensional inputs such as pictures, we need a distance metric that can measure similarity between images, such as the structural similarity index measure (SSIM) or metrics based on histograms of oriented gradients (HOG).

For instance, we can use the HOG descriptor and compute the distance between two histograms with the Wasserstein metric or the chi-squared distance.
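As an example, the chi-squared distance between two orientation histograms can be computed as follows (the histogram values are made up for illustration):

```python
import numpy as np

# Two hypothetical HOG-style orientation histograms, normalized to sum to 1
h1 = np.array([0.1, 0.3, 0.2, 0.4])
h2 = np.array([0.2, 0.2, 0.3, 0.3])

# Chi-squared distance: 0.5 * sum((h1 - h2)^2 / (h1 + h2))
# (small epsilon guards against empty bins)
chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))
```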

When selecting a metric, we need to keep in mind that a good distance measure should be:

  • Informative: the distance can be directly interpreted as a level of similarity
  • Reflexive: the distances from “A to B” and “B to A” are equal
  • Sensitive: the distance changes smoothly as the data changes
  • Bounded: the metric falls within a restricted range

Laplacian Auto-Encoder

The biggest challenge when working with autoencoders is to ensure that the model actually learns a meaningful representation of the latent space.

The Laplacian Auto-Encoder also uses the encoder-decoder structure but the difference lies in the loss function used to train the two networks.

The auto-encoder is still trained to reduce the error between the input and its reconstructed output, but a regularization term is added to the loss function to preserve neighborhoods between the high and low dimensions: data points that are close in the input space stay close in the latent space.

To build a Laplacian Auto-Encoder, we first have to build the KNN graph on the input data and add a regularization term on the loss function which encourages the preservation of the same neighbors once they are mapped in the latent space.

From the KNN graph, we derive a weight matrix W with a small W(i, j) when the distance between Xi and Xj is large, and a large W(i, j) when the distance is small. A regularization function is then defined as follows:

λ Σ_{i,j} W(i, j) ‖Zᵢ − Zⱼ‖² — the regularization function of the Laplacian Auto-Encoder

Where Zi and Zj characterize the mapped points in the latent space from inputs Xi and Xj, respectively. The first parameter (lambda) is the regularization weight that we can tune as a hyperparameter of our model.
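One common choice for W, used here as an assumption rather than the article's exact construction, is a Gaussian kernel on the input distances; the regularization term can then be computed as:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((6, 4))   # inputs
Z = rng.random((6, 2))   # their (hypothetical) latent codes

# Gaussian-kernel weights: large when X_i and X_j are close, small otherwise
diff = X[:, None, :] - X[None, :, :]
dist2 = (diff ** 2).sum(-1)
W = np.exp(-dist2)
np.fill_diagonal(W, 0.0)

# Laplacian regularization: lambda * sum_ij W(i, j) * ||Z_i - Z_j||^2
lam = 0.1
z_diff2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
reg = lam * (W * z_diff2).sum()
```

Penalizing this term pushes latent codes Z_i and Z_j together exactly when their inputs X_i and X_j are close, which is the neighborhood-preserving behavior described above.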

The full loss function on which the Laplacian Auto-Encoder aims to optimize is then defined as follows:

L = (1/n) Σᵢ ‖Xᵢ − X’ᵢ‖² + λ Σ_{i,j} W(i, j) ‖Zᵢ − Zⱼ‖² — the loss function of the Laplacian Auto-Encoder
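Putting the two terms together, a sketch of the full objective (the function name and signature are mine, not from the article):

```python
import numpy as np

def laplacian_ae_loss(X, X_rec, Z, W, lam=0.1):
    """Reconstruction MSE plus the Laplacian neighborhood-preserving term."""
    mse = np.mean((X - X_rec) ** 2)
    z_diff2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    reg = lam * (W * z_diff2).sum()
    return mse + reg
```

A perfect reconstruction with identical latent codes yields zero loss; any reconstruction error or separation of neighboring latent codes increases it.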

Final Words

So, what can you do when your data doesn’t quite fit the mold?

Use a CNN-based auto-encoder to find unexpected patterns in your data. CNNs are great at extracting features from images and have been shown to be very effective at finding patterns that are difficult to detect with traditional methods.

The main advantage of using unsupervised methods over their supervised equivalents is that we do not need to label any data, a task that can be very expensive. The trade-off is that we might detect patterns that are not anomalies, but rather intrinsic to the dataset.

I hope you enjoyed this tutorial and found it useful. If you have any questions or comments, feel free to post them below.


