Neural Networks: More Than Deep Learning



Photo by eberhard 🖐 grossgasteiger on Unsplash

Motivation

The amount of research produced each year by the Computational Intelligence (CI) community is astounding. One ever-popular branch of CI is Neural Networks. Neural Networks hit the mainstream because of their performance and the fascination with seeing how deep we could make the architectures. Deep learning seems to be synonymous with “Artificial Intelligence” these days, which has driven its acceptance.

I was first exposed to deep learning while working on my master’s thesis. I was trying to classify a multi-dimensional acoustic signal (an image created by recording a sound bounced off an area of interest). At that time, AlexNet was the cream of the crop; naturally, I implemented this Convolutional Neural Network (CNN) to evaluate my data. To my surprise, it performed very well. This experience opened my eyes to the power of these black boxes.

However, it wasn’t until a few years later that I learned Neural Networks are much more than deep classification/regression algorithms. The IEEE Computational Intelligence Society defines this branch as follows: “Using the human brain as a source of inspiration, artificial neural networks (NNs) are massively parallel distributed networks that can learn and generalize from examples. This area of research includes feedforward NNs, recurrent NNs, self-organizing NNs, deep learning, convolutional neural networks, and so on.” So there is more to the field of Neural Networks than just deep architectures. I was excited to learn about a few other areas, and I’ll share them in this post.

Convolutional Neural Networks

Photo by Joe Caione on Unsplash

Convolutional Neural Networks (CNNs), arguably the most popular type of NN, are neural networks whose primary focus is image processing. When describing a NN, most people give the example of classifying an image as either a cat or a dog. A CNN is precisely the type of NN that performs this task. To break down a CNN, let’s consider three of its aspects: filters, architecture, and optimization function.

Filter
For the sake of this article, we’ll consider filters to be the building blocks of CNNs. Each filter is a set of weights optimized during the training phase. The goal is to learn a set of weights that represent the types of objects we’re trying to classify. For example, the first layer(s) of filters may learn edges, and as we move further into the network, these filters are combined to form more complex shapes. However, this isn’t exactly what happens, because we aren’t optimizing to learn shapes; we’re optimizing for accuracy. It is possible to optimize for shapes, though: check out a paper we wrote on Morphological Shared Neural Networks [1].
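To make the idea of a filter concrete, here is a minimal sketch in plain Python of sliding a 3x3 filter over a tiny image. The filter weights are written by hand to respond to vertical edges; in a real CNN they would be learned during training. The function name and the toy image are assumptions made for illustration only.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the filter
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Toy 3x6 image: dark left half (0), bright right half (1)
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written vertical-edge filter (a stand-in for learned weights)
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = correlate2d_valid(image, vertical_edge)
print(response)  # [[0, 3, 3, 0]]
```

The response is zero over the flat regions and large where the window straddles the dark-to-bright boundary, which is what "learning an edge" amounts to for a first-layer filter.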

Architecture
If filters are the building blocks, the architecture is what we build out of them. An architecture is defined by its structure, which can be deep (many layers of filters) and wide (many filters in each layer). There is no perfect architecture, which is why there are so many. One of my specific areas of research is fusing CNNs for image classification [2]. We fused seven networks and found that fusing them improved performance, suggesting that multiple architectures together are better than one. Many factors constrain which architecture is best for your situation. For example, where will you deploy your CNN? On your phone? In the cloud? A deeper network requires more computational power, so it is essential to consider these factors when using (or building) an architecture.

Optimization Function
The optimization function drives the training of the CNN. For image classification, its goal may be to classify as many images correctly as possible. However, suppose the CNN is trying to perform object localization (finding an object in an image). In that case, the optimization function will try to find and place a box around an object (and only that object). We’re limited only by our creativity as to what optimization functions can do.

Illustration highlighting three components of a CNN. Illustration by me 🙂 – The photo of the joyful dog is from Joe Caione on Unsplash

CNNs open the door to solutions for a broad set of problems. Open-source packages like TensorFlow and PyTorch promote the use of not only CNNs but any NN. Given their ease of use, I do feel we should encourage using these tools responsibly. It’s possible to stand up a custom architecture with a custom optimization function within an hour (I’ve done it several times). We should challenge the notion that high accuracy is good enough to deploy a model. As practitioners of these technologies, I believe we’re doing better every day, but we should continue to strive for excellence, especially when we leave so much to the data.

Application
As always, I’ll leave you with an example. A few years ago, I worked with a company that performed debris monitoring services. Monitors ensured proper load estimations by analyzing hundreds of thousands of images a year. To help with this, I used a CNN to estimate the amount (regression) and type (classification) of debris in each load. This task didn’t require creating a custom architecture; however, I iterated over several off-the-shelf architectures to find the one that performed best. This project took a couple of weeks because the data required cleaning. There were exceptions in the data that had to be removed to ensure we weren’t training on flawed images. I did alter the optimization function for this use case, because accuracy scores weren’t quite enough for this application. A significant factor for the client was that when a load call was wrong, it wasn’t off by much. For example, if the truck was FULL, we NEVER wanted to call it EMPTY. So, I appended the max absolute error to the optimization function. This term works to minimize how far off the worst regressed value is. Our CNN may get a few more images wrong, but it works to bound how wrong they can be.
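The article does not give the exact formula, so the following is only a sketch of what "appending the max absolute error" could look like. It assumes mean squared error as the base loss and a hypothetical `max_weight` factor on the added term; both are illustrative choices, not the loss used in the project. In practice this would be written with the tensors of a framework such as TensorFlow or PyTorch so that gradients can flow through it.

```python
def mse_plus_max_abs_error(y_true, y_pred, max_weight=1.0):
    """Assumed base loss (MSE) plus a penalty on the single worst error."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    # The max term punishes one large miss (FULL called EMPTY) more than
    # several small ones, even when the average error is the same.
    return mse + max_weight * max(errors)


# Load fullness as a fraction: 1.0 = FULL, 0.0 = EMPTY
truth = [1.0, 0.5, 0.0, 1.0]
many_small_misses = [0.8, 0.7, 0.2, 0.8]  # every call off by 0.2
one_large_miss = [1.0, 0.5, 0.0, 0.6]     # one call off by 0.4

loss_small = mse_plus_max_abs_error(truth, many_small_misses)
loss_large = mse_plus_max_abs_error(truth, one_large_miss)
print(loss_small, loss_large)
```

Both prediction sets have the same mean squared error (about 0.04), but the second one scores worse because its largest miss is twice as big.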

Self-Organizing Feature Maps

Photo by Marjan Blan | @marjanblan on Unsplash

If you Google “Neural Networks,” you’ll find about 162,000,000 results. Very few of them are about Self-Organizing Feature Maps (SOFMs). SOFMs are quite different from most Neural Networks you hear about today, both in application and in setup. SOFMs drove the creation of other algorithms like Neural Gas, but we’ll start with the basics in this post.

SOFMs are an unsupervised algorithm, which means we don’t have labels (think cat or dog). Most unsupervised algorithms perform either clustering or dimensionality reduction. Dimensionality reduction is what we do when our data has many dimensions and we need fewer. This is a rough example, but consider a cube. Its width, height, and depth are one way to describe it. However, if we multiply these features together, we get its volume. We’ve gone from three descriptions of an object to one, so we’ve reduced its dimensionality from three to one.

Unfortunately, data doesn’t always work out this cleanly. Most of the time, when we reduce a higher-dimensional space to a few dimensions, we lose information. For example, if the cube were blue, its volume would tell us nothing about its color. Because no reduction tool is perfect, an entire bank of algorithms exists to reduce the dimensionality of data. The NN community developed SOFMs to tackle this problem.
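A tiny sketch of that information loss, using made-up dimensions: two differently shaped boxes collapse to the same volume, so the single number cannot tell them apart.

```python
def volume(width, height, depth):
    """Reduce three descriptive features to one."""
    return width * height * depth


# Two hypothetical boxes with different shapes
box_a = (2, 3, 4)
box_b = (1, 4, 6)

print(volume(*box_a), volume(*box_b))  # 24 24
```

Once only the volume is kept, there is no way to recover which box it came from.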

To see an algorithmic breakdown of SOFMs, check out this article [3] by Abhinav Ralhan.

Application

My favorite use of SOFMs is visualizing high-dimensional feature vectors in a two-dimensional space. SOFMs have two phases: training and mapping. Training begins with a randomly generated “map” of neurons. Over each iteration, we supply the training data to the map, and it updates/organizes itself by grouping neurons with similar outputs. The goal is for groups/clusters of similar neurons to separate themselves from other groups. If successful, we will be able to see different clusters in a two-dimensional image. During mapping, new samples (from the high-dimensional space) are assigned to the cluster to which they belong. A very simple example involves pixel colors. Pixels can be represented with three values: R, G, and B. We can use the SOFM algorithm to cluster these pixels. In this example, we know the colors will cluster together, but if we didn’t know anything about colors, we would learn their similarity from viewing the map. An example of what we could expect is in the next image.

In this example, we learn a feature map from many pixel values. Illustration by me:)
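The training and mapping phases for the pixel example can be sketched with NumPy as follows. This is a minimal version of a self-organizing map, not the code behind the illustration: the grid size, the linearly decaying learning rate and neighborhood width, the Gaussian neighborhood, and the function names are all assumptions chosen to keep the example short.

```python
import numpy as np


def best_matching_unit(weights, sample):
    """Mapping step: grid position of the neuron closest to `sample`."""
    dist2 = np.sum((weights - sample) ** 2, axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(data, grid_h=8, grid_w=8, n_iters=2000,
              lr0=0.5, sigma0=3.0, seed=0):
    """Training step: organize a randomly initialized map of neurons."""
    rng = np.random.default_rng(seed)
    # Randomly generated map: one weight vector per neuron
    weights = rng.random((grid_h, grid_w, data.shape[1]))
    # Grid coordinates of every neuron, shape (grid_h, grid_w, 2)
    rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                             indexing="ij")
    coords = np.stack([rows, cols], axis=-1)

    for t in range(n_iters):
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)                # learning rate decays toward 0
        sigma = sigma0 * (1.0 - frac) + 0.5    # neighborhood shrinks toward 0.5
        sample = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, sample)
        # Neurons near the winner on the grid are pulled toward the sample
        grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * influence[..., None] * (sample - weights)
    return weights


# Random RGB pixels with channel values in [0, 1]
pixels = np.random.default_rng(1).random((500, 3))
som = train_som(pixels)

# Map a new sample (pure red) to its cell on the 2-D grid
red_cell = best_matching_unit(som, np.array([1.0, 0.0, 0.0]))
print(som.shape, red_cell)
```

Because each neuron's weight vector is itself an RGB triple, the trained `som` array can be displayed directly as an 8x8 image, where neighboring cells should show similar colors.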

Unsupervised methods can be challenging to connect to an application because they are typically used to explore a data set to understand it better. In a time of big data, we have samples with potentially thousands of features, and a data scientist can’t digest that amount of information. So, unsupervised techniques will become crucial as we collect even more data and need to understand it quickly. SOFMs may hold the keys to better insights into high-dimensional data.

Extreme Learning Machines

Photo by Web Donut on Unsplash

Extreme Learning Machines (ELMs) are a lesser-known type of NN. ELMs differ from other NNs in how they are trained. Remember, NNs typically use backpropagation to learn ALL of the weights in the network.

However, in ELMs, only one set of weights is learned during training. The weights in the hidden layers are randomly initialized and then fixed, and the last layer of weights is the only one that is learned. The randomly initialized weights randomly transform the data as it propagates through the network. Learning the final layer then reduces to solving a linear system. A few math tricks are necessary to make this tool work, and here is a pretty good description [4].
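As a rough sketch of that training scheme, the NumPy code below fixes random hidden weights and solves for the output weights in a single least-squares step. The single hidden layer, the tanh activation, the hidden size, and the use of the Moore-Penrose pseudo-inverse are assumptions made for illustration (the pseudo-inverse is one common way to solve the linear system); the function names are hypothetical, and the description in [4] may differ in its details.

```python
import numpy as np


def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit an ELM: random fixed hidden layer, learned linear output layer."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn once and never updated
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # random nonlinear transform of the data
    # Only the output weights are learned: least-squares solve of H @ beta = y
    beta = np.linalg.pinv(H) @ y
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the same fixed random transform, then the learned output layer."""
    return np.tanh(X @ W + b) @ beta


# Toy regression problem: learn sin(x) on [-3, 3]
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()

W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
print(f"training MSE: {mse:.2e}")
```

There is no backpropagation loop here: training is one matrix pseudo-inverse, which is why ELMs are fast to fit.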

Usually, I highlight some type of application, but you’ve probably seen a hundred supervised classification algorithms by now. Instead, here’s a fun fact about ELMs: they have drama surrounding them. Drama doesn’t surface at this level very often in academic papers, but it did with ELMs. The academic community thrives on new methods rooted in solid mathematics. Unfortunately, ELMs may not fit the bill. ELMs received a backlash because the idea behind them was not new, and the authors of [5] wanted to be sure credit was correctly assigned to the original creators. They argue that ELMs are existing methods under another name and do not need a new one.
