Deep Learning Jargon Explained
Definitions of 12 key concepts related to Artificial Neural Networks and Deep Learning
A few things to note
Deep learning has racked up an impressive collection of accomplishments in the past several years. In light of this, it’s important to keep a few things in mind, at least in my opinion:
- Deep learning is not a panacea — it is not an easy one-size-fits-all solution to every problem out there
- It is not the fabled master algorithm — deep learning will not displace all other machine learning algorithms and data science techniques, or, at the very least, it has not yet proven so
- Tempered expectations are necessary — while great strides have recently been made in all types of classification problems, notably computer vision and natural language processing, as well as reinforcement learning and other areas, contemporary deep learning does not scale to working on very complex problems such as “solve world peace”
- Deep learning and artificial intelligence are not synonymous
- Deep learning can provide an awful lot to data science in the form of additional processes and tools to help solve problems, and when observed in that light, deep learning is a very valuable addition to the data science landscape
Let’s get started with deep learning-related terminology definitions:
1. Deep Learning
- Deep learning is the process of applying deep neural network technologies to solve problems. Deep neural networks are neural networks with at least one hidden layer, although many practitioners reserve the term for networks with several hidden layers.
- Like data mining, deep learning refers to a process, which employs deep neural network architectures, which are particular types of machine learning algorithms.
2. Artificial Neural Networks
- Artificial neural networks (ANNs) are the machine learning architecture, originally inspired by the biological brain (particularly the neuron), by which deep learning is carried out. ANNs alone (the non-deep variety) have been around for a very long time, and have historically been able to solve certain types of problems.
- However, comparatively recently, neural network architectures were devised which included layers of hidden neurons (beyond simply the input and output layers), and this added level of complexity is what enables deep learning, and provides a more powerful set of problem-solving tools.
- ANNs vary considerably in their architectures, so there is no definitive neural network definition. The two generally cited characteristics of all ANNs are the possession of adaptive weight sets and the capability of approximating non-linear functions of the inputs to neurons.
3. Perceptron
- A perceptron is a simple linear binary classifier. Perceptrons take inputs and associated weights (representing relative input importance) and combine them to produce an output, which is then used for classification.
- Perceptrons have been around for a long time: early implementations date back to the 1950s, and the first of these were part of early ANN implementations.
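To make the "inputs plus weights" idea concrete, here is a minimal sketch of a perceptron in plain Python. The function names, the learning rate, and the AND-gate toy data are my own illustrative assumptions; they are not part of any particular library.

```python
def perceptron_predict(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, epochs=20):
    """Classic perceptron rule: shift each weight by (label - prediction) * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - perceptron_predict(inputs, weights, bias)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: the logical AND function, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [perceptron_predict(s, weights, bias) for s in samples]
print(predictions)
```

Because a single perceptron draws one straight decision line, this only works for linearly separable data such as AND; it cannot learn something like XOR.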
4. Multilayer Perceptron
- A multilayer perceptron (MLP) is the implementation of several layers of perceptrons, each fully connected to the adjacent layers, forming a simple feedforward neural network.
- The multilayer perceptron has the additional benefit of nonlinear activation functions between layers, which a single perceptron does not have.
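Here is a small sketch of a forward pass through a multilayer perceptron with one hidden layer. The weights are hand-picked by me (not learned) so that the network computes XOR, and ReLU is my choice of nonlinear activation for the illustration; all names are hypothetical.

```python
def relu(z):
    """Nonlinear activation: passes positive values through, clips negatives to 0."""
    return max(0.0, z)


def identity(z):
    """Linear activation, used here for the output unit."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: every unit sees every input from the previous layer."""
    return [
        activation(sum(x * w for x, w in zip(inputs, unit_weights)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hand-picked illustrative weights (an assumption, not trained values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]  # two hidden units, two inputs each
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]             # one output unit, two hidden inputs
OUTPUT_BIASES = [0.0]


def mlp_forward(inputs):
    """Input layer -> hidden layer (ReLU) -> output layer (identity)."""
    hidden = dense_layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)[0]


for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, mlp_forward(pair))
```

The hidden layer with its nonlinearity is what lets this tiny network represent XOR, which no single perceptron can.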
5. Feedforward Neural Network
- Feedforward neural networks are the simplest form of neural network architecture, in which connections are non-cyclical.
- As the original artificial neural network design, a feedforward network passes information in a single direction, from the input nodes, through any hidden layers, to the output nodes; no cycles are present.
- Feedforward networks differ from later, recurrent network architectures in which connections form a directed cycle.
6. Recurrent Neural Network
- In contrast to the above feedforward neural networks, the connections of recurrent neural networks form a directed cycle.
- This bidirectional flow allows for internal temporal state representation, which, in turn, allows sequence processing, and, of note, provides the necessary capabilities for recognizing speech and handwriting.
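The "internal temporal state" can be sketched with a single recurrent unit whose output is fed back in at the next time step. This is a minimal scalar example under my own assumptions: the weight values and function names are made up for illustration.

```python
import math


def rnn_step(x, h_prev, w_input=0.5, w_recurrent=0.8, bias=0.0):
    """One time step: the new state depends on the current input AND the previous state."""
    return math.tanh(w_input * x + w_recurrent * h_prev + bias)


def run_rnn(sequence, h_initial=0.0):
    """Feed a sequence of any length through the unit, returning the state after each step."""
    h = h_initial
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states


# The same two inputs in a different order leave the unit in a different state,
# which is what makes sequence processing possible.
print(run_rnn([1.0, 0.0]))
print(run_rnn([0.0, 1.0]))
```

Note that the state after an input of 1.0 does not drop to zero on the following steps; it fades gradually, which is a simple form of memory.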
7. Activation Function
- In neural networks, the activation function takes a neuron's combined weighted inputs and produces its output; together, these outputs shape the network's decision boundaries.
- Activation functions range from identity (linear) to sigmoid (logistic, or soft step) to hyperbolic tangent (tanh) and beyond. To employ backpropagation (see below), the network must utilize activation functions that are differentiable.
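The three activation functions named above, along with their derivatives (which is what differentiability buys you), can be sketched in a few lines of plain Python. The function names are my own.

```python
import math


def identity(z):
    return z


def identity_derivative(z):
    return 1.0


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh(z):
    """Hyperbolic tangent: squashes any real number into the range (-1, 1)."""
    return math.tanh(z)


def tanh_derivative(z):
    return 1.0 - math.tanh(z) ** 2


for z in (-2.0, 0.0, 2.0):
    print(z, identity(z), sigmoid(z), tanh(z))
```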
8. Backpropagation
- Backpropagation is just gradient descent on individual errors. You compare the predictions of the neural network with the desired output and then compute the gradient of the error with respect to the weights of the neural network. This gives you a direction in weight space in which the error would become smaller.
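As a rough illustration, here is the gradient computation for a deliberately tiny network: one input, one sigmoid hidden unit, and one linear output. The network shape, the squared-error loss, and the numbers are all my assumptions; the finite-difference check at the end is just a sanity test that the chain-rule gradients are right.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Hidden activation h = sigmoid(w1 * x), prediction y = w2 * h."""
    h = sigmoid(w1 * x)
    return h, w2 * h


def loss(x, target, w1, w2):
    """Squared error between the prediction and the desired output."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def backward(x, target, w1, w2):
    """Propagate the error backward through the network with the chain rule."""
    h, y = forward(x, w1, w2)
    d_y = y - target                   # dL/dy
    grad_w2 = d_y * h                  # dL/dw2
    d_h = d_y * w2                     # dL/dh
    grad_w1 = d_h * h * (1.0 - h) * x  # dL/dw1, via sigmoid'(z) = h * (1 - h)
    return grad_w1, grad_w2


def numerical_gradient(f, value, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)


# Hypothetical example values.
x, target, w1, w2 = 1.5, 1.0, 0.4, -0.3
grad_w1, grad_w2 = backward(x, target, w1, w2)
check_w1 = numerical_gradient(lambda w: loss(x, target, w, w2), w1)
check_w2 = numerical_gradient(lambda w: loss(x, target, w1, w), w2)
print(grad_w1, check_w1)
print(grad_w2, check_w2)
```

Moving the weights a small step against these gradients lowers the loss, which is the "direction in which the error would become smaller".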
9. Cost Function
- The cost function measures the difference between the network's actual outputs and the expected (training) outputs. A cost of zero would signify that the network has been trained as well as would be possible; this would clearly be ideal.
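One common choice of cost function is mean squared error. The article does not commit to a specific cost function, so treat this as one illustrative option with a function name of my own choosing.

```python
def mean_squared_error(actual_outputs, expected_outputs):
    """Average of the squared differences between actual and expected outputs."""
    if not actual_outputs or len(actual_outputs) != len(expected_outputs):
        raise ValueError("both lists must be non-empty and the same length")
    squared_errors = [(a - e) ** 2 for a, e in zip(actual_outputs, expected_outputs)]
    return sum(squared_errors) / len(squared_errors)


# A perfect prediction has zero cost; a worse one has a higher cost.
print(mean_squared_error([1.0, 0.0], [1.0, 0.0]))
print(mean_squared_error([0.5, 0.5], [1.0, 0.0]))
```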
10. Gradient Descent
- Gradient descent is an optimization algorithm used for finding local minima of functions. While it does not guarantee a global minimum, gradient descent is especially useful for functions that are difficult to solve analytically for precise solutions, for example by setting derivatives to zero and solving.
- In the context of neural networks, stochastic gradient descent is used to make informed adjustments to your network’s parameters with the goal of minimizing the cost function, thus bringing your network’s actual outputs closer and closer, iteratively, to the expected outputs during the course of training. This iterative minimization employs calculus, namely differentiation.
- After a training step, the network weights receive updates according to the gradient of the cost function and the network’s current weights, so that the next training step’s results may be a little closer to correct (as measured by a smaller cost function). Backpropagation (backward propagation of errors) is the method used to dole these updates out to the network.
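The update loop itself is short. Below is a minimal sketch of plain (not stochastic) gradient descent on a one-parameter cost function that I made up for the example, `(w - 3)^2`, whose minimum is obviously at `w = 3`. The learning rate and step count are arbitrary assumptions.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a local minimum."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


def cost(w):
    """Hypothetical cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2


def cost_gradient(w):
    """Derivative of the cost with respect to w."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_gradient, start=0.0)
print(w_min, cost(w_min))
```

In a real network, `w` is the full set of weights and the gradient comes from backpropagation; the stochastic variant estimates that gradient from a small sample of the training data at each step.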
11. Vanishing Gradient Problem
- Backpropagation uses the chain rule to compute gradients (by differentiation), so the gradient for layers toward the “front” (input) of an n-layer neural network is a product of n factors. When those factors are small numbers, the product shrinks with every multiplication before it is finally used as an update.
- This means the gradient decreases exponentially with n, which becomes a problem for larger values of n: front layers take increasingly more time to train effectively.
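A quick way to see the exponential shrinkage is to multiply the per-layer chain-rule factors together. This sketch assumes sigmoid activations (whose derivative is at most 0.25) and a weight of 1.0 at every layer; both are simplifying assumptions of mine.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def front_layer_gradient_scale(num_layers, pre_activation=0.0, weight=1.0):
    """Product of the chain-rule factors (weight * sigmoid'(z)) across num_layers layers."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_derivative(pre_activation)
    return scale


# Even in the best case for sigmoid (derivative 0.25 at z = 0),
# the factor reaching the front layer collapses quickly as depth grows.
for n in (1, 5, 10, 20):
    print(n, front_layer_gradient_scale(n))
```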
12. Long Short Term Memory Network
- A Long Short Term Memory Network (LSTM) is a recurrent neural network that is optimized for learning from and acting upon time-related data which may have undefined or unknown lengths of time between events of relevance.
- Their particular architecture allows for persistence, giving the ANN a “memory.” Recent breakthroughs in handwriting recognition and automatic speech recognition have benefited from LSTM networks.
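To give a feel for where the "memory" comes from, here is a sketch of one step of a scalar LSTM cell using the standard gate equations (forget, input, candidate, output). The parameter values are hand-picked by me so that the cell stores an input of 1.0 and then holds on to it; the recurrent weights are set to 0.0 purely to keep the example easy to follow. None of this is a trained model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps each gate name to a tuple (input weight, recurrent weight, bias).
    """
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    forget = gate("forget", sigmoid)       # how much of the old cell state to keep
    write = gate("input", sigmoid)         # how much of the new candidate to store
    candidate = gate("candidate", math.tanh)
    output = gate("output", sigmoid)       # how much of the cell state to expose

    c = forget * c_prev + write * candidate
    h = output * math.tanh(c)
    return h, c


# Hypothetical hand-picked parameters: keep almost everything, write only when x is 1.
PARAMS = {
    "forget": (0.0, 0.0, 6.0),
    "input": (12.0, 0.0, -6.0),
    "candidate": (4.0, 0.0, 0.0),
    "output": (0.0, 0.0, 6.0),
}


def run_lstm(sequence, params=PARAMS):
    """Run the cell over a sequence and return the final hidden and cell states."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


# One event followed by 20 empty time steps: the cell state is still mostly intact.
h_final, c_final = run_lstm([1.0] + [0.0] * 20)
print(h_final, c_final)
```

The key line is the cell-state update: because the forget gate stays close to 1, the stored value is carried forward by addition rather than being repeatedly squashed, which is what lets the cell bridge long gaps between relevant events.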
Thanks for reading the article😊😊