
# Top 5 Must-Know Math Concepts for Deep Learning

## One is a Must-Learn

The most important mathematics concepts and methods for efficient, accurate, and optimized Deep Learning implementations and use cases

## Like circling vultures, interview questions about mathematics are impossible to escape.

I recently wrote about what to major in to break into the artificial intelligence industry. If I could do it all over again, I would have gone straight into mathematics. Not specifically statistics. Not computer science. Not engineering. (I took as many courses as I could in artificial intelligence for my master’s degree from Harvard University; see my bio.)

Mathematics is a prerequisite for deep learning. It is indispensable for even the simplest implementations because it is the natural means of understanding and representing data. If you cannot grasp the high-level mathematical details of learning, you will be blinded by the light of all the brilliance that will surround you in any development or implementation role.

**To get into the cutting-edge professional crafts in artificial intelligence and contribute to head-turning work, start with mathematics (whether that means majoring in it, earning certificates in it, or getting hands-on training by applying it).**

Without mathematics, deep learning would probably be confined to rudimentary pattern recognition use cases. The real payoff of applying mathematics to deep learning is building models that can learn complex relationships between variables.

As an illustration, suppose you want to build a model that can predict the price of a house based on its size. If we only had data about houses in one city, then a simple linear model could probably do a decent job at prediction. But what if we wanted our model [20] to work for any city? In this case, we would need some way of understanding how different features (e.g., size) relate to each other to make robust predictions. This is where mathematics comes in: through mathematical functions and their application, we can encode these relationships into our models so that they generalize beyond individual data points.

# In addition to providing ways of representing data, mathematics also allows us to optimize our models to perform better with less training data.

Deep learning algorithms are complex and typically require large amounts of training data to work effectively. However, by using optimization methods from mathematics (such as gradient descent [1]), we can find solutions that are close enough to the global optimum with far less training data and computation than brute-force search methods. This approach saves time and resources, and it also makes it possible to train some neural networks on personal devices that otherwise would not have enough computational power or memory capacity.
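For reference, the basic gradient descent update can be written as follows. The symbols are notation assumed here for illustration (they do not appear in the original post): $\theta_t$ denotes the model parameters at step $t$, $\eta > 0$ is the learning rate, and $L$ is a differentiable loss function.

$$
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
$$

Each step moves the parameters a small distance against the gradient, so the method needs only local slope information rather than a search over every candidate solution.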

**Brief Introduction to Deep Learning**

**Mathematics is the language of deep learning.**

It allows us to describe problems in a way that computers can understand and provides us with the tools we need to solve those problems.

Deep learning is a branch of machine learning that uses algorithms to learn from data in ways that are too complicated for humans to do manually — the goal is to find patterns in data and use them to make predictions or decisions.

To achieve this goal, deep learning algorithms build models, which are mathematical equations that represent the relationships between variables in the data. These models can be applied to make predictions about new data sets, or they can be used to control systems (think of robotics engineering or self-driving cars).

**To build these models, deep learning algorithms need large amounts of data.**

This data is typically processed by CPUs (central processing units) or GPUs (graphics processing units). However, recent advances in hardware and software have made it possible for some deep learning tasks to be run on devices such as smartphones.

Given those large datasets and their associated models, machine learning algorithms can detect patterns in data; typical use cases include facial recognition and predicting consumer behavior.

**Deep learning, then, is effectively a subfield of machine learning that uses sophisticated algorithms to learn from massive datasets. To train deep learning models, one must have a strong understanding of mathematics.**

Most deep learning research is based on linear algebra and calculus. Linear algebra is used for vector arithmetic and manipulations, which are at the core of many machine learning techniques. Calculus is used for optimization; since most machine learning algorithms are optimization problems, calculus is essential for training accurate models. Additionally, probability and statistics play a major role in deep learning. Probability helps us understand how likely it is that certain events will occur, while statistics allows us to make inferences about population parameters based on [21] sample data. Together, these two fields help us reason about uncertainty, which is fundamental to machine learning implementations.

**Top 5 Most Important Mathematics Methods/Approaches**

Here is your stairway to the stars: master these in accordance with your professional craft, and you can paddle your own canoe for millions of seconds (there are about 1.105e+9 seconds in a 35-year career).

I argue the following five:

# Linear algebra *(the must-learn)*, optimization methods, CNN architectures, approximate inference, and deep learning theory.

1. **Linear algebra** is the study of vectors, matrices, and the linear equations and transformations that relate them.

## I use linear algebra to organize and represent data in a way that makes sense mathematically.

As an illustration, when using regression to predict housing prices, one can think of each house as a point in a multidimensional space defined by its features (e.g., square footage). The goal then becomes finding the line that best fits all the points (houses). This problem can be solved with linear algebra by solving a set of simultaneous equations. Once the line has been found, one can use it to make predictions about new houses (points) that have not been seen before.
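The line-fitting idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the function names `fit_line` and `predict` and the house data are hypothetical, and only one feature (size) is used, in which case the simultaneous equations of least squares reduce to the closed-form slope and intercept shown here.

```python
from statistics import mean


def fit_line(sizes, prices):
    """Ordinary least squares fit of price = slope * size + intercept."""
    x_bar, y_bar = mean(sizes), mean(prices)
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - x_bar) ** 2 for x in sizes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(size, slope, intercept):
    """Predict the price of an unseen house from its size."""
    return slope * size + intercept


# Hypothetical training data: square footage and sale price.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [200_000.0, 290_000.0, 410_000.0, 500_000.0]

slope, intercept = fit_line(sizes, prices)
print(f"price ~ {slope:.1f} * size + {intercept:.0f}")
print(f"predicted price for 1800 sq ft: {predict(1800.0, slope, intercept):.0f}")
```

With more than one feature, the same problem is usually written in matrix form and handed to a linear algebra library instead of being solved by hand.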

2. **Optimization Methods** are algorithms used to find the best solution to a given problem from among all possible solutions. For machine learning, I use optimization methods to find the parameters (weights) of models such as neural networks that will result in low error rates on training data sets while also generalizing to unseen data sets. Many optimization algorithms exist, including gradient descent and its variants (e.g., stochastic gradient descent [2]), the conjugate gradient method, Newton’s method [3], and more specialized algorithms such as the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm, or L-BFGS [4]. Which ones work best depends on factors such as the type of problem being optimized and the computational resources available.

# A good understanding of how different optimization methods work is critical for effectively debugging machine learning systems, since poor performance often indicates that the wrong algorithm or an inefficient implementation was used.

You can continue research into deriving ever more efficient methods with stronger theoretical guarantees.
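To make the difference between two of the methods named above concrete, here is a minimal sketch of gradient descent and Newton's method on a one-dimensional toy loss. The loss `f(w) = (w - 3)^2`, the helper names, and the default learning rate and step count are assumptions chosen for illustration; real models have many parameters and far less convenient loss surfaces.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, scaled by a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def newton(grad, hess, w0, steps=1):
    """Newton's method: scale each step by the inverse second derivative."""
    w = w0
    for _ in range(steps):
        w -= grad(w) / hess(w)
    return w


# Toy loss f(w) = (w - 3)^2, minimized at w = 3 (an illustrative assumption).
def grad(w):
    return 2.0 * (w - 3.0)  # first derivative of the toy loss


def hess(w):
    return 2.0  # second derivative of the toy loss (constant)


print(gradient_descent(grad, w0=0.0))  # approaches 3 over many small steps
print(newton(grad, hess, w0=0.0))      # a quadratic is solved in a single step
```

Newton's method gets there faster on this problem because it uses curvature information, but computing second derivatives becomes expensive when a model has many parameters. That trade-off is one reason the best choice depends on the problem and the available resources.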

3. **CNN Architectures**, or convolutional neural network architectures, are a type of neural network [7] designed to detect patterns in images effectively [8].

## The design of these networks is based on the structure of the visual cortex [9], which is arranged in hierarchical layers.

CNNs start with a low-level layer that detects simple features such as edges and gradually build up to higher-level layers that detect more complex features such as objects. This hierarchical structure enables CNNs to learn rich representations of data efficiently; I have successfully applied them to tasks such as image classification and object detection.
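The low-level edge detection described above comes down to sliding a small filter over the image. The sketch below is an assumption-laden illustration: the function name `conv2d`, the tiny 4x4 image, and the hand-written vertical-edge kernel are all hypothetical, and in a real CNN the kernel values are learned during training rather than set by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning frameworks, the kernel is not flipped, so this
    is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds where brightness rises from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

for row in conv2d(image, kernel):
    print(row)
```

The output is nonzero only in the column where the dark and bright halves meet, which is the sense in which a first-layer filter detects an edge. Higher layers apply the same operation to these feature maps instead of raw pixels.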

4. **Approximate Inference** is a method of solving problems for which exact solutions are not possible or are too complex [10] to compute. Approximate inference algorithms aim to find solutions that are close enough to the true solution for the desired purpose [5][6].

## For example, consider a problem where you need to estimate the probability of an event happening based on data from a sample population.

One way to do this would be to enumerate all the possible worlds (all combinations of values for each variable) and then use these results along with prior information about the variables’ distributions to calculate posterior probabilities [11]. However, this approach quickly becomes impractical as the number of variables grows, because the number of combinations grows exponentially. Hence, various approximation methods, such as Monte Carlo simulation and variational Bayes, have been developed to compute estimates efficiently while still providing reasonable accuracy guarantees.
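The contrast between enumerating every possible world and sampling can be sketched with dice. This is an illustrative toy, not a general inference engine: the event (the sum of several fair dice reaching a threshold), the function names, and the sample counts are all assumptions made for the example, and only Monte Carlo simulation is shown (variational Bayes is not).

```python
import itertools
import random


def exact_probability(n_dice, threshold):
    """Enumerate every outcome of n_dice fair dice (6**n_dice possible worlds)."""
    outcomes = itertools.product(range(1, 7), repeat=n_dice)
    hits = sum(1 for roll in outcomes if sum(roll) >= threshold)
    return hits / 6 ** n_dice


def monte_carlo_probability(n_dice, threshold, n_samples, seed=0):
    """Estimate the same probability from random samples instead."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = 0
    for _ in range(n_samples):
        total = sum(rng.randint(1, 6) for _ in range(n_dice))
        if total >= threshold:
            hits += 1
    return hits / n_samples


# Two dice: only 36 worlds, so exact enumeration is cheap.
exact = exact_probability(2, 10)
estimate = monte_carlo_probability(2, 10, 100_000)
print(f"exact={exact:.4f} estimate={estimate:.4f}")

# Twenty dice: 6**20 (about 3.7e15) worlds makes enumeration impractical,
# but sampling still costs only n_samples simulated rolls.
print(monte_carlo_probability(20, 80, 20_000))
```

The estimate's error shrinks roughly with the square root of the number of samples, regardless of how many worlds exist. That is what makes sampling attractive when enumeration is out of reach.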

5. **Deep learning theory** is the branch of machine learning that deals with the design and analysis of algorithms for learning in situations where data is represented in a high-dimensional space [12].

# Deep learning theory has been heavily influenced by work in artificial neural networks and biology, particularly neuroscience [13][14]. One of the key differences between shallow [15] machine learning models (e.g., linear regression) and deep learning models is that shallow models learn simple representations of data, while deep learning models can learn richer representations.

As an example, a deep convolutional neural network can learn to detect objects in images by having many layers that each extract increasingly complex features from image pixels; in contrast, a shallower model can learn only low-level features such as edges or color blobs [16]. The ability to learn rich feature representations [17] is what makes deep learning so powerful and enables its application to a wide range of tasks, such as image classification, object detection, and natural language processing.

Deep learning algorithms are often implemented using neural networks, which are models loosely inspired by the workings of the brain [18]. Neural networks consist of interconnected units (neurons) that can process information in parallel. Each neuron takes input from some number of other neurons (its inputs), performs a computation on these inputs, and then passes its output to other neurons (its outputs). The outputs of the final neurons are combined to produce the network's result. Neural networks have been found to be particularly well-suited for deep learning because they can learn rich feature representations by gradually adding more layers specialized for different tasks (this is known as hierarchical modeling [19]).
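The neuron-by-neuron computation described above can be sketched as a forward pass in plain Python. Everything specific here is an assumption for illustration: the sigmoid activation, the function names, and the hand-picked weights (a trained network would learn its weights from data).

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One unit: weighted sum of its inputs plus a bias, then an activation."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Feed each layer's outputs into the next layer as its inputs."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]

print(forward([1.0, 2.0], network))
```

Stacking more `(weights, biases)` pairs in `network` adds more layers, which is the mechanical sense in which deeper models build features on top of features.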

**Parting Thoughts**

Without mathematics, there would be no deep learning as we know it today; it’s an essential tool for both building accurate models and making efficient use of scarce resources like time and computing power.

*I will provide a link to a linear algebra (my recommended must-learn) article I recently posted, in which I take a deeper dive into the field for natural language processing and machine learning use cases.*

Please share your thoughts with me if you have any edits/revisions to recommend or recommendations on further expanding this topic.


*References:*

*1. Daoud, E. A. (n.d.). Comparison between XGBoost, LightGBM and CatBoost using a home credit dataset. International Journal of Computer and Information Engineering, 13(1), 6–10.*

*2. Jacobian calculation using the multidimensional fast Fourier transform in the harmonic balance analysis of nonlinear circuits. (n.d.). IEEE Xplore. Retrieved July 25, 2022, from https://ieeexplore.ieee.org/abstract/document/52585*

*3. The theory of Newton’s method. (n.d.). Journal of Computational and Applied Mathematics, 124(1–2), 25–44. https://doi.org/10.1016/S0377-0427(00)00435-0*

*4. Saputro, D. R. S., & Widyaningsih, P. (n.d.). Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically weighted ordinal logistic regression model (GWOLR). AIP Conference Proceedings, 1868(1). https://doi.org/10.1063/1.4995124*

*5. Clayton et al. Approximate Inference in Generalized Linear Mixed Models. https://www.tandfonline.com/doi/abs/10.1080/01621459.1993.10594284*

*6. Rezende, D. J., Mohamed, S., & Wierstra, D. (2014, June 18). Stochastic backpropagation and approximate inference in deep generative models. PMLR. https://proceedings.mlr.press/v32/rezende14.html*

*7. Completely automated CNN architecture design based on blocks. (n.d.). IEEE Xplore. Retrieved July 26, 2022, from https://ieeexplore.ieee.org/abstract/document/8742788*

*8. Ma, N., Zhang, X., Zheng, H.-T., & Sun, J. (n.d.). ECCV 2018 open access repository. Retrieved July 26, 2022, from https://openaccess.thecvf.com/content_ECCV_2018/html/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.html*

*9. Tripp. (2019). Approximating the architecture of visual cortex in a convolutional network. Neural Computation, 31(8), 1551–1591. https://doi.org/10.1162/neco_a_01211*

*10. Opper et al. Expectation Consistent Approximate Inference. https://www.jmlr.org/papers/volume6/opper05a/opper05a.pdf*

*11. Murphy, K., Weiss, Y., & Jordan, M. I. (2013, January 23). Loopy belief propagation for approximate inference: An empirical study. ArXiv.Org. https://arxiv.org/abs/1301.6725*

*12. Dube, S. (2018, January 2). High dimensional spaces, deep learning and adversarial examples. ArXiv.Org. https://arxiv.org/abs/1801.00634*

*13. Richards, Lillicrap, Beaudoin, Bengio, Bogacz, Christensen, Clopath, Costa, Berker, de, Ganguli, Gillon, Hafner, Kepecs, Kriegeskorte, Latham, Lindsay, Miller, Naud, Pack, … Kording. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11), 1761–1770. https://doi.org/10.1038/s41593-019-0520-2*

*14. Storrs, K. R., & Kriegeskorte, N. (2019, March 4). Deep learning for cognitive neuroscience. ArXiv.Org. https://arxiv.org/abs/1903.01458*

*15. Xu, Y., Zhou, Y., Sekula, P., & Ding, L. (2021). Machine learning in construction: From shallow to deep learning. Developments in the Built Environment, 6, 100045. https://doi.org/10.1016/j.dibe.2021.100045*

*16. Abdalla, A., Cen, H., Wan, L., Rashid, R., Weng, H., Zhou, W., & He, Y. (2019). Fine-tuning convolutional neural network with transfer learning for semantic segmentation of ground-level oilseed rape images in a field with high weed pressure. Computers and Electronics in Agriculture, 167, 105091. https://doi.org/10.1016/j.compag.2019.105091*

*17. Rezende, M. C. ; T. P. ; S. (n.d.). A Machine Learning Approach to Automatic Term Extraction using a Rich Feature Set. https://aclanthology.org/N13-2003.pdf*

*18. Akkus, Galimzianova, Hoogi, Rubin, & Erickson. (2017). Deep learning for brain MRI segmentation: State of the art and future directions. Journal of Digital Imaging, 30(4), 449–459. https://doi.org/10.1007/s10278-017-9983-4*

*19. Deng, L. (n.d.). A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing, 3. https://doi.org/10.1017/atsip.2013.9*

*20. Advanced Geoprocessing — ENV 859Geo Data Analytics. https://env859.github.io/modeling/overviewNULL.html*

*21. DAFM CIA 3.pdf — Course Hero. https://www.coursehero.com/file/126091115/DAFM-CIA-3pdf/*
