Computer Vision techniques are behind most AI applications we use daily, from the facial recognition capabilities in your smartphone to emerging cashier-less retail stores, and let’s not forget everyone’s favourite car brand’s autonomous vehicle features. It’s almost crazy to think that solving Computer Vision was once a university student’s summer project back in the ’60s, or so the story goes.
Within the field of CV, there are many problems to be solved; common ones include object detection, object recognition, pose estimation, gesture recognition, face detection and depth estimation. I won’t be delving into the details of these problems, but as you can see, there’s a lot to keep you busy within the field of CV.
Computer Vision and Deep Learning made GPUs (Graphics Processing Units) commonplace within Machine Learning.
GPUs are standard hardware amongst Machine Learning Researchers and Engineers, and this piece of hardware has made Nvidia one of the most valuable tech companies today.
GPUs are designed and built to speed up processing tasks such as graphics rendering and texture mapping. The critical attributes of GPUs for machine learning applications are their ability to perform many computations on data in parallel and their fast memory access.
Researchers use GPUs for training and testing machine learning techniques to solve computer vision tasks such as visual processing and image recognition. In 2006, researchers at Microsoft published a paper introducing the use of GPUs to train and test convolutional neural networks for document processing. The researchers were inspired by the earlier work of D. Steinkraus et al.
The use of Convolutional Neural Networks (CNNs) for solving computer vision tasks was also a turning point for the machine learning field. CNNs leverage the mathematical convolution operation and use 2-D filters whose values are modified during training through backpropagation, the technique that enables learning within neural networks. CNNs are great for computer vision tasks such as image recognition and classification, but they are very slow to train on standard CPUs; hence early ML researchers explored training and testing CNNs on GPUs.
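To make the idea concrete, here is a minimal sketch of the 2-D operation at the heart of a CNN layer (strictly speaking, cross-correlation, which is what deep learning frameworks compute under the name "convolution"). It is pure Python with no frameworks, and the filter values are hypothetical illustrations — in a real CNN they would be learned via backpropagation rather than set by hand:

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return
    the resulting feature map as a list of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the image patch it
            # covers, then sum: one scalar per output position.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

# A toy 4x4 "image" containing a vertical edge, and a hand-picked
# 2x2 vertical-edge filter (hypothetical values, not learned).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [1, -1],
    [1, -1],
]
feature_map = conv2d(image, kernel)
# The filter responds most strongly (largest magnitude) at the column
# where the edge falls under it: every output row is [0, -2, 0].
```

During training, backpropagation adjusts the kernel values so that filters like this edge detector emerge automatically; the slowness the article mentions comes from repeating these nested multiply-accumulate loops millions of times, which is exactly the workload GPUs parallelise well.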
With GPUs and Deep Learning, highly accurate and performant models began to emerge year after year, especially after the introduction of AlexNet in 2012. AlexNet was trained on NVIDIA GTX 580 3GB GPUs and consisted of eight layers: five convolutional layers and three fully connected layers. At the time, it achieved state-of-the-art performance in image classification.
The key takeaway here is that, in an attempt to optimise the training and testing of neural networks for computer vision tasks, researchers explored the use of GPUs and different neural network architectures.
A Variety Of Applications
Another reason I find Computer Vision interesting is how widely applicable it is to different problems. There’s also a significant number of industrial solutions leveraging CV techniques in some form.
I never imagined working in a game studio that develops iOS mobile gaming apps, but it turns out a world of possibility opens up with the cameras on smartphones. Just think of augmented and virtual reality.
I work with models built and trained to solve pose estimation, object detection, and gesture recognition. These models are 2–5 MB (megabytes) in size; there are MP3 songs larger than these models, which predict the locations of 17 human joints in real time. Simply amazing!
You have to admit the gap between research and commercial applications of machine learning models is closing, mainly due to the ongoing work of finding optimised methods of developing and delivering machine learning models. Optimised AI chips released by Intel, Apple and Nvidia have provided platforms on which mobile-optimised ML models can operate without computational constraints.
In terms of industrial relevance, you have access to a wide selection of industries to work in if you have computer vision expertise. Healthcare institutions require Computer Vision experts to develop algorithms that improve the processing of X-ray images and streamline medical imaging workflows, from analysis to diagnosis. Defence and security agencies need Computer Vision experts to build detection and tracking algorithms. Car manufacturers and tech companies are hiring CV Engineers to help make autonomous vehicles a reality.
If you are outstanding and passionate about Computer Vision, the future relevance of your skills knows no limits (caveat: as long as you keep them up to date).
You’ll probably find most ML disciplines have a place in modern companies, especially now that society operates on technology and data.
It’s exciting to see machines with perception and scene-understanding capabilities similar to those that took nature millions of years to develop.
Commercial markets and media attention are focused on efforts to introduce autonomous vehicles. Tesla is the front-runner in this space, fully leveraging cameras as the only visual sensors within its fleet.
Tesla’s latest presentation shows their progress, such as deriving temporal data from video inputs and creating a novel neural network architecture that accepts input data in vector space instead of standard 2D image representations.
Big data and AI technology are becoming more prevalent in society. Simultaneously, consumers have made privacy a primary concern, especially when using applications connected to the internet. In a world where personal data contributes to one’s digital identity, privacy concerns become paramount. Computer Vision applications are at the forefront of AI regulation discussions; an example is the use of face detection and recognition systems in public spaces.
As for me, I’m currently using computer vision solutions to develop a mobile application that monitors and recommends posture changes for remote and office workers in real time. Another exciting project I’m involved in uses laptop cameras to track readers’ eyes to detect literacy disabilities. No doubt, these are exciting projects.
If you want to know more about computer vision and deep learning, you can learn with me at my upcoming O’Reilly live training session.
Want more from me?