Weaving Individualized AI into Everyday Life
Artificial intelligence is transforming the world through language translation, face recognition, object detection, and many other areas. These AI systems often fall into broad categories that have worldwide market demand and therefore attract intensive research. What many people might not know is that we can leverage the achievements of deep learning to create customized AI services tailored to individual people.
This Medium article, along with its corresponding GitHub project, shows how we can create a personalized alert system by piggybacking off the success of established object recognition neural networks. This project demonstrates how pretrained neural networks can detect when an object of interest changes state — in this instance, when a gate opens or when the leg rest of a reclining chair folds down.
Neural Networks for Images
VGG-19 is a pretrained image recognition neural network from the Visual Geometry Group at the University of Oxford. This neural network is popular for a variety of tasks because of its straightforward architecture and its success at parsing image features. It consists of 16 convolutional layers, which interpret the low-level attributes of an image, followed by three fully connected layers that try to determine which object is in a picture.
Since we are looking for simple changes across images, we need an evaluation of the characteristics of an image rather than an assessment of which object is in a picture. This means we can strip off the last three layers — often called the “top” of the neural network — and capture the output of the last convolutional layer for each image. This is called feature extraction.
Then we will use a distance metric to determine how far an image has moved off its baseline state. There are several distance measures we could use to compare images, but cosine similarity is a good starting point. The VGG-19 model uses the ReLU activation function, which zeroes out many of the feature values. The most basic distance measure, Euclidean distance, may therefore not give good results, and cosine similarity is often preferred for sparse vectors like these.
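As a concrete illustration (a minimal NumPy sketch, not the project's exact code), cosine similarity between two feature vectors can be computed like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# ReLU-style sparse vectors: many entries are exactly zero.
baseline = np.array([0.0, 2.0, 0.0, 1.0, 0.0, 3.0])
shifted = baseline * 2  # same direction, different magnitude

print(cosine_similarity(baseline, shifted))  # ~1.0: cosine ignores magnitude
```

Note that cosine similarity is scale-invariant: doubling every feature value leaves the score unchanged, whereas Euclidean distance would report a large gap between the two vectors.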
To detect movement in an object, we will run a collection of images through a pretrained VGG-19 neural network and extract a numerical vector of features from a convolutional layer. By comparing images to the baseline, we can tell when (a) an object is moving into a new position, or (b) something else is interfering with the results, such as a change of sunlight or another object moving through the image.
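The extraction step above might be sketched as follows with Keras (a simplified version: the 224×224 image size and the global average pooling choice are assumptions for illustration, not necessarily the project's exact settings):

```python
import numpy as np
import tensorflow as tf

def build_extractor(weights="imagenet"):
    """VGG-19 with the fully connected 'top' removed; global average pooling
    collapses the last convolutional block into a 512-number feature vector."""
    return tf.keras.applications.VGG19(include_top=False,
                                       weights=weights,
                                       pooling="avg")

def extract_features(model, image):
    """image: an HxWx3 RGB array, e.g. loaded with tf.keras.utils.load_img."""
    batch = image[np.newaxis].astype("float32")       # add a batch dimension
    batch = tf.keras.applications.vgg19.preprocess_input(batch)
    return model.predict(batch, verbose=0)[0]         # shape: (512,)
```

Each image is reduced to a single 512-element vector, and the rest of the pipeline is just comparing those vectors to the baseline with cosine similarity.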
TensorFlow + Python
This project uses TensorFlow and Python to evaluate the sample images, and the code is located at this GitHub repository. Feature extraction is common enough that the Keras API in TensorFlow makes it easy to import convolutional neural networks such as VGG-19, Inception, or ResNet.
François Chollet has published the parameter files for several pretrained neural networks. These weight files include the option to download them without the aforementioned “top” of fully connected layers. This is because convolutional layers can be much more efficient than fully connected layers in their use of parameters. In this case, excluding the top of the VGG-19 model reduces the file size from 548 to 76 megabytes.
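The saving comes almost entirely from the fully connected layers. A quick check (assuming TensorFlow's bundled Keras applications) shows where VGG-19's parameters live; passing `weights=None` builds the architecture without downloading any weight file:

```python
import tensorflow as tf

# Build the architecture only -- no weight download.
full = tf.keras.applications.VGG19(include_top=True, weights=None)
headless = tf.keras.applications.VGG19(include_top=False, weights=None)

print(f"with top:    {full.count_params():,} parameters")      # ~143.7 million
print(f"without top: {headless.count_params():,} parameters")  # ~20.0 million
```

The three fully connected layers hold roughly 85% of the parameters, which is why dropping them shrinks the weight file from 548 to 76 megabytes.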
Many people would find it useful to be notified when a backyard gate is open, since an open gate might mean that a child or dog can leave the yard. Our first example consists of nine images of a gate moving from a closed position to a fully open position, with separate image sets for daytime and nighttime.
The first image is a picture of the closed gate. This is the baseline image, and other images are evaluated by their cosine similarity to this first picture. In the ideal case, the cosine similarity should fall steadily as the gate opens and the images diverge from the reference.
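Given feature vectors for the baseline and for each subsequent image, the comparison reduces to a short loop. The sketch below uses toy three-number "features" and an illustrative 0.9 threshold; the real project would use 512-element VGG-19 feature vectors and a threshold tuned to the scene:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_changes(baseline, frames, threshold=0.9):
    """Return indices of frames whose similarity to the baseline drops
    below the threshold -- candidate 'gate open' events."""
    return [i for i, f in enumerate(frames)
            if cosine_similarity(baseline, f) < threshold]

# Toy features: the gate gradually opening shifts the vector away from baseline.
baseline = np.array([1.0, 0.0, 0.0])
frames = [np.array([1.0, 0.1, 0.0]),   # nearly closed
          np.array([1.0, 0.8, 0.0]),   # half open
          np.array([0.2, 1.0, 0.0])]   # fully open

print(flag_changes(baseline, frames))  # -> [1, 2]
```

Only the first frame stays above the 0.9 threshold; the half-open and fully open frames are flagged as state changes.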