Convolutional Neural Networks (CNNs) Dog Breed Classifier




Step 1: Detect Humans

With both human and dog datasets ready, I will now use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. We have downloaded one of these detectors and stored it in the haarcascades directory.

import cv2                
import matplotlib.pyplot as plt
%matplotlib inline

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[3])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)

# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x, y, w, h) in faces:
    # add bounding box to color image
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)

# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()

The output image is shown below:

Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter. Now I will use this function to detect whether a human face is in an image, returning True if there is and False otherwise.
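As a side note, the grayscale conversion itself is just a weighted sum of the three color channels. Here is a NumPy-only sketch of what COLOR_BGR2GRAY computes, using the standard ITU-R BT.601 weights (the tiny 2×2 image is made up purely for illustration):

```python
import numpy as np

# a dummy 2x2 BGR image (OpenCV loads images in BGR channel order)
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype='uint8')

# grayscale is a weighted sum of the channels (BT.601 weights, the same
# convention OpenCV's COLOR_BGR2GRAY uses): Y = 0.299 R + 0.587 G + 0.114 B
b = bgr[..., 0].astype(float)
g = bgr[..., 1].astype(float)
r = bgr[..., 2].astype(float)
gray = (0.299 * r + 0.587 * g + 0.114 * b).round().astype('uint8')

print(gray.shape)  # (2, 2): one intensity value per pixel
```

Every pixel collapses from three channel values to a single intensity, which is all the Haar cascade needs to evaluate its features.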

Here is the function I created to detect a human face:

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

However, before relying on it, let’s test how well this detector works by passing 100 images of each type into it. The test shows that it detects a human face in 100% of the human images, while it falsely identifies 11% of the dog images as containing human faces. The test code and outputs are shown below:

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

# Test the performance of the face_detector algorithm on the images in human_files_short and dog_files_short.
human_face_count = 0
dog_as_face = 0
for i in human_files_short:
    human_face_count += face_detector(i)

for j in dog_files_short:
    dog_as_face += face_detector(j)
# print out results from face_detector function
print(f'{int(human_face_count/len(human_files_short)*100)}% of first 100 images in human_files_short have a detected human face')
print(f'{int(dog_as_face/len(dog_files_short)*100)}% of first 100 images in dog_files_short have a detected human face')

Result output:

100% of first 100 images in human_files_short have a detected human face
11% of first 100 images in dog_files_short have a detected human face

Step 2: Detect Dogs

In this step, I will use a pre-trained ResNet-50 model to detect dogs in images. The model comes with weights trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. Given an image, this pre-trained ResNet-50 model returns a prediction (drawn from the available categories in ImageNet) for the object that is contained in the image.

Step 2.1 Setup

from keras.applications.resnet50 import ResNet50

# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

Step 2.2 Pre-process data

Why is pre-processing required?

When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (shown below):

(nb_samples, rows, columns, channels)

where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.
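As a minimal illustration of this shape convention (NumPy only, independent of Keras), adding a sample axis to three dummy 224×224 RGB images and stacking them yields a 4D tensor of shape (3, 224, 224, 3):

```python
import numpy as np

# three dummy "images", each 224 x 224 with 3 color channels
images = [np.zeros((224, 224, 3), dtype='float32') for _ in range(3)]

# add a leading sample axis to each image: (224, 224, 3) -> (1, 224, 224, 3)
tensors = [np.expand_dims(img, axis=0) for img in images]

# stack along the sample axis into a single 4D tensor
batch = np.vstack(tensors)
print(batch.shape)  # (3, 224, 224, 3)
```

This is exactly the pattern the path_to_tensor and paths_to_tensor functions below implement for real image files.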

import numpy as np
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

The path_to_tensor function above takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then resized to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).

Let’s make some predictions with ResNet-50!

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

With the above function defined, I then create a dog_detector function to predict whether the image is a dog:

# returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return 151 <= prediction <= 268
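The range check works because, in the ImageNet class ordering used by Keras, indices 151 (Chihuahua) through 268 (Mexican hairless) are all dog breeds. A standalone sketch of the same check (the sample index values are illustrative, not produced by the model):

```python
# In the ImageNet class ordering, dog categories occupy indices 151-268 inclusive.
DOG_MIN, DOG_MAX = 151, 268

def is_dog_index(prediction):
    # True when the predicted ImageNet class index falls in the dog range
    return DOG_MIN <= prediction <= DOG_MAX

print(is_dog_index(200))  # an index inside the dog range -> True
print(is_dog_index(285))  # e.g. Egyptian cat -> False
```

So dog_detector never looks at which breed ResNet-50 picked, only whether the winning class index lands inside the dog block of ImageNet.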

Similar to how I tested face_detector, I tested dog_detector using the same method:

# Test the performance of the dog_detector function on the images in human_files_short and dog_files_short.
human_as_dog = 0
dog_detected = 0
for i in human_files_short:
    human_as_dog += dog_detector(i)

for j in dog_files_short:
    dog_detected += dog_detector(j)

# print out accuracies for both types of images
print(f'{int(human_as_dog/len(human_files_short)*100)}% of first 100 images in human_files_short have a detected dog')
print(f'{int(dog_detected/len(dog_files_short)*100)}% of first 100 images in dog_files_short have a detected dog')

Result output:

0% of first 100 images in human_files_short have a detected dog
100% of first 100 images in dog_files_short have a detected dog

Well Done!

Step 3: Create a CNN to Classify Dog Breeds

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. In this step, I create a CNN that classifies dog breeds.

Step 3.1 First, preprocess the data

from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True

# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255
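Dividing by 255 rescales each pixel from the 0–255 integer range into [0, 1], which keeps the network’s inputs in a numerically friendly range for training. A NumPy-only sketch of the same normalization (the three pixel values are made up):

```python
import numpy as np

# a dummy uint8 "image row" with pixel values spanning 0-255
raw = np.array([[0, 127, 255]], dtype='uint8')

# same normalization as the pre-processing step above
scaled = raw.astype('float32') / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```

Note the astype('float32') cast before dividing; integer division of uint8 values would silently truncate everything to 0.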

Step 3.2 Implement Model Architecture

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential

IMAGE_SIZE = [224, 224]
model = Sequential([
    # Stem
    Conv2D(kernel_size=3, filters=16, padding='same', activation='relu', input_shape=[*IMAGE_SIZE, 3]),
    Conv2D(kernel_size=3, filters=32, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),

    # Conv Group
    Conv2D(kernel_size=3, filters=64, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),
    Conv2D(kernel_size=3, filters=96, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),

    # Conv Group
    Conv2D(kernel_size=3, filters=128, padding='same', activation='relu'),
    MaxPooling2D(pool_size=2),
    Conv2D(kernel_size=3, filters=128, padding='same', activation='relu'),

    # 1x1 Reduction
    Conv2D(kernel_size=1, filters=64, padding='same', activation='relu'),

    # Classifier
    GlobalAveragePooling2D(),
    Dense(133, activation='softmax')
])

model.summary()

Output:

# Compile the model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
from keras.callbacks import ModelCheckpoint

epochs = 20

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5', verbose=1, save_best_only=True)

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer],
          verbose=1)

Step 3.3 Running the cell above starts training.

Step 3.4 After training completes, I load the best-performing model and test its prediction accuracy.

# Load the model with the best validation loss
model.load_weights('saved_models/weights.best.from_scratch.hdf5')

# Test the model: get index of predicted dog breed for each image in test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Step 3.5 Test Accuracy Output:

Test accuracy: 18.0622%

Step 4: Use a CNN to Classify Dog Breeds

In this step, I use the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model.

Step 4.1 Load data

bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']

Step 4.2 Define Model Architecture

I only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.
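Global average pooling collapses each spatial feature map to a single number by averaging over its rows and columns, so a VGG-16 bottleneck block of shape (7, 7, 512) becomes a 512-dimensional vector per image. A NumPy-only sketch of that reduction (the shapes and random values are for illustration):

```python
import numpy as np

# a dummy batch of VGG16 bottleneck features: (samples, rows, cols, channels)
features = np.random.rand(2, 7, 7, 512).astype('float32')

# global average pooling: take the mean over the spatial axes (rows and cols)
pooled = features.mean(axis=(1, 2))

print(pooled.shape)  # (2, 512)
```

The pooled vector is what the 133-way softmax layer then classifies, which keeps the trainable head very small.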

VGG16_model = Sequential()
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
VGG16_model.add(Dense(133, activation='softmax'))

VGG16_model.summary()

Model Architecture Output:

Step 4.3 Compile the Model

VGG16_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Step 4.4 Train the Model and Save to saved_model dir (only the best model)

from keras.callbacks import ModelCheckpoint  
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.VGG16.hdf5',
                               verbose=1, save_best_only=True)

VGG16_model.fit(train_VGG16, train_targets,
                validation_data=(valid_VGG16, valid_targets),
                epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)

Model Training Process

Step 4.5 Similar to Step 3, I load the best model from the directory and then test its prediction accuracy on the same dog test images.

VGG16_model.load_weights('saved_models/weights.best.VGG16.hdf5')

# get index of predicted dog breed for each image in test set
VGG16_predictions = [np.argmax(VGG16_model.predict(np.expand_dims(feature, axis=0))) for feature in test_VGG16]

# report test accuracy
test_accuracy = 100*np.sum(np.array(VGG16_predictions)==np.argmax(test_targets, axis=1))/len(VGG16_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Step 4.6 Test Data Prediction Accuracy

Test accuracy: 40.5502%

Finally, I define a function that takes an image path as input and returns the prediction result from this model:

from extract_bottleneck_features import *

# Predict dog breed with the model
def VGG16_predict_breed(img_path):
    # extract bottleneck features
    bottleneck_feature = extract_VGG16(path_to_tensor(img_path))
    # obtain predicted vector
    predicted_vector = VGG16_model.predict(bottleneck_feature)
    # return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

In this step, I use bottleneck features from a pre-trained Xception network to build a CNN.

Step 5.1 Load Data

bottleneck_features = np.load('../../../data/bottleneck_features/DogXceptionData.npz')
train_Xception = bottleneck_features['train']
valid_Xception = bottleneck_features['valid']
test_Xception = bottleneck_features['test']

Step 5.2 Define Model Architecture (same as VGG16)

# Define model architecture
Xception_model = Sequential()
Xception_model.add(GlobalAveragePooling2D(input_shape=train_Xception.shape[1:]))
Xception_model.add(Dense(133, activation='softmax'))

Xception_model.summary()

Model Architecture Output:

Step 5.3 Compile the Model

Xception_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Step 5.4 Train the Model

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.Xception.hdf5',
                               verbose=1, save_best_only=True)

Xception_model.fit(train_Xception, train_targets,
                   validation_data=(valid_Xception, valid_targets),
                   epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)

Step 5.5 Load the model with the best validation loss

Xception_model.load_weights('saved_models/weights.best.Xception.hdf5')

Step 5.6 Test the Model

# get index of predicted dog breed for each image in test set
Xception_predictions = [np.argmax(Xception_model.predict(np.expand_dims(feature, axis=0))) for feature in test_Xception]

# report test accuracy
test_accuracy = 100*np.sum(np.array(Xception_predictions)==np.argmax(test_targets, axis=1))/len(Xception_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Test Data Accuracy

Test accuracy: 84.3301%

Step 5.7 Predict Dog Breed with the Model

In this step, I combined functions I created at previous steps to create the final target function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan_hound, etc) that is predicted by the model (CNN with pre-trained Xception in this example below).

from extract_bottleneck_features import *

def Xception_predict_breed(img_path):
    # extract bottleneck features
    bottleneck_feature = extract_Xception(path_to_tensor(img_path))
    # obtain predicted vector
    predicted_vector = Xception_model.predict(bottleneck_feature)
    # return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

Step 6 Write Algorithm

This algorithm accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.
def detector(img_path):
    if dog_detector(img_path):
        dog_human = 'dog'
        breed = Xception_predict_breed(img_path)
    elif face_detector(img_path):
        dog_human = 'human'
        breed = Xception_predict_breed(img_path)
    else:
        dog_human = 'Neither human nor dog'
        breed = 'Oops, there is something wrong. Try again.'
    return dog_human, breed

Step 7 Test Algorithm

In this final test, I used two human images (a baby and a female Korean pop star), two dog images (a white Pomeranian and a Samoyed puppy), and a Persian cat. Here are the function and the prediction results.

for i in range(len(file_paths)):
    dog_human, breed = detector(file_paths[i])
    print(f'Test {i+1}: {file_paths[i][12:]} picture is detected as {dog_human}, its most similar breed is {breed}')

Prediction output:

Test 1: Cat_Persian.jpg picture is detected as Neither human nor dog, its most similar breed is Oops, there is something wrong. Try again.

Test 2: white_pomeranian.jpg picture is detected as dog, its most similar breed is ages/train/006.American_eskimo_dog

Test 3: puppy_samoyed.jpg picture is detected as dog, its most similar breed is ages/train/006.American_eskimo_dog

Test 4: korean pop star.jpg picture is detected as human, its most similar breed is ages/train/100.Lowchen

Test 5: baby.jpg picture is detected as human, its most similar breed is ages/train/056.Dachshund

Conclusion

As the five test cases above show, this CNN classifier works well: it correctly identifies the cat as neither human nor dog, and it identifies both the baby and the Korean pop star as human, returning the most similar dog breed for each. Although it does not predict the exact breed for the two dogs, it at least successfully classifies them as dogs. The wrong breed predictions may be attributed to a lack of training data, particularly a shortage of images per breed. I will therefore consider training on a more diverse range of images to further improve the classifier's performance in predicting dog breed categories.


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot
