Convolutional Neural Networks for Dog Breed Identification

Part 3

Juan E. Rolon, 2018.

Using a CNN to Classify Dog Breeds

To reduce training time without sacrificing accuracy, we train a CNN using transfer learning.

Obtain Bottleneck Features

import numpy as np

# load the pre-computed VGG-16 bottleneck features for each data split
bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']
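The `.npz` archive behaves like a dictionary of arrays keyed by split name. The sketch below builds a small synthetic archive with the same keys (`train`, `valid`, `test`) and a per-sample shape of (7, 7, 512), the output shape of VGG-16's last convolutional block for a 224×224 RGB input. The sample counts and the file name `toy_bottleneck.npz` are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy archive: a few samples per split, each a 7x7x512 feature map
rng = np.random.default_rng(0)
splits = {'train': 8, 'valid': 3, 'test': 4}
arrays = {name: rng.random((n, 7, 7, 512), dtype=np.float32) for name, n in splits.items()}

path = os.path.join(tempfile.mkdtemp(), 'toy_bottleneck.npz')
np.savez(path, **arrays)

# np.load on an .npz returns a dict-like NpzFile; indexing by key reads one array into memory
with np.load(path) as bottleneck_features:
    print(sorted(bottleneck_features.files))  # ['test', 'train', 'valid']
    train = bottleneck_features['train']
    valid = bottleneck_features['valid']
    test = bottleneck_features['test']

print(train.shape, valid.shape, test.shape)
print(train.shape[1:])  # per-sample shape, used as input_shape of the first Keras layer
```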

Model Architecture

The model uses the pre-trained VGG-16 model as a fixed feature extractor: the output of the last convolutional block of VGG-16 is fed as input to our model. We add only a global average pooling layer and a fully connected layer; the latter has one node for each of the 133 dog categories and uses a softmax activation.
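A rough numpy sketch of what the two added layers compute for a single image, assuming a (7, 7, 512) bottleneck feature map: global average pooling collapses each 7×7 channel to its mean, and the dense layer maps the resulting 512-vector to 133 softmax probabilities. The weights here are random placeholders, not the trained ones.

```python
import numpy as np

rng = np.random.default_rng(42)

# One bottleneck feature map from VGG-16: height x width x channels
feature_map = rng.random((7, 7, 512))

# GlobalAveragePooling2D: average over the two spatial axes -> one value per channel
pooled = feature_map.mean(axis=(0, 1))          # shape (512,)

# Dense(133, activation='softmax'): affine map followed by softmax
# (random placeholder weights; Keras learns W and b during training)
W = rng.normal(scale=0.01, size=(512, 133))
b = np.zeros(133)
logits = pooled @ W + b                          # shape (133,)


def softmax(z):
    # subtracting the max improves numerical stability without changing the result
    e = np.exp(z - z.max())
    return e / e.sum()


probs = softmax(logits)
predicted_class = int(np.argmax(probs))          # index into the 133 breed names
print(pooled.shape, probs.shape, float(probs.sum()))
```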

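from keras.models import Sequential
from keras.layers import Dense, GlobalAveragePooling2D

VGG16_model = Sequential()
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))  # (7, 7, 512) -> (512,)
VGG16_model.add(Dense(133, activation='softmax'))  # one output per dog breed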

VGG16_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
global_average_pooling2d_2 ( (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 133)               68229     
=================================================================
Total params: 68,229
Trainable params: 68,229
Non-trainable params: 0
_________________________________________________________________
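The parameter count in the summary follows directly from the layer sizes: global average pooling has no weights, and the dense layer has one weight per (input, output) pair plus one bias per output.

$$
\text{params}_{\text{dense}} = \underbrace{512 \times 133}_{\text{weights}} + \underbrace{133}_{\text{biases}} = 68{,}096 + 133 = 68{,}229
$$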

Pre-training analysis:

show_image('val_curves/vgg16_val_curves.png')

The figure above shows the accuracy and loss of the model built on the pre-trained VGG-16 features versus the number of epochs, for the training and validation sets. The run was performed in a separate script with a total of 300 epochs and a batch size of 20 samples. The validation loss decreases over the whole range from 1 to 300 epochs, with significant gains in accuracy. Within this range the model shows no signs of overfitting, since the validation loss keeps decreasing as the number of epochs increases. However, the validation accuracy appears to saturate, increasing only very slowly. This suggests that a much longer run would be needed to check whether accuracy can still improve from this point. Below we train the model for 300 epochs.
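One way to put a number on "the validation accuracy saturates" is to compare the best value in the last few epochs with the best value before them. The sketch below does this for a list of per-epoch values such as `h.history['val_acc']` (the key is `val_acc` in older Keras releases and `val_accuracy` in newer ones). The window size and the toy curve are arbitrary choices for illustration, not values from this run.

```python
def recent_gain(values, window=50):
    """Improvement of the best value in the last `window` epochs over the best value before them.

    A small or zero gain suggests the curve has saturated.
    Assumes higher is better (e.g. accuracy); pass negated losses otherwise.
    """
    if len(values) <= window:
        raise ValueError("need more epochs than the window size")
    best_before = max(values[:-window])
    best_recent = max(values[-window:])
    return best_recent - best_before


# Toy validation-accuracy curve: fast rise, then a slow creep upward
toy_val_acc = [0.60 * (1 - 0.9 ** epoch) + 0.0001 * epoch for epoch in range(1, 301)]

gain = recent_gain(toy_val_acc, window=50)
print("best gain over the last 50 epochs: {:.4f}".format(gain))
```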

Compile the Model

VGG16_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
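`categorical_crossentropy` expects one-hot targets (one column per breed, which is why the test code below recovers labels with `np.argmax(test_targets, axis=1)`) and, for each sample, reduces to the negative log of the probability the model assigns to the true class. A small numpy sketch with made-up numbers and 3 classes instead of 133:

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot targets, shape (n_samples, n_classes)
    y_pred: predicted probabilities (rows sum to 1), same shape
    eps:    clipping to avoid log(0), similar in spirit to Keras' epsilon
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_sample.mean()


y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5]])

loss = categorical_crossentropy(y_true, y_pred)
# by hand: (-ln 0.8 - ln 0.5) / 2 = (0.2231 + 0.6931) / 2 = 0.4581
print(round(float(loss), 4))
```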

Train the Model

#@Juan E. Rolon
# Fit ('train') the model
import time
from keras.callbacks import ModelCheckpoint

# keep only the weights with the lowest validation loss seen so far
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.VGG16.hdf5',
                               verbose=1, save_best_only=True)

# specify number of epochs and batch size
epochs = 300
batch_size = 20

# True: fit on the augmented datasets (datagen_train/datagen_valid are defined elsewhere in the notebook)
# False: fit on the non-augmented bottleneck features
use_augmentation = True

init_train_time_vgg16 = time.time()

if use_augmentation:
    h_vgg16 = VGG16_model.fit_generator(datagen_train.flow(train_VGG16, train_targets, batch_size=batch_size),
                                        steps_per_epoch=train_VGG16.shape[0] // batch_size,
                                        epochs=epochs, verbose=1, callbacks=[checkpointer],
                                        validation_data=datagen_valid.flow(valid_VGG16, valid_targets, batch_size=batch_size),
                                        validation_steps=valid_VGG16.shape[0] // batch_size)
else:
    h_vgg16 = VGG16_model.fit(train_VGG16, train_targets,
                              validation_data=(valid_VGG16, valid_targets),
                              epochs=epochs, batch_size=batch_size, callbacks=[checkpointer], verbose=1)

end_train_time_vgg16 = time.time()
tot_train_time_vgg16 = end_train_time_vgg16 - init_train_time_vgg16

/home/rolon/anaconda2/envs/tflow_gpu_opt/lib/python3.5/site-packages/keras/preprocessing/image.py:787: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3 or 4 channels on axis 3. However, it was passed an array with shape (6680, 7, 7, 512) (512 channels).
  ' (' + str(self.x.shape[channels_axis]) + ' channels).')
/home/rolon/anaconda2/envs/tflow_gpu_opt/lib/python3.5/site-packages/keras/preprocessing/image.py:787: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3 or 4 channels on axis 3. However, it was passed an array with shape (835, 7, 7, 512) (512 channels).
  ' (' + str(self.x.shape[channels_axis]) + ' channels).')
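The generator branch processes `shape[0] // batch_size` batches per epoch. Using the sample counts that appear in the warnings above (6680 training and 835 validation feature maps) and `batch_size = 20`, the floor division works out as follows. Training covers every sample, but each validation pass draws only 41 batches, so at most 820 of the 835 validation samples are used per epoch.

```python
import math

n_train, n_valid = 6680, 835   # sample counts reported in the UserWarnings above
batch_size = 20

steps_per_epoch = n_train // batch_size       # 334 full batches, covers all 6680 samples
validation_steps = n_valid // batch_size      # 835 / 20 = 41.75 is floored to 41

leftover_valid = n_valid - validation_steps * batch_size   # samples beyond 41 full batches

# Rounding up instead would match the iterator's 42 batches
# (41 full ones plus a final batch holding the remainder)
validation_steps_ceil = math.ceil(n_valid / batch_size)

print(steps_per_epoch, validation_steps, leftover_valid, validation_steps_ceil)
# -> 334 41 15 42
```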

print("Training time = {0:.3f} minutes ".format(round(tot_train_time_vgg16/60.0, 3)))

Training time = 378.446 minutes

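# save the metrics in the training history as .csv and their plots as .png, under the name 'vgg16_cnn_pf_metrics'
plotSave_pf_metrics(h_vgg16, 'vgg16_cnn_pf_metrics')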
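`plotSave_pf_metrics` is a helper defined elsewhere in the notebook. A minimal, hypothetical sketch of its CSV half is shown below, using only the standard library and assuming `history.history` maps each metric name (e.g. `loss`, `val_loss`) to a list of per-epoch values, as a Keras History object does. The function name `save_history_csv` is made up for illustration.

```python
import csv
import os
import tempfile


def save_history_csv(history_dict, filename):
    """Hypothetical helper: write per-epoch metrics to <filename>.csv.

    history_dict: mapping metric name -> list of per-epoch values,
                  e.g. the `.history` attribute of a Keras History object.
    filename:     output path without extension.
    Returns the path of the written file.
    """
    metrics = sorted(history_dict)
    n_epochs = len(history_dict[metrics[0]])
    path = filename + '.csv'
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['epoch'] + metrics)
        for i in range(n_epochs):
            writer.writerow([i + 1] + [history_dict[m][i] for m in metrics])
    return path


# Usage with a toy two-epoch history (values are made up)
toy_history = {'loss': [2.1, 1.4], 'val_loss': [2.3, 1.7],
               'acc': [0.35, 0.52], 'val_acc': [0.30, 0.47]}
out_path = save_history_csv(toy_history, os.path.join(tempfile.mkdtemp(), 'vgg16_cnn_pf_metrics'))

with open(out_path, newline='') as f:
    rows = list(csv.reader(f))
print(rows[0])   # ['epoch', 'acc', 'loss', 'val_acc', 'val_loss']
```

The `.png` half would typically plot the same lists with matplotlib (`plt.plot`) and write them with `plt.savefig(filename + '.png')`.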

Loading the Model with the Best Validation Loss

VGG16_model.load_weights('saved_models/weights.best.VGG16.hdf5')
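With `save_best_only=True`, `ModelCheckpoint` monitors `val_loss` by default and overwrites the file only when that value improves, so the weights loaded here come from the epoch with the lowest validation loss, not necessarily from the last epoch. A plain-Python sketch of that selection rule, with a made-up loss curve:

```python
def best_checkpoint_epoch(val_losses):
    """Return (epoch, loss) of the last save under a save-best-only rule.

    Mirrors the idea behind ModelCheckpoint(save_best_only=True) with the
    default monitor='val_loss': save whenever the loss strictly improves.
    Epochs are numbered from 1.
    """
    best_loss = float('inf')
    best_epoch = None
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:          # improvement -> the checkpoint file is overwritten
            best_loss, best_epoch = loss, epoch
    return best_epoch, best_loss


toy_val_loss = [2.4, 1.9, 1.6, 1.55, 1.58, 1.52, 1.60]
print(best_checkpoint_epoch(toy_val_loss))   # (6, 1.52)
```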

Testing the Model

Now we use the CNN to measure how well it identifies dog breeds in our test dataset. The test accuracy is printed below.

# get index of predicted dog breed for each image in test set
VGG16_predictions = [np.argmax(VGG16_model.predict(np.expand_dims(feature, axis=0))) for feature in test_VGG16]

# report test accuracy
test_accuracy = 100*np.sum(np.array(VGG16_predictions)==np.argmax(test_targets, axis=1))/len(VGG16_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Test accuracy: 60.8852%
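Calling `predict` once per sample works but is slow; Keras' `predict` also accepts the whole array, so `np.argmax(VGG16_model.predict(test_VGG16), axis=1)` should give the same indices in a single call. The accuracy computation itself only needs the predicted probabilities and the one-hot targets, as in this numpy sketch with toy values (`top1_accuracy` is a hypothetical helper name):

```python
import numpy as np


def top1_accuracy(probs, one_hot_targets):
    """Percentage of rows whose highest-probability class matches the one-hot target."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)


# Toy batch: 4 samples, 3 classes (the notebook has 133 classes)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.3, 0.4, 0.3],
                  [0.5, 0.3, 0.2]])
targets = np.eye(3)[[0, 2, 0, 1]]    # one-hot rows for classes 0, 2, 0, 1

print('Test accuracy: %.4f%%' % top1_accuracy(probs, targets))   # Test accuracy: 50.0000%
```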

Predicting Dog Breed

from extract_bottleneck_features import *

def VGG16_predict_breed(img_path):
    # extract bottleneck features
    bottleneck_feature = extract_VGG16(path_to_tensor(img_path))
    # obtain predicted vector
    predicted_vector = VGG16_model.predict(bottleneck_feature)
    # return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]
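Because the last layer is a softmax, `predicted_vector` is a 1×133 array of class probabilities, and `np.argmax` keeps only the most likely breed. A hypothetical variant that returns the top *k* breeds with their probabilities can help show how close the runner-up was when the top guess is wrong; the names and probabilities below are made up.

```python
import numpy as np


def top_k_breeds(predicted_vector, names, k=3):
    """Hypothetical helper: return the k most probable (name, probability) pairs.

    predicted_vector: softmax output of shape (1, n_classes) or (n_classes,)
    names:            list of class names aligned with the output columns
    """
    probs = np.asarray(predicted_vector).ravel()
    top = np.argsort(probs)[::-1][:k]          # indices sorted by decreasing probability
    return [(names[i], float(probs[i])) for i in top]


toy_names = ['Belgian_malinois', 'Chihuahua', 'German_shepherd_dog', 'Greyhound']
toy_vector = np.array([[0.45, 0.05, 0.40, 0.10]])

for name, p in top_k_breeds(toy_vector, toy_names, k=2):
    print('{:<20s} {:.2f}'.format(name, p))
```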

Image classifier:

#@Juan E. Rolon
# Displays the image, determines whether it contains a human face, a dog, or neither,
# and prints the breed returned by `predictor` (for a human face, the breed
# the person most resembles)
import cv2
import matplotlib.pyplot as plt

def classify_image(predictor, img_path):
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
    plt.imshow(cv_rgb)

    if face_detector(img_path):
        pred = predictor(img_path)
        breed = pred.rpartition('/')[-1].rpartition('.')[-1].replace('_', ' ')
        print("Human face detected in image")
        print("This human looks like a {}".format(breed))
    elif dog_detector(img_path):
        pred = predictor(img_path)
        breed = pred.rpartition('/')[-1].rpartition('.')[-1].replace('_', ' ')
        print("Dog face detected in image")
        print("This dog looks like a {}".format(breed))
    else:
        print("Neither a dog nor a human face was detected")
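The `rpartition` chain in `classify_image` assumes each entry of `dog_names` looks like a path ending in `<index>.<Breed_name>`: it keeps the text after the last `/`, then the text after the last `.`, and replaces underscores with spaces. Pulled out into a standalone function for illustration (the name `breed_label` and the example paths are hypothetical):

```python
def breed_label(dog_name):
    """Turn a class folder name such as 'train/001.Belgian_malinois' into 'Belgian malinois'."""
    last_component = dog_name.rpartition('/')[-1]      # '001.Belgian_malinois'
    breed = last_component.rpartition('.')[-1]         # 'Belgian_malinois'
    return breed.replace('_', ' ')                     # 'Belgian malinois'


print(breed_label('dogImages/train/001.Belgian_malinois'))   # Belgian malinois
print(breed_label('Chihuahua'))                               # Chihuahua
```

Names without a path or numeric prefix pass through unchanged, because `rpartition` returns the whole string as its last element when the separator is absent.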

classify_image(VGG16_predict_breed, 'my_test_images/German_Shepherd.jpg')

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58761216/58889256 [============================>.] - ETA: 0s
Dog face detected in image
This dog looks like a Belgian malinois

classify_image(VGG16_predict_breed, 'my_test_images/Chihuahua.jpg')

Dog face detected in image
This dog looks like a Chihuahua

classify_image(VGG16_predict_breed,'my_test_images/Cocker_spaniel.jpg')

Dog face detected in image
This dog looks like a Poodle

classify_image(VGG16_predict_breed,'my_test_images/Greyhound.jpg')

Dog face detected in image
This dog looks like a Greyhound

Description of the VGG16 classification results:

As shown above, the VGG16-based classifier correctly identified the breed in two of the four images (Chihuahua and Greyhound); the German Shepherd was labeled a Belgian malinois and the Cocker spaniel a Poodle. This result is consistent with the measured test accuracy of about 61%. Higher accuracy may be attainable by further fine-tuning the model and by training it for more epochs (beyond 300) before it reaches the overfitting regime.
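With only four test images, "two out of four" carries little statistical weight on its own. Treating each image as an independent trial that succeeds with probability equal to the measured test accuracy (a simplifying assumption), the binomial distribution gives the chance of each possible number of correct predictions:

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p = 0.608852   # measured test accuracy as a fraction
n = 4          # number of images classified above

for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 3))
```

Under this assumption, exactly two correct comes out at about 0.34, nearly tied with three correct (about 0.35), so the observed outcome is among the most likely ones.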