
Convolutional Neural Networks for Dog Breed Identification¶

Part 4¶


Juan E. Rolon, 2018.¶



Creating a CNN to Classify Dog Breeds using Transfer Learning¶

Here we use transfer learning to create a CNN that can identify dog breeds from images. The implemented CNN attains approximately 83% accuracy on the test set.

In Step 4, we used transfer learning to create a CNN using VGG-16 bottleneck features. In this step, we use the bottleneck features from a different pre-trained model.

In addition, we use pre-computed features for all of the networks that are currently available in Keras:

  • VGG-19 bottleneck features
  • ResNet-50 bottleneck features
  • Inception bottleneck features
  • Xception bottleneck features

The files are named as follows:

Dog{network}Data.npz

where {network}, in the above filename, can be one of VGG19, Resnet50, InceptionV3, or Xception. In the following, we pick one of the above architectures, download the corresponding bottleneck features, and store the downloaded file in the bottleneck_features/ folder of the repository.
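
For convenience, the chosen file can also be fetched programmatically. The sketch below is a minimal example; the S3 URL is the one distributed with the Udacity dog-project materials and is an assumption here, since the hosting may have changed.

import os
from urllib.request import urlretrieve

# Assumed download location (Udacity dog-project materials); verify before use.
url = 'https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/DogResnet50Data.npz'
dest = 'bottleneck_features/DogResnet50Data.npz'

os.makedirs('bottleneck_features', exist_ok=True)
if not os.path.exists(dest):
    urlretrieve(url, dest)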

Obtaining Bottleneck Features¶

In the code block below, we extract the bottleneck features corresponding to the train, test, and validation sets by running the following:

bottleneck_features = np.load('bottleneck_features/Dog{network}Data.npz')
train_{network} = bottleneck_features['train']
valid_{network} = bottleneck_features['valid']
test_{network} = bottleneck_features['test']
In [65]:
bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_ResNet50 = bottleneck_features['train']
valid_ResNet50 = bottleneck_features['valid']
test_ResNet50 = bottleneck_features['test']
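
As a quick sanity check, we can inspect the shapes of the loaded arrays. The train and validation shapes in the comments below are taken from the Keras warnings printed during training later in this notebook; the test shape is inferred from the size of the test set.

# Each sample is a 1x1 spatial map with 2048 channels: the ResNet50
# bottleneck output for one image.
print(train_ResNet50.shape)   # (6680, 1, 1, 2048)
print(valid_ResNet50.shape)   # (835, 1, 1, 2048)
print(test_ResNet50.shape)    # (836, 1, 1, 2048)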

Model Architecture¶

Here we create a CNN to classify dog breeds and summarize the layers of the model by executing the line:

    <model's name>.summary()

Steps taken to build and implement the CNN architecture:¶

Image dataset augmentation.

As done before, we augment the input datasets to help improve the algorithm's performance metrics.
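
The generators datagen_train and datagen_valid used during training below were defined in an earlier part of the notebook. A minimal sketch, with illustrative (not the original) augmentation parameters, might look like this:

from keras.preprocessing.image import ImageDataGenerator

# Illustrative parameters only -- the actual generators were
# configured earlier in this notebook.
datagen_train = ImageDataGenerator(width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)
# Validation data is typically left unaugmented.
datagen_valid = ImageDataGenerator()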

Bottleneck features.

Since we are implementing transfer learning from a ResNet50 CNN that was extensively trained for visual recognition on a different dataset, we need to extract only the corresponding bottleneck features from it. The original network can't be used directly to classify dog breeds in our problem, as it was not trained on our datasets.

However, we can imagine that the previous ResNet50 CNN is composed of two parts:

  • A subnetwork made up of several layers, starting from the input layer up to some intermediate layer $k$. This subnetwork implements a mapping from the input manifold to a lower, $n_k$-dimensional vector space; due to its reduced dimensionality, the $k$-th layer is a bottleneck layer. The vector of node activations in this layer is expected to provide the features (bottleneck features) needed to represent dog images in a generic manner in our problem. We may then opt to keep only the bottleneck layer and feed its output as input to the model of our choice (see the sketch after this list).
  • The second part (or subnetwork) of the original ResNet50, starting at layer $k+1$ and continuing up to the output layer, is discarded and replaced by layers of our choice, tailored specifically to the needs of our problem.
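
In Keras, the first subnetwork can be isolated by loading ResNet50 with include_top=False, which drops the original classification layers. The sketch below illustrates the idea; it is not necessarily the exact code used to pre-compute the .npz files above.

from keras.applications.resnet50 import ResNet50, preprocess_input

# include_top=False discards the second subnetwork (layers k+1 onward),
# keeping only the ImageNet-trained feature extractor.
bottleneck_extractor = ResNet50(weights='imagenet', include_top=False)

# For a preprocessed 4D image tensor x, the bottleneck features would be:
# features = bottleneck_extractor.predict(preprocess_input(x))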

Additional new layers.

As we assume that the bottleneck features were carefully extracted from a related visual recognition problem, we only add two layers to our new model:

Global average pooling. This layer reduces each of the feature maps in the bottleneck layer to a single value, effectively performing a flattening operation.

Dense output layer. The output of the average pooling is fed into a dense layer whose dimension equals the number of categories (classes) in our problem. We use a standard softmax activation for the nodes in this layer.

Performance expectations.
The transfer-learning model should perform much better. As shown below, it achieves 83.13% test accuracy. This is likely the result of reusing the knowledge (bottleneck features) gathered by the original ResNet50 CNN, which was trained extensively on a similar pattern-recognition problem.

In [66]:
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Sequential

ResNet_model = Sequential()
ResNet_model.add(GlobalAveragePooling2D(input_shape=train_ResNet50.shape[1:]))
ResNet_model.add(Dense(133, activation='softmax'))

ResNet_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
global_average_pooling2d_3 ( (None, 2048)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 133)               272517    
=================================================================
Total params: 272,517
Trainable params: 272,517
Non-trainable params: 0
_________________________________________________________________

Pre-training analysis:¶

In [115]:
show_image('val_curves/resnet50_val_curves.png')

The figure above shows the ResNet50 pre-trained model's accuracy and loss vs. the number of epochs, for the training and validation sets respectively. The run was carried out in a separate script using a total of 100 epochs and a batch size of 20 samples. The validation loss decays between epochs 1 and 20, with significant gains in accuracy. Beyond this range the model shows signs of overfitting, as the validation loss reverts to increasing behavior when more epochs are added. The validation accuracy saturates quickly, at around 20 epochs. This suggests that about 20 epochs suffice to attain good accuracy while avoiding overfitting.
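
Rather than fixing the epoch count by hand, this stopping criterion could also be enforced automatically with Keras's EarlyStopping callback. The sketch below was not used in the original run, and the patience value is illustrative:

from keras.callbacks import EarlyStopping

# Stop training once val_loss has not improved for `patience` epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

# It would be passed alongside the checkpointer, e.g.:
# callbacks=[checkpointer, early_stopping]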

Compile the Model¶

In [68]:
from keras.optimizers import Adamax

ResNet_model.compile(loss='categorical_crossentropy', optimizer=Adamax(lr=0.002), metrics=['accuracy'])

Train the Model¶

Here we train the model in the code cell below. We use model checkpointing to save the model that attains the best validation loss.

We can augment the training data, but this is not a requirement.

In [73]:
#Fit or 'train' model
import time
from keras.callbacks import ModelCheckpoint

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.ResNet50.hdf5',
                               verbose=1, save_best_only=True)

#Specify number of epochs and batch_size
epochs = 20
batch_size = 20

#Set to False to fit the model to the non-augmented datasets instead
use_augmented_data = True

init_train_time_rsnt50 = time.time()

if not use_augmented_data:
    # Fit directly on the pre-computed bottleneck features
    h_rsnt50 = ResNet_model.fit(train_ResNet50, train_targets,
          validation_data=(valid_ResNet50, valid_targets),
          epochs=epochs, batch_size=batch_size, callbacks=[checkpointer], verbose=1)
else:
    # Fit on the augmented datasets via the generators defined earlier
    h_rsnt50 = ResNet_model.fit_generator(datagen_train.flow(train_ResNet50, train_targets, batch_size=batch_size),
                        steps_per_epoch=train_ResNet50.shape[0] // batch_size,
                        epochs=epochs, verbose=1, callbacks=[checkpointer],
                        validation_data=datagen_valid.flow(valid_ResNet50, valid_targets, batch_size=batch_size),
                        validation_steps=valid_ResNet50.shape[0] // batch_size)

end_train_time_rsnt50 = time.time()
tot_train_time_rsnt50 = end_train_time_rsnt50 - init_train_time_rsnt50
/home/rolon/anaconda2/envs/tflow_gpu_opt/lib/python3.5/site-packages/keras/preprocessing/image.py:787: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3 or 4 channels on axis 3. However, it was passed an array with shape (6680, 1, 1, 2048) (2048 channels).
  ' (' + str(self.x.shape[channels_axis]) + ' channels).')
/home/rolon/anaconda2/envs/tflow_gpu_opt/lib/python3.5/site-packages/keras/preprocessing/image.py:787: UserWarning: NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3 or 4 channels on axis 3. However, it was passed an array with shape (835, 1, 1, 2048) (2048 channels).
  ' (' + str(self.x.shape[channels_axis]) + ' channels).')
Epoch 1/20
333/334 [============================>.] - ETA: 0s - loss: 0.6720 - acc: 0.8473Epoch 00000: val_loss improved from inf to 0.81634, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 291s - loss: 0.6712 - acc: 0.8476 - val_loss: 0.8163 - val_acc: 0.7744
Epoch 2/20
333/334 [============================>.] - ETA: 0s - loss: 0.4248 - acc: 0.9113Epoch 00001: val_loss improved from 0.81634 to 0.68373, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 284s - loss: 0.4247 - acc: 0.9111 - val_loss: 0.6837 - val_acc: 0.8037
Epoch 3/20
333/334 [============================>.] - ETA: 0s - loss: 0.2963 - acc: 0.9462Epoch 00002: val_loss improved from 0.68373 to 0.62324, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 286s - loss: 0.2964 - acc: 0.9463 - val_loss: 0.6232 - val_acc: 0.8122
Epoch 4/20
333/334 [============================>.] - ETA: 0s - loss: 0.2148 - acc: 0.9691Epoch 00003: val_loss improved from 0.62324 to 0.59476, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 283s - loss: 0.2147 - acc: 0.9692 - val_loss: 0.5948 - val_acc: 0.8280
Epoch 5/20
333/334 [============================>.] - ETA: 0s - loss: 0.1564 - acc: 0.9811Epoch 00004: val_loss improved from 0.59476 to 0.58303, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 285s - loss: 0.1565 - acc: 0.9810 - val_loss: 0.5830 - val_acc: 0.8220
Epoch 6/20
333/334 [============================>.] - ETA: 0s - loss: 0.1184 - acc: 0.9890Epoch 00005: val_loss improved from 0.58303 to 0.55994, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 286s - loss: 0.1184 - acc: 0.9891 - val_loss: 0.5599 - val_acc: 0.8317
Epoch 7/20
333/334 [============================>.] - ETA: 0s - loss: 0.0898 - acc: 0.9944Epoch 00006: val_loss improved from 0.55994 to 0.55237, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 287s - loss: 0.0899 - acc: 0.9943 - val_loss: 0.5524 - val_acc: 0.8268
Epoch 8/20
333/334 [============================>.] - ETA: 0s - loss: 0.0705 - acc: 0.9961Epoch 00007: val_loss improved from 0.55237 to 0.52639, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 285s - loss: 0.0706 - acc: 0.9960 - val_loss: 0.5264 - val_acc: 0.8415
Epoch 9/20
333/334 [============================>.] - ETA: 0s - loss: 0.0546 - acc: 0.9973Epoch 00008: val_loss did not improve
334/334 [==============================] - 283s - loss: 0.0545 - acc: 0.9973 - val_loss: 0.5320 - val_acc: 0.8427
Epoch 10/20
333/334 [============================>.] - ETA: 0s - loss: 0.0433 - acc: 0.9980Epoch 00009: val_loss did not improve
334/334 [==============================] - 286s - loss: 0.0433 - acc: 0.9981 - val_loss: 0.5397 - val_acc: 0.8244
Epoch 11/20
333/334 [============================>.] - ETA: 0s - loss: 0.0348 - acc: 0.9983Epoch 00010: val_loss improved from 0.52639 to 0.52320, saving model to saved_models/weights.best.ResNet50.hdf5
334/334 [==============================] - 285s - loss: 0.0348 - acc: 0.9984 - val_loss: 0.5232 - val_acc: 0.8415
Epoch 12/20
333/334 [============================>.] - ETA: 0s - loss: 0.0276 - acc: 0.9986Epoch 00011: val_loss did not improve
334/334 [==============================] - 286s - loss: 0.0276 - acc: 0.9987 - val_loss: 0.5306 - val_acc: 0.8317
Epoch 13/20
333/334 [============================>.] - ETA: 0s - loss: 0.0230 - acc: 0.9986Epoch 00012: val_loss did not improve
334/334 [==============================] - 287s - loss: 0.0230 - acc: 0.9987 - val_loss: 0.5315 - val_acc: 0.8463
Epoch 14/20
333/334 [============================>.] - ETA: 0s - loss: 0.0186 - acc: 0.9986Epoch 00013: val_loss did not improve
334/334 [==============================] - 286s - loss: 0.0186 - acc: 0.9987 - val_loss: 0.5525 - val_acc: 0.8366
Epoch 15/20
333/334 [============================>.] - ETA: 0s - loss: 0.0153 - acc: 0.9991Epoch 00014: val_loss did not improve
334/334 [==============================] - 286s - loss: 0.0152 - acc: 0.9991 - val_loss: 0.5394 - val_acc: 0.8390
Epoch 16/20
333/334 [============================>.] - ETA: 0s - loss: 0.0132 - acc: 0.9985Epoch 00015: val_loss did not improve
334/334 [==============================] - 290s - loss: 0.0132 - acc: 0.9985 - val_loss: 0.5481 - val_acc: 0.8439
Epoch 17/20
333/334 [============================>.] - ETA: 0s - loss: 0.0112 - acc: 0.9985Epoch 00016: val_loss did not improve
334/334 [==============================] - 286s - loss: 0.0112 - acc: 0.9985 - val_loss: 0.5396 - val_acc: 0.8402
Epoch 18/20
333/334 [============================>.] - ETA: 0s - loss: 0.0095 - acc: 0.9986Epoch 00017: val_loss did not improve
334/334 [==============================] - 285s - loss: 0.0095 - acc: 0.9987 - val_loss: 0.5468 - val_acc: 0.8476
Epoch 19/20
333/334 [============================>.] - ETA: 0s - loss: 0.0084 - acc: 0.9982Epoch 00018: val_loss did not improve
334/334 [==============================] - 282s - loss: 0.0084 - acc: 0.9982 - val_loss: 0.5543 - val_acc: 0.8451
Epoch 20/20
333/334 [============================>.] - ETA: 0s - loss: 0.0073 - acc: 0.9986Epoch 00019: val_loss did not improve
334/334 [==============================] - 289s - loss: 0.0073 - acc: 0.9987 - val_loss: 0.5607 - val_acc: 0.8463
In [74]:
print("Training time = {0:.3f} minutes ".format(round(tot_train_time_rsnt50/60.0, 3)))
Training time = 95.461 minutes 
In [75]:
plotSave_pf_metrics(h_rsnt50, 'rsnt50_cnn_pf_metrics')

Loading the Model with the Best Validation Loss¶

In [76]:
ResNet_model.load_weights('saved_models/weights.best.ResNet50.hdf5')

Testing the Model¶

Here we evaluate the model on the test dataset of dog images, obtaining an accuracy of 83.13%.

In [77]:
ResNet50_predictions = [np.argmax(ResNet_model.predict(np.expand_dims(feature, axis=0))) for feature in test_ResNet50]
test_accuracy = 100*np.sum(np.array(ResNet50_predictions)==np.argmax(test_targets, axis=1))/len(ResNet50_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)
Test accuracy: 83.1340%

Predict Dog Breed with the Model¶

In the following, we write a function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan_hound, etc.) predicted by the model.

As done in Step 5, the function involves three steps:

  1. Extracting the bottleneck features corresponding to the chosen CNN model.
  2. Supplying the bottleneck features as input to the model to return the predicted vector. Here we note that the argmax of this prediction vector gives the index of the predicted dog breed.
  3. Using the dog_names array defined in Step 0 of this notebook to return the corresponding breed.

The functions to extract the bottleneck features can be found in extract_bottleneck_features.py; they were imported in an earlier code cell. To obtain the bottleneck features corresponding to the chosen CNN architecture, we use the function

extract_{network}

where {network}, in the above function name, should be one of VGG19, Resnet50, InceptionV3, or Xception.

In [78]:
from extract_bottleneck_features import *

def Resnet50_predict_breed(img_path):
    # extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    # obtain predicted vector
    predicted_vector = ResNet_model.predict(bottleneck_feature)
    # return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

Classifier Algorithm¶

Here we write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.

We use the face_detector and dog_detector functions developed above. However, we use the CNN from Step 5 to predict the dog breed.
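
A minimal sketch of such a driver is shown below. The function name classify_image and the exact messages are illustrative; the logic relies only on the face_detector, dog_detector, and Resnet50_predict_breed functions defined in this project.

def classify_image(img_path):
    # Dogs take precedence: report the predicted breed directly.
    if dog_detector(img_path):
        return 'Predicted dog breed: {}'.format(Resnet50_predict_breed(img_path))
    # For humans, report the most resembling dog breed.
    elif face_detector(img_path):
        return 'This human resembles a: {}'.format(Resnet50_predict_breed(img_path))
    # Neither detected: indicate an error.
    else:
        return 'Error: neither a dog nor a human face was detected.'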

Some sample output for the algorithm is provided below.

Sample Human Output

CONTINUE TO PART 5
