ResNet50 Architecture Based Dog Breed Identification Using Deep Learning

Dog breed identification is one of the challenging tasks in animal husbandry. Knowing what a dog was bred for can help in predicting and understanding its behavior. Most breeds were created with a specific goal in mind, and so carry a drive to perform a particular task. Dogs retain part of that drive as pets in our homes, and without a good outlet for that energy, it may manifest in undesirable ways. A deep learning model that identifies the breed of a dog with high accuracy makes it easier for owners to train the dog: there is no need to wait for experts or to spend money on breed identification, since the breed can be identified on a mobile phone from a single image. This paper presents a classification model designed to identify the breed of a dog. A training dataset of more than 10000 images is used, a CNN performs the classification, and the ResNet50 architecture achieves the highest accuracy of the models compared.


Introduction
In recent times, convolutional neural networks (CNNs) have become highly popular and are widely used across deep learning: image recognition [1], detection, speech recognition, data generation, etc. An important property of a CNN is that it learns feature tensors directly from examples, rather than depending on hand-crafted descriptors such as the Scale-Invariant Feature Transform (SIFT). CNNs have gained significant popularity owing to the viability of general recurring structures for a wide range of problems. Using the combined dataset, the current research describes methods and results of hyperparameter tuning of CNNs for ResNet topologies. The problem is a classification task and, more specifically, a fine-grained image recognition task, in which only minor differences separate two classes. CNNs are similar to artificial neural networks [2] in terms of learnable weights and biases, but contain deep stacks of alternating convolutional and pooling layers. A CNN architecture typically ends with a few dense layers. The filters, which are learnable feature matrices, are effective in image recognition and classification problems. Deep CNNs can handle enormous datasets and are accurate even in large-scale video classification. The fine-tuning procedures and learning results for the ResNet50 architecture are discussed. This architecture can identify a dog's breed from a photograph provided by the user.

Literature Review
At present, the worldwide dog population is approximately 1 billion, of which 75% to 85% are stray and wild dogs [3]. Automatic dog breed identification is discussed in [4], where two image processing approaches to classifying dog breeds are presented. One approach uses the conventional Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) algorithms. The other employs deep learning using convolutional neural networks (CNNs) with transfer learning. The CNN model is found to perform better than the conventional method.
In [5], the author proposes Principal Component Analysis (PCA) for dimensionality reduction and feature selection. After PCA, a classifier based on template matching is applied to dog face images to classify the breed.
In [6], the author proposes a novel approach to fine-grained image classification for dog breed identification and shows that extracting corresponding parts improves classification performance. The author achieves a 67% recognition rate on a large real-world dataset of 133 dog breeds and 8,351 images. The datasets used are referenced in [7], [8].
A machine learning based approach is presented in [9] to identify images of invasive hydrangea. The author also applies a deep learning technique that is widely used in image recognition.
In [10], a convolutional neural network framework is used to train on and categorize dog breeds. The CNN approach is based on the LeNet [11] and GoogLeNet [12] architectures.

System Architecture
This paper addresses dog breed classification using the ResNet approach; the features in the images are readily extracted by the convolutional layers. The model identifies which breed a dog belongs to. Here we classify 120 classes of dog breed. After the model is trained, a dog image is compared against the trained model to predict the breed. The proposed system is shown in Fig 1.

CNN Architecture
A CNN is a kind of deep neural network that employs filters to learn image features. Compared with fully connected deep neural networks, CNNs have fewer parameters to learn, and they mimic the processing of the visual cortex in the human brain. Because of this novel architecture, CNNs have achieved near human-level accuracy on image processing tasks. The basic structure of a CNN [13] consists of alternating convolutional and pooling layers, as shown in Fig 2. The pooling layers help reduce the number of parameters to be learned from samples. The first layer of any CNN is a convolution layer. The convolution layer extracts features from an input image using its filters [14]. The filters learn the intrinsic features present in the images by computing the convolution of the filter with a subset of the image [15]. Each convolutional layer produces a feature map. A feature map holds the local weighted sums obtained by convolving a portion of the image with the filter. Different kinds of activation functions are employed in the hidden layers. Among them, the Rectified Linear Unit (ReLU) activation function is preferred, as it increases the nonlinearity [16].
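The convolution, activation, and pooling steps described above can be illustrated with a minimal pure-Python sketch. The 6×6 image, the vertical-edge filter, and the function names are assumptions chosen for illustration and do not come from the paper; as in most deep learning libraries, the "convolution" here slides the filter without flipping it.

```python
def conv2d_valid(image, kernel):
    """Slide the filter over the image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local weighted sum of the image patch currently under the filter.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Rectified Linear Unit: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling (a trailing odd row or column is dropped)."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        pooled.append([
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ])
    return pooled


# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d_valid(image, kernel))  # 4x4 map, strongest at the edge
pooled = max_pool_2x2(feature_map)               # 2x2 map after pooling
```

In a trained CNN the filter values are learned rather than fixed by hand, and each layer applies many filters in parallel.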
At the final stage of a CNN, the output of the convolution layers is reduced by means of a 1 × 1 convolution layer, flattened, and fed into dense layers. Depending on the task, different output layers are used. For example, in a binary classification task the sigmoid function, also called the logistic function, is used [17]. For categorical (multi-class) classification, softmax layers are used.
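To make the difference between the two output layers concrete, the following sketch computes a sigmoid output for a single score and a softmax output for a vector of class scores. The score values are arbitrary illustrative numbers, not outputs of the paper's model.

```python
import math


def sigmoid(z):
    """Logistic function: maps one score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Maps a list of class scores to probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: one score, one probability.
binary_prob = sigmoid(0.0)

# Multi-class case: one score per breed (three hypothetical breeds here).
breed_probs = softmax([2.0, 1.0, 0.1])
predicted_class = breed_probs.index(max(breed_probs))
```

For the 120-breed problem considered here, the softmax layer would produce 120 probabilities and the predicted breed would be the index of the largest one.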

ResNet50 Architecture
ResNet50's architecture [18] is divided into four stages of convolutional layers, as depicted in Fig 3. The ResNet50 design uses 7×7 and 3×3 kernel (filter) sizes for the initial convolution and max pooling, respectively. The identity connections are represented by the curved arrows. The dashed connecting arrow indicates that the convolution operation in the residual block is performed with stride 2, which halves the height and width of the input but doubles the channel width. As the network advances from one stage to the next, the channel width is doubled and the size of the input is halved.
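The halving and doubling pattern can be traced with the standard output-size formula for a convolution or pooling layer, floor((n + 2p - k) / s) + 1. The sketch below assumes a 224×224 input, padding of 3 and 1 for the initial convolution and max pooling, and 256 output channels in the first stage; these are the values of the commonly used ResNet50 configuration and are not stated in this paper.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_stage_shapes(input_size=224):
    """Return (stage, height, width, channels) at the output of each stage."""
    # Initial 7x7 convolution with stride 2 (padding 3 is an assumption).
    size = conv_output_size(input_size, kernel=7, stride=2, padding=3)
    # 3x3 max pooling with stride 2 (padding 1 is an assumption).
    size = conv_output_size(size, kernel=3, stride=2, padding=1)

    shapes = []
    channels = 256  # assumed output width of the first stage
    for stage in range(1, 5):
        if stage > 1:
            # Stride-2 convolution at the start of the stage: half the size,
            # double the channel width.
            size = conv_output_size(size, kernel=1, stride=2, padding=0)
            channels *= 2
        shapes.append((stage, size, size, channels))
    return shapes


for stage, height, width, channels in resnet50_stage_shapes():
    print(f"stage {stage}: {height}x{width}x{channels}")
```

Under these assumptions the feature maps shrink from 56×56 to 7×7 across the four stages while the channel width grows from 256 to 2048.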
The bottleneck architecture [19] is used for deeper networks such as ResNet50 and ResNet152. For each residual function F, three layers are stacked one on top of the other. The three layers are 1×1, 3×3, and 1×1 convolutions. The 1×1 convolution layers are responsible for reducing and then restoring the dimensions, which leaves the 3×3 layer as a bottleneck with smaller input/output dimensions [20]. Finally, the network has an average pooling layer followed by a 1000-neuron fully connected layer, as shown in Fig 3.
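A short calculation shows why the bottleneck is attractive for deep networks. The sketch below counts convolution weights (biases and batch normalization ignored) for a bottleneck block and for a block of two plain 3×3 convolutions at the same width. The channel widths 256 and 64 and the per-stage block counts 3, 4, 6, 3 are those of the standard ResNet50 design and are assumptions here, since the paper does not list them.

```python
def conv_params(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (no bias)."""
    return kernel * kernel * in_channels * out_channels


def bottleneck_params(channels, reduced):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 restore."""
    return (conv_params(1, channels, reduced)
            + conv_params(3, reduced, reduced)
            + conv_params(1, reduced, channels))


def two_3x3_params(channels):
    """Two stacked 3x3 convolutions at full width, for comparison."""
    return 2 * conv_params(3, channels, channels)


bottleneck = bottleneck_params(channels=256, reduced=64)
plain = two_3x3_params(channels=256)

# Depth check: initial convolution + 3 layers per bottleneck block + final dense layer.
BLOCKS_PER_STAGE = [3, 4, 6, 3]  # assumed standard ResNet50 layout
depth = 1 + 3 * sum(BLOCKS_PER_STAGE) + 1

print(bottleneck, plain, depth)
```

Under these assumptions the bottleneck block needs roughly one seventeenth of the weights of the full-width alternative, and the layer count works out to the 50 in the network's name.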

Experimental Environment
In this work, Python, a high-level general-purpose programming language that is interpreted, interactive, and object oriented, is used. Python is open source, distributed under the GPL-compatible Python Software Foundation License [21]. Python has a dynamic type system and automatic memory management. Machine learning and deep learning tasks are resource hungry, so training the models takes a long time. Google therefore provides the Google Colab facility on its cloud. Researchers can use this facility to train and test their models. It has a user interface similar to the Anaconda Jupyter notebook. To reduce execution time, the computation can be moved to the GPU.

Procedure
Our image detection model was built on the Google Colab platform using the ResNet model. All the images were imported into Google Colab through Drive [23][24]. The ResNet model was defined, and each image was then preprocessed [25]. After preprocessing, training and validation were carried out as specified by the training regime [26].
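The paper does not state how images were divided between training and validation. As one possible illustration only, the sketch below performs a per-breed (stratified) split so that every breed appears in both sets; the 20% validation fraction, the file names, and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, validation_fraction=0.2, seed=0):
    """Split (image_path, breed) pairs into train and validation lists, breed by breed."""
    by_breed = defaultdict(list)
    for path, breed in samples:
        by_breed[breed].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for breed in sorted(by_breed):
        paths = sorted(by_breed[breed])
        rng.shuffle(paths)
        # Keep at least one validation image per breed.
        n_val = max(1, round(len(paths) * validation_fraction))
        validation += [(p, breed) for p in paths[:n_val]]
        train += [(p, breed) for p in paths[n_val:]]
    return train, validation


# Hypothetical listing: two breeds with ten images each.
samples = [(f"{breed}/{i:03d}.jpg", breed)
           for breed in ("beagle", "pug") for i in range(10)]
train, validation = stratified_split(samples)
```

Splitting per breed matters for fine-grained datasets, where a purely random split can leave a rare breed with no validation images at all.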

Loss and Accuracy
During training, several iterations are carried out, in each of which the loss and accuracy are calculated [27]. With each iteration the loss tends to decrease and the accuracy to increase [28]. If training accuracy is high and validation accuracy is low, the system is said to have high variance. On the other hand, if training accuracy itself is low, the system is said to have high bias. The trained models are tested on the training and test datasets every 10 minutes. Accuracy is tracked on both datasets; this statistic is the mean percentage of correctly classified images in a dataset. The ResNet50 model gives an accuracy of 85.15% on the Stanford dataset, which is comparatively higher than the other models. During execution, the accuracy and loss are recorded for every epoch. The accuracy starts at 7% and increases gradually to 93% by the 40th epoch.
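The accuracy statistic and the bias/variance reading of it can be sketched as follows. The thresholds `min_train_acc` and `max_gap` are arbitrary illustrative values, not criteria taken from the paper, and the labels are made up.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def diagnose(train_acc, val_acc, min_train_acc=0.8, max_gap=0.1):
    """Rough reading of a training/validation accuracy pair (thresholds are assumptions)."""
    if train_acc < min_train_acc:
        return "high bias"       # the model does not even fit the training data
    if train_acc - val_acc > max_gap:
        return "high variance"   # the model fits training data but generalizes poorly
    return "ok"


# Hypothetical predictions for four images.
acc = accuracy(["pug", "beagle", "pug", "pug"],
               ["pug", "beagle", "beagle", "pug"])

print(acc, diagnose(0.95, 0.60), diagnose(0.50, 0.48), diagnose(0.93, 0.90))
```

Recording both accuracies at every epoch makes this comparison possible throughout training rather than only at the end.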

Comparison with Other Models
The ResNet50 model gives better accuracy than other architectures such as AlexNet, GoogLeNet, NASNet-A, and ResNet18. Among the other architectures, the maximum accuracy is 80%, achieved by NASNet-A, which is considerably lower than the 93% achieved by ResNet50. The performance of each model is shown in Table 1.

Fig 5
Fig 5 shows the accuracy at each epoch; the accuracy increases with each epoch.

Fig 6
Fig 6 shows the loss at each epoch; the loss decreases with each epoch. Depending on the training / validation / testing regime, training accuracy and validation accuracy are compared.
A large number of labelled images is needed to build a CNN. To overcome this main difficulty, deep learning and image augmentation are applied to a pre-trained ResNet50. It reaches an accuracy of 85 percent during training. An ordinary deep neural network suffers from the vanishing gradient problem. To overcome this, ResNet50 uses skip connections. Training was done with ReLU and softmax activations. After the model is trained, a new test image is compared against the trained model to predict the breed. In addition, future work will include training the model with diverse architectures to achieve better accuracy.
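The effect of a skip connection on the vanishing gradient problem can be seen in a toy scalar model. In a plain chain each layer multiplies the backpropagated gradient by its local derivative d; in a residual chain each layer computes x + f(x), so the factor becomes 1 + d. The value d = 0.01 and the depth of 50 are illustrative assumptions, and a real network has matrix-valued derivatives rather than a single scalar.

```python
def plain_gradient(layer_derivative, depth):
    """Gradient through `depth` plain layers: product of the local derivatives."""
    return layer_derivative ** depth


def residual_gradient(layer_derivative, depth):
    """Gradient through `depth` residual layers y = x + f(x): each factor is 1 + f'(x)."""
    return (1.0 + layer_derivative) ** depth


d = 0.01     # assumed small local derivative of each layer
depth = 50   # assumed number of stacked layers

plain = plain_gradient(d, depth)        # shrinks toward zero as depth grows
residual = residual_gradient(d, depth)  # stays on the order of 1

print(f"plain: {plain:.3e}, residual: {residual:.3f}")
```

The identity path keeps a gradient contribution of 1 per layer regardless of how small the learned part becomes, which is why very deep residual networks remain trainable.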

Table 1.
The performance graph in Fig 7 shows how clearly ResNet50 outperforms the other models.