Applied and Computational Engineering
- The Open Access Proceedings Series for Conferences
Series Vol. 2, 22 March 2023
* Author to whom correspondence should be addressed.
With the advent of the intelligent era, artificial intelligence has penetrated nearly every aspect of daily life and has gradually entered public awareness through systems such as AlphaGo, a program that plays the game of Go against human opponents. Meanwhile, as deep neural network algorithms grow increasingly complex, compressing and optimizing neural network models has become the key to reducing model storage requirements and improving deployment efficiency. This article applies INT8 quantization to AlexNet and uses the scale factors of batch normalization (BN) layers to prune the Yolov3 model. Experiments show that quantization reduces AlexNet's image recognition time to roughly 1/3 to 1/4 of that of the original floating-point format, and that pruning Yolov3 significantly reduces the storage space occupied by the model while doubling the speed of real-time object detection on an embedded GPU. None of the three experiments reduces recognition accuracy. The paper introduces the essential concepts in the main body and applies them in model algorithms and experiments, providing theoretical support and empirical evidence for the field of deep learning model compression and optimization. The research methods and experimental conclusions also point toward cutting-edge directions such as sparse convolutional networks.
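The INT8 quantization mentioned above can be illustrated with a minimal sketch. This is not the paper's exact procedure; it assumes simple symmetric per-tensor quantization, where each FP32 weight tensor is mapped to 8-bit integers through a single scale factor, shrinking storage by a factor of four:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map FP32 weights
    into [-127, 127] using a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original FP32 weights."""
    return q.astype(np.float32) * scale

# Hypothetical convolution kernel: 64 filters of shape 3x3x3.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

print(w.nbytes // q.nbytes)  # 4: INT8 storage is 1/4 of FP32
```

The round-trip error per weight is bounded by half the scale factor, which is why accuracy typically survives quantization; inference frameworks additionally exploit fast INT8 arithmetic, which is the source of the speedup reported in the abstract.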
Deep Learning, Neural Network Quantization, Neural Network Pruning
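The BN-based pruning described in the abstract relies on the learned scale factor gamma of each batch normalization channel: channels whose |gamma| is near zero contribute little to the output and can be removed. A minimal, hypothetical channel-selection sketch (not the paper's exact implementation) might look like:

```python
import numpy as np

def select_channels(gamma, keep_ratio=0.5):
    """Rank channels by |gamma| (the BN scale factor) and keep the
    top fraction; the rest are candidates for pruning."""
    k = max(1, int(len(gamma) * keep_ratio))
    order = np.argsort(-np.abs(gamma))       # largest |gamma| first
    return np.sort(order[:k])                # indices of channels to keep

# Hypothetical gamma values for a 6-channel BN layer.
gamma = np.array([0.9, 0.01, 0.5, 0.002, 0.7, 0.03])
keep = select_channels(gamma, keep_ratio=0.5)
print(keep)  # [0 2 4]
```

After selecting channels, the corresponding filters in the preceding convolution (and the matching input channels of the following layer) are deleted, which is what shrinks the model file and speeds up inference on an embedded GPU.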
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open Access Instruction).