Evaluating the influence of backbone network architectures for object detection in aerial images

Khang Nguyen

doi:10.15625/2525-2518/17595

Author affiliations

Khang Nguyen, University of Information Technology, 1 Han Thuyen Street, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Viet Nam;
Vietnam National University, km 20 Ha Noi Highway, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Viet Nam

Keywords:

aerial images, object detection, vehicle detection, convolutional neural networks

Abstract

Drones are increasingly used for surveillance, agriculture, and delivery. However, the practical use of drone imagery for urban management in Vietnam remains limited. Although drone images offer many advantages thanks to the flexibility of the latest devices, they also pose new challenges, such as top-down views, small objects, arbitrary object orientations, and class imbalance. In this paper, we survey CNN-based network architectures and evaluate their performance as backbones for object detection in aerial images. Experiments were conducted on seven deep learning network architectures (VGG, ResNet, ResNeXt, Res2Net, ResNeSt, HRNet, and RegNet) to draw objective, experimentally grounded conclusions, contributing to the development of solutions for monitoring urban traffic conditions in Vietnam.
