Evaluating the influence of backbone network architectures for object detection in aerial images
Keywords: aerial images, object detection, vehicle detection, convolutional neural networks
Drones are increasingly used for surveillance, agriculture, and delivery. However, the real-world use of drone imagery in urban management in Vietnam is still limited. Although drone images offer many advantages thanks to the flexibility of the latest devices, they also pose new challenges, such as top-down views, small objects, arbitrary orientations, and class imbalance. In this paper, we survey and evaluate the performance of CNN-based backbone architectures for object detection in aerial images. Experiments were conducted on seven deep learning architectures (VGG, ResNet, ResNeXt, Res2Net, ResNeSt, HRNet, and RegNet) to draw objective, experimentally grounded conclusions that contribute to the development of applications for determining the status of urban traffic in Vietnam.
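To make the small-object challenge more concrete, the following sketch (an illustration, not taken from the paper) computes Intersection over Union (IoU) between a ground-truth box and a prediction that is off by the same 3 pixels, once for a 10 × 10 box and once for a 100 × 100 box. The box sizes and the offset are assumptions chosen only for illustration. Under the IoU ≥ 0.5 matching criterion used by PASCAL VOC, the small-box prediction would count as a miss while the large-box prediction would count as a hit.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Illustrative assumption: the same 3-pixel localization error on a small and a large box.
offset = 3
small_gt, small_pred = (0, 0, 10, 10), (offset, offset, 10 + offset, 10 + offset)
large_gt, large_pred = (0, 0, 100, 100), (offset, offset, 100 + offset, 100 + offset)

small_iou = iou(small_gt, small_pred)
large_iou = iou(large_gt, large_pred)
print(f"small: {small_iou:.3f}, large: {large_iou:.3f}")
# prints "small: 0.325, large: 0.888"
```

The same absolute error costs the small box far more overlap, which is one reason small vehicles seen from a drone are harder to detect reliably than large objects in ground-level images.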
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Vietnam Journal of Sciences and Technology (VJST) is an open-access, peer-reviewed journal. All publications are free for everyone to read and download. Articles are published under the terms of the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA) License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited and the ShareAlike terms are followed.
Copyright on any research article published in VJST is retained by the respective author(s), without restrictions. Authors grant the VAST Journals System a license to publish the article and to identify itself as the original publisher. By giving VJST permission to publish their work, whether through the VJST journal portal or another channel, the author(s) agree to all the terms and conditions of the CC BY-SA 4.0 License (https://creativecommons.org/licenses/by-sa/4.0/) and to the terms and conditions set by VJST.
Authors are responsible for securing all necessary copyright permissions for the use of third-party materials in their manuscript.