TAEKWONDO POSE ESTIMATION WITH DEEP LEARNING ARCHITECTURES ON ONE-DIMENSIONAL AND TWO-DIMENSIONAL DATA
Keywords: Pose classification, Skeleton, Sports lessons, Taekwondo.
Practicing sports helps people maintain and improve their health, enhance memory and concentration, reduce anxiety and stress, and build teamwork and leadership skills. With the development of science and technology, artificial intelligence in sports has become increasingly popular and brings many benefits; in particular, many applications help people track and evaluate athletes' performance in competitions. This study extracts frames from Taekwondo videos with FFmpeg (Fast Forward Moving Picture Experts Group) and generates skeleton data from those frames with MoveNet. We then use deep learning architectures, namely Long Short-Term Memory (LSTM) networks, Convolutional Long Short-Term Memory (ConvLSTM), and Long-term Recurrent Convolutional Networks (LRCN), to classify poses in Taegeuk in Jang lessons. This work presents two approaches. The first uses skeleton sequences extracted from the frames by MoveNet; the second trains video classification architectures directly on image sequences. Recognizing poses from skeleton data removes image noise such as the background and extraneous objects behind the practitioner. Our proposed method achieves promising performance on pose classification in an introductory Taekwondo lesson.
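The skeleton-based approach can be sketched in a few lines. MoveNet reports, for each frame, 17 keypoints in COCO order as normalized (y, x, score) triples; the frames themselves could be dumped beforehand with a command such as `ffmpeg -i lesson.mp4 -vf fps=10 frames/%05d.png` (file names and frame rate are illustrative). The sketch below is not the authors' code: it turns a clip of such keypoints into a fixed-length sequence of 51-dimensional vectors, the (timesteps, features) shape an LSTM consumes. The function names, the hip-centred and torso-scaled normalization, and the default length of 30 frames are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Assumed input layout: one (y, x, score) triple per keypoint, as MoveNet reports it.
Keypoint = Sequence[float]
Frame = Sequence[Keypoint]  # 17 keypoints per frame

# COCO-17 keypoint indices (the order MoveNet uses).
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12
NUM_KEYPOINTS = 17


def frame_to_vector(frame: Frame) -> List[float]:
    """Flatten one frame into 51 values (x, y, score per keypoint).

    Hypothetical normalization: coordinates are shifted so the hip midpoint
    is the origin and divided by the shoulder-to-hip (torso) length.
    """
    if len(frame) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(frame)}")
    hip_y = (frame[L_HIP][0] + frame[R_HIP][0]) / 2
    hip_x = (frame[L_HIP][1] + frame[R_HIP][1]) / 2
    sh_y = (frame[L_SHOULDER][0] + frame[R_SHOULDER][0]) / 2
    sh_x = (frame[L_SHOULDER][1] + frame[R_SHOULDER][1]) / 2
    torso = ((sh_y - hip_y) ** 2 + (sh_x - hip_x) ** 2) ** 0.5
    scale = torso if torso > 1e-6 else 1.0  # avoid dividing by ~0 on bad detections
    vec: List[float] = []
    for y, x, score in frame:
        vec.extend([(x - hip_x) / scale, (y - hip_y) / scale, score])
    return vec


def resample(frames: Sequence, length: int) -> list:
    """Pick `length` items at evenly spaced indices; short clips repeat frames."""
    if not frames or length < 1:
        raise ValueError("need at least one frame and length >= 1")
    if length == 1:
        return [frames[0]]
    n = len(frames)
    return [frames[round(i * (n - 1) / (length - 1))] for i in range(length)]


def clip_to_sequence(frames: Sequence[Frame], length: int = 30) -> List[List[float]]:
    """Turn a clip of MoveNet keypoints into a (length x 51) sequence for an LSTM."""
    return [frame_to_vector(f) for f in resample(frames, length)]
```

Centring on the hips and dividing by torso length makes the features roughly invariant to where the practitioner stands and how far they are from the camera. Stacking such sequences for many clips gives a (clips, 30, 51) array that a recurrent classifier such as an LSTM can take as input.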
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is made on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and a reference to the original journal publication.
3. We warrant that the Work is our own original result, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder(s) to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.