TAEKWONDO POSE ESTIMATION WITH DEEP LEARNING ARCHITECTURES ON ONE-DIMENSIONAL AND TWO-DIMENSIONAL DATA
DOI: https://doi.org/10.15625/1813-9663/18043

Keywords: Pose classification, Skeleton, Sports lessons, Taekwondo

Abstract
Practicing sports helps people maintain and improve their health, enhance memory and concentration, reduce anxiety and stress, and build teamwork and leadership skills. With advances in science and technology, artificial intelligence in sports has become increasingly popular and brings many benefits; in particular, many applications help track and evaluate athletes' performance in competitions. This study extracts frames from Taekwondo videos with the Fast Forward Moving Picture Experts Group (FFmpeg) tool and generates skeleton data from those frames with MoveNet. We then apply deep learning architectures such as Long Short-Term Memory networks, Convolutional Long Short-Term Memory, and Long-term Recurrent Convolutional Networks to pose classification in Taegeuk Il Jang lessons. The work presents two approaches: the first trains on sequences of skeletons extracted from the frames by MoveNet, while the second trains video classification architectures directly on sequences of images. Recognizing poses from skeleton data removes image noise such as the background and extraneous objects behind the exerciser. As a result, the proposed method achieves promising performance on pose classification tasks in an introductory Taekwondo lesson.