Dat Tien Nguyen, Chau Ngoc Ha, Ha Thanh Thi Hoang, Truong Nhat Nguyen, Tuyet Ngoc Huynh, Hai Thanh Nguyen
Author affiliations


  • Dat Tien Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Chau Ngoc Ha, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Ha Thanh Thi Hoang, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Truong Nhat Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Tuyet Ngoc Huynh, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Hai Thanh Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam




Pose classification, Skeleton, Sports lessons, Taekwondo.


Practicing sports helps people maintain and improve their health, enhance memory and concentration, reduce anxiety and stress, and develop teamwork and leadership skills. With advances in science and technology, artificial intelligence in sports has become increasingly popular and brings many benefits. In particular, many applications help people track and evaluate athletes' achievements in competitions. This study extracts frames from Taekwondo videos using the Fast Forward Moving Picture Experts Group (FFMPEG) tool and generates skeleton data from those frames using MoveNet. We then use deep learning architectures such as Long Short-Term Memory Networks, Convolutional Long Short-Term Memory, and Long-term Recurrent Convolutional Networks to classify poses in Taegeuk Il Jang lessons. This work presents two approaches. The first uses sequences of skeletons extracted from the images by MoveNet. The second trains video classification architectures directly on image sequences. Finally, we recognize poses in sports lessons using skeleton data, which removes noise in the image such as the background and extraneous objects behind the exerciser. Our proposed method achieves promising performance in pose classification tasks in an introductory Taekwondo lesson.
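The data preparation described above (frames from video, one skeleton per frame, fixed-length skeleton sequences for a recurrent classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `ffmpeg_frame_command`, `flatten_keypoints`, and `make_sequences`, the frame rate, the window length, and the stride are all assumptions, and the sketch keeps only the 2D coordinates of each keypoint. It relies on the fact that MoveNet reports 17 keypoints per person, each as a (y, x, score) triple.

```python
from typing import List, Sequence

NUM_KEYPOINTS = 17  # MoveNet reports 17 body keypoints per person


def ffmpeg_frame_command(video_path: str, out_dir: str, fps: int = 10) -> List[str]:
    """Build (but do not run) an FFmpeg command that dumps frames as JPEGs.

    The fps value is an assumed sampling rate, not one taken from the paper.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        f"{out_dir}/frame_%05d.jpg",
    ]


def flatten_keypoints(keypoints: Sequence[Sequence[float]]) -> List[float]:
    """Turn one frame's 17 (y, x, score) triples into a 34-value (y, x) vector."""
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}")
    features: List[float] = []
    for y, x, _score in keypoints:
        features.extend([y, x])  # drop the confidence score (an assumption)
    return features


def make_sequences(
    frames: Sequence[List[float]], seq_len: int, stride: int = 1
) -> List[List[List[float]]]:
    """Slide a fixed-length window over per-frame vectors.

    Each returned item has shape (seq_len, features), the usual input
    layout for an LSTM-style sequence classifier.
    """
    last_start = len(frames) - seq_len
    return [
        list(frames[start:start + seq_len])
        for start in range(0, last_start + 1, stride)
    ]


if __name__ == "__main__":
    # Synthetic keypoints stand in for real MoveNet output.
    fake_frame = [(0.01 * i, 0.02 * i, 0.9) for i in range(NUM_KEYPOINTS)]
    vectors = [flatten_keypoints(fake_frame) for _ in range(10)]
    windows = make_sequences(vectors, seq_len=4, stride=2)
    print(ffmpeg_frame_command("lesson.mp4", "frames"))
    print(len(windows), len(windows[0]), len(windows[0][0]))
```

The image-sequence approach would feed the extracted frames themselves to a video classification model, so only the first helper applies there; the windowing step is the part specific to the skeleton approach.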








How to Cite

D. T. Nguyen, C. N. Ha, H. T. T. Hoang, T. N. Nguyen, T. N. Huynh, and H. T. Nguyen, “TAEKWONDO POSE ESTIMATION WITH DEEP LEARNING ARCHITECTURES ON ONE-DIMENSIONAL AND TWO-DIMENSIONAL DATA”, JCC, vol. 39, no. 4, pp. 343–368, Nov. 2023.