Dat Tien Nguyen, Chau Ngoc Ha, Ha Thanh Thi Hoang, Truong Nhat Nguyen, Tuyet Ngoc Huynh, Hai Thanh Nguyen
Author affiliations


  • Dat Tien Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Chau Ngoc Ha, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Ha Thanh Thi Hoang, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Truong Nhat Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Tuyet Ngoc Huynh, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam
  • Hai Thanh Nguyen, College of Information and Communication Technology, Can Tho University, Can Tho, Viet Nam




Keywords: Pose classification, Skeleton, Sports lessons, Taekwondo.


Practicing sports helps people maintain and improve their health, enhance memory and concentration, reduce anxiety and stress, and build teamwork and leadership skills. With the development of science and technology, artificial intelligence in sports has become increasingly popular with the public and brings many benefits. In particular, many applications help people track and evaluate athletes' achievements in competitions. This study extracts frames from Taekwondo videos with the Fast Forward Moving Picture Experts Group (FFmpeg) tool and generates skeleton data from those frames using MoveNet. We then use deep learning architectures such as Long Short-Term Memory (LSTM) networks, Convolutional Long Short-Term Memory (ConvLSTM), and Long-term Recurrent Convolutional Networks (LRCN) to classify poses in Taegeuk Il Jang lessons. This work presents two approaches. The first uses sequences of skeletons extracted from the images by MoveNet. The second trains video classification architectures directly on sequences of images. Finally, we recognize poses in sports lessons using skeleton data, which removes noise in the image such as the background and extraneous objects behind the exerciser. Our proposed method achieves promising performance on pose classification in an introductory Taekwondo lesson.
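The data preparation for the skeleton-based approach can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the sampling rate, the output file pattern, and the window length and stride are all assumptions made for the example. It relies only on the fact that MoveNet reports 17 keypoints per person, each as (y, x, confidence score). The FFmpeg command is only built, not executed, and the keypoints are assumed to have already been produced by MoveNet.

```python
from typing import List, Sequence

NUM_KEYPOINTS = 17        # MoveNet reports 17 body keypoints per person
VALUES_PER_KEYPOINT = 3   # each keypoint is (y, x, confidence score)

Frame = Sequence[Sequence[float]]  # one frame: 17 rows of (y, x, score)


def ffmpeg_frame_command(video_path: str, out_dir: str, fps: int = 10) -> List[str]:
    """Build (but do not run) an FFmpeg command that samples frames from a video.

    The fps value and the output file pattern are illustrative assumptions.
    """
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", f"{out_dir}/%05d.png"]


def flatten_frame(frame: Frame) -> List[float]:
    """Flatten one frame of keypoints into a feature vector of length 17 * 3 = 51."""
    if len(frame) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(frame)}")
    features: List[float] = []
    for keypoint in frame:
        if len(keypoint) != VALUES_PER_KEYPOINT:
            raise ValueError("each keypoint must be (y, x, score)")
        features.extend(float(value) for value in keypoint)
    return features


def make_sequences(frames: Sequence[Frame], seq_len: int, stride: int) -> List[List[List[float]]]:
    """Slice a video's frames into sliding windows of shape (seq_len, 51).

    Windows overlap when stride < seq_len. Trailing frames that do not
    fill a complete window are dropped.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    vectors = [flatten_frame(frame) for frame in frames]
    last_start = len(vectors) - seq_len
    return [vectors[start:start + seq_len] for start in range(0, last_start + 1, stride)]
```

Each resulting window is a fixed-length sequence of skeleton feature vectors, which is the kind of input a recurrent classifier such as an LSTM expects. The image-sequence approach would instead feed the extracted frames themselves to a video classification model.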





