



This paper presents an automatic Heart Disease (HD) prediction method based on feature selection and data mining techniques, using the symptoms and clinical information provided in the patient dataset. Data mining, which extracts hidden knowledge from data and explores the relationships between attributes, is a promising technique for HD prediction. HD symptoms can be learned effectively by a computer in order to classify HD into different classes. However, the information provided may include redundant and interrelated symptoms, and using such information may degrade classification performance. Feature selection is an effective way to remove this noisy information while improving learning accuracy and making the learning model easier to understand. In our method, HD attributes are re-selected based on the ranks and weights assigned by the Infinite Latent Feature Selection (ILFS) method. A Support Vector Machine (SVM) is then applied to classify a subset of the selected attributes into different HD classes. The Synthetic Minority Over-sampling Technique (SMOTE) is adopted to increase the amount and variety of the data. The experiment is performed on the public Heart Disease dataset from the UCI Machine Learning Repository. Experimental results show that, using only 24 selected attributes out of a total of 46, our method achieves an accuracy of 97.87% in distinguishing ‘no presence’ of HD from ‘presence’ of HD, and an accuracy of 93.92% in distinguishing 5 different classes of HD.
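The over-sampling step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch picks base samples at random rather than iterating over every minority sample as the original SMOTE formulation does.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority-class feature vectors.

    Hypothetical helper for illustration only; parameter names are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Simplification: choose the base sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate between the base sample and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values, not from the UCI dataset).
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.4]]
new_samples = smote(minority_samples, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, the new data stays inside the region already occupied by that class rather than duplicating existing records.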


Data mining; Heart Disease Prediction; Feature Selection; Classification






Journal of Computer Science and Cybernetics ISSN: 1813-9663

Published by Vietnam Academy of Science and Technology