AUTOMATIC HEART DISEASE PREDICTION USING FEATURE SELECTION AND DATA MINING TECHNIQUE
DOI:
https://doi.org/10.15625/1813-9663/34/1/12665
Keywords:
Data mining, Heart Disease Prediction, Feature Selection, Classification
Abstract
This paper presents an automatic Heart Disease (HD) prediction method based on feature selection and data mining techniques, using the symptoms and clinical information provided in the patient dataset. Data mining, which allows the extraction of hidden knowledge from data and explores the relationships between attributes, is a promising technique for HD prediction. HD symptoms can be effectively learned by a computer to classify HD into different classes. However, the information provided may include redundant and interrelated symptoms, and using such information may degrade classification performance. Feature selection is an effective way to remove such noisy information while improving learning accuracy and making the learning model easier to understand. In our method, HD attributes are re-selected based on the ranks and weights assigned by the Infinite Latent Feature Selection (ILFS) method. A Support Vector Machine (SVM) algorithm is applied to classify a subset of the selected attributes into different HD classes. The Synthetic Minority Over-sampling Technique (SMOTE) is adopted to generate a larger and more varied set of data. The experiment is performed on the public Heart Disease dataset from the UCI Machine Learning Repository. Experimental results demonstrate that, using only a subset of 24 selected attributes out of a total of 46, our method achieves an accuracy of 97.87% in distinguishing ‘no presence’ of HD from ‘presence’ of HD and an accuracy of 93.92% in distinguishing 5 different classes of HD.
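The abstract does not give implementation details for the over-sampling step. As a rough illustration only, the core idea of SMOTE (creating a synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours) can be sketched as follows. The function name `smote`, its parameters, the Euclidean distance, and the toy data are assumptions made for this sketch; they are not taken from the paper.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority-class samples.

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric attributes (illustrative values only)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic sample is a convex combination of two existing minority samples, the generated points stay inside the range already covered by that class. This is one reason the technique adds variety without introducing values outside the observed data.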
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that We have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.