AN EFFECTIVE ALGORITHM FOR COMPUTING REDUCTS IN DECISION TABLES
DOI: https://doi.org/10.15625/1813-9663/38/3/17450
Keywords: Feature selection, Attribute reduction, Attribute clustering, Partitioning Around Medoids clustering, Normalized variation of information, Rough set
Abstract
Attribute reduction is an important research topic in rough set theory. A reduct of a decision table is a minimal subset of the conditional attributes that provides the same information for classification purposes as the entire set of available attributes. The classification task for a high-dimensional decision table can be solved faster if a reduct is used instead of the whole original attribute set. In this paper, we propose an algorithm for computing reducts based on attribute clustering. The proposed algorithm works in three main stages. In the first stage, irrelevant attributes are eliminated. In the second stage, the relevant attributes are divided into an appropriately chosen number of clusters by the Partitioning Around Medoids (PAM) clustering method, combined with a special metric on the attribute space, the normalized variation of information. In the third stage, the representative attribute most related to the decision class is selected from each cluster. The selected attributes form the approximate reduct. The proposed algorithm has been implemented and evaluated experimentally. The experimental results show that it computes approximate reducts of small size and high classification accuracy when the number of clusters used to group the attributes is appropriately chosen.
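The abstract describes the three stages only in words, so the following Python sketch is one plausible reading of them, not the authors' implementation. The mutual-information relevance filter, the relevance_threshold parameter, and the helper names entropy, nvi, pam, and approximate_reduct are assumptions made for illustration.

```python
# Minimal sketch of the three-stage reduct computation, under the assumptions above.
import numpy as np
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete attribute given as a sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def joint_entropy(x, y):
    """Joint entropy H(X, Y) of two discrete attributes."""
    return entropy(list(zip(x, y)))


def nvi(x, y):
    """Normalized variation of information, a metric on attributes in [0, 1]:
    NVI(X, Y) = 1 - I(X; Y) / H(X, Y). It is 0 when X and Y carry the same
    information and 1 when they are statistically independent."""
    h_xy = joint_entropy(x, y)
    if h_xy == 0.0:
        return 0.0
    mi = entropy(x) + entropy(y) - h_xy
    return 1.0 - mi / h_xy


def pam(dist, k, n_iter=100, seed=0):
    """Plain k-medoids (PAM-style) clustering on a precomputed distance matrix.
    Returns a cluster label for every object (here, every attribute)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue
            # The medoid is the member minimizing the total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)


def approximate_reduct(X, y, k, relevance_threshold=1e-6):
    """Three-stage sketch: (1) drop attributes carrying no information about the
    class, (2) cluster the remaining attributes with PAM under the NVI metric,
    (3) keep the most class-related attribute of each cluster.
    X is an (objects x attributes) array of discrete values, y the decision."""
    n_attrs = X.shape[1]
    # Stage 1: relevance of attribute j = mutual information I(a_j; d) with the class.
    relevance = np.array([
        entropy(X[:, j]) + entropy(y) - joint_entropy(X[:, j], y)
        for j in range(n_attrs)
    ])
    relevant = np.where(relevance > relevance_threshold)[0]
    # Stage 2: PAM clustering of the relevant attributes in the NVI space.
    dist = np.array([[nvi(X[:, i], X[:, j]) for j in relevant] for i in relevant])
    labels = pam(dist, k=min(k, relevant.size))
    # Stage 3: from each cluster, select the attribute most related to the class.
    reduct = []
    for c in np.unique(labels):
        members = relevant[labels == c]
        reduct.append(int(members[np.argmax(relevance[members])]))
    return sorted(reduct)
```

In this sketch, the number of clusters k determines the size of the returned attribute subset, which mirrors the abstract's remark that the quality of the approximate reduct depends on choosing the number of clusters appropriately. For example, approximate_reduct(X, y, k=5) on a discretized decision table would return the column indices of a five-attribute approximate reduct, assuming at least five relevant attributes remain after the first stage.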