A new approach based on integration of random subspace and C4.5 decision tree learning method for spatial prediction of shallow landslides
Keywords: Landslide, random subspace, C4.5, GIS, Quang Ninh
This study proposes a new machine learning ensemble, named RandSub-DT, that hybridizes the random subspace (RS) method with the C4.5 decision tree to improve the performance of landslide susceptibility models. The study area covers the Halong and Cam Pha City areas, important economic centers of Quang Ninh province, Vietnam, where landslides seriously disrupt residents' daily lives and cause economic damage. A GIS database containing 170 landslide polygons and ten landslide predisposing factors (slope, aspect, curvature, topographic wetness index (TWI), land use, distance to road, distance to river, soil type, distance to fault, and lithology) was used to construct and validate the proposed RandSub-DT model. Model performance was assessed using a confusion matrix and a set of statistical measures. The results show that the RandSub-DT model performed well for landslide prediction, with a classification accuracy of 90.34% on the training dataset and a prediction capability of 77.48%. These results indicate that an ensemble of C4.5 and RS provides an accurate estimate of landslide susceptibility in the study area.
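The abstract reports that performance was assessed with a confusion matrix and a set of statistical measures but does not list them. The following is a minimal sketch of how such measures are commonly derived for a binary landslide / non-landslide classifier. The function names, the choice of measures (accuracy, sensitivity, specificity, and predictive values), and the toy labels are assumptions for illustration and are not taken from the study.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (1 = landslide, 0 = non-landslide)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1      # landslide correctly predicted
        elif t == 0 and p == 0:
            counts["tn"] += 1      # non-landslide correctly predicted
        elif t == 0 and p == 1:
            counts["fp"] += 1      # false alarm
        else:
            counts["fn"] += 1      # missed landslide
    return counts


def summary_measures(c: Dict[str, int]) -> Dict[str, float]:
    """Derive common statistical measures from confusion-matrix counts."""

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    return {
        "accuracy": ratio(c["tp"] + c["tn"], total),
        "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
        "ppv": ratio(c["tp"], c["tp"] + c["fp"]),
        "npv": ratio(c["tn"], c["tn"] + c["fn"]),
    }


# Hypothetical labels for eight map cells (illustrative only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
measures = summary_measures(counts)
print(counts)
print(measures)
```

Under this reading, the 90.34% and 77.48% figures quoted in the abstract would correspond to the accuracy measure computed on the training data and on held-out data respectively, although the abstract does not state exactly how the latter was obtained.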