Optimizing XGBoost, CatBoost, and Bagging models for predicting the maximum dry density of compacted soil using grid search hyperparameter tuning

Binh Thai Pham, Indra Prakash
Authors

  • Binh Thai Pham, Geotechnical and Artificial Intelligence research group, University of Transport Technology, Hanoi 100000, Vietnam
  • Indra Prakash, formerly Deputy Director General, Geological Survey of India, Gujarat, India

DOI:

https://doi.org/10.15625/2615-9783/24022

Keywords:

Maximum dry density, soil compaction, ensemble learning, XGBoost, CatBoost, Bagging, grid search optimization

Abstract

In geotechnical engineering, accurate estimation of maximum dry density (MDD) is essential to ensure the stability of geotechnical structures such as roads, embankments, and foundations. Traditional laboratory methods such as the Proctor compaction test are reliable but often labor-intensive and time-consuming. The main aim of this study is therefore to develop efficient data-driven models, namely XGBoost (XGB), CatBoost (CAB), and Bagging (BAG), for rapid and reliable estimation of MDD from easily measurable soil properties. The models were built on a dataset of 214 soil samples collected from the Van Don-Mong Cai expressway construction project (Vietnam), with eight input variables: gravel content, coarse sand content, fine sand content, silt and clay content, optimum moisture content, liquid limit, plastic limit, and plasticity index. Model performance was evaluated using R², RMSE, MAE, and a Taylor diagram. Results indicate that the Grid Search-optimized BAG model achieved the best performance, with R² values of 0.94 and 0.81 for the training and test datasets, respectively, and the lowest RMSE and MAE. The optimized CAB model performed comparably, whereas the optimized XGB model showed lower generalization capability. The significance of this study lies in demonstrating that ensemble learning models, particularly Bagging, can provide accurate and physically interpretable predictions of MDD, thereby reducing reliance on extensive laboratory testing. The novelty of this work lies in the systematic comparison of optimized ensemble models on a real construction dataset, combined with an interpretability analysis via partial dependence plots whose trends are consistent with established soil mechanics principles. These findings highlight the potential of optimized machine learning models as practical tools for modern geotechnical engineering.
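The abstract combines three ingredients: grid search over hyperparameters, evaluation with R², RMSE and MAE, and partial dependence plots for interpretation. The sketch below shows, under stated assumptions, how these pieces fit together for a Bagging regressor in plain Python. It is not the authors' code: the base learner (a depth-1 regression tree), the two tuned hyperparameters (`n_estimators`, `max_samples`), the single holdout split used in place of cross-validation, and the synthetic data are all assumptions, and the helpers `fit_stump`, `fit_bagging`, `grid_search` and `partial_dependence` are hypothetical names.

```python
import itertools
import math
import random

# Illustrative sketch only: hypothetical helpers, synthetic data and holdout
# tuning are assumptions, not the study's actual pipeline.


# Evaluation metrics named in the abstract: R^2, RMSE and MAE.
def r2_score(y, p):
    mu = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mu) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def fit_stump(X, y):
    """Depth-1 regression tree: the feature/threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every row identical: predict the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_bagging(X, y, n_estimators, max_samples, seed=0):
    """Bagging: average stumps, each fit on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    k = max(1, int(max_samples * len(X)))
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(k)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(s(row) for s in stumps) / len(stumps)


def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Try every hyperparameter combination in `grid`; keep the lowest validation RMSE."""
    keys = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = fit_bagging(X_tr, y_tr, **params)
        score = rmse(y_val, [model(row) for row in X_val])
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def partial_dependence(model, X, feature, values):
    """Mean prediction when column `feature` is set to each value in `values` for every row."""
    curve = []
    for v in values:
        rows = [row[:feature] + [v] + row[feature + 1:] for row in X]
        curve.append(sum(model(r) for r in rows) / len(rows))
    return curve


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up data (NOT the study's 214 samples): columns are [OMC %, PI %],
    # with MDD (g/cm^3) assumed to fall as OMC rises.
    X = [[rng.uniform(10, 25), rng.uniform(5, 30)] for _ in range(120)]
    y = [2.3 - 0.03 * omc - 0.002 * pi + rng.gauss(0, 0.02) for omc, pi in X]
    X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]
    grid = {"n_estimators": [10, 30], "max_samples": [0.5, 1.0]}
    # Tune on a holdout slice of the training set, then refit on all training rows.
    params, _ = grid_search(X_tr[:60], y_tr[:60], X_tr[60:], y_tr[60:], grid)
    model = fit_bagging(X_tr, y_tr, **params)
    pred = [model(row) for row in X_te]
    print("best:", params)
    print("test R2=%.3f RMSE=%.4f MAE=%.4f" % (r2_score(y_te, pred), rmse(y_te, pred), mae(y_te, pred)))
    print("PDP over OMC:", partial_dependence(model, X_te, 0, [12.0, 18.0, 24.0]))
```

The loop makes the cost of exhaustive grid search explicit: every combination is fitted and scored, so the number of fits is the product of the grid sizes. For real work, scikit-learn's `BaggingRegressor` and `GridSearchCV` (with the `xgboost` and `catboost` packages for the other two model families) are the usual production tools; the hand-written version only exposes the mechanics.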

Published

29-12-2025

How to Cite

Thai Pham, B., & Prakash, I. (2025). Optimizing XGBoost, CatBoost, and Bagging models for predicting the maximum dry density of compacted soil using grid search hyperparameter tuning. Vietnam Journal of Earth Sciences. https://doi.org/10.15625/2615-9783/24022

Section

Articles
