Estimating the Probability of Loan Default in Melli Bank: A Comparative Study of Machine Learning and Econometric Approaches

Taleblou, Reza; Kamali, Mir Ali; Mohajeri, Parisa

doi:10.22054/ijer.2025.84878.1350

Document Type: Research Paper

Authors

¹ Associate Professor of Economics, Allameh Tabataba’i University, Tehran, Iran

² Ph.D. Candidate in Economics, Semnan University, Semnan, Iran

https://doi.org/10.22054/ijer.2025.84878.1350

Abstract

The current study employed a comparative analytical framework to examine credit-default prediction. It relied on a comprehensive dataset of 56,965 loan contracts issued between 2019 and 2024 across the northern branches of Bank Melli Iran. Three modeling approaches were evaluated: traditional logistic regression and two ensemble machine learning methods—random forest (RF) and extreme gradient boosting (XGBoost). The analysis incorporated 29 predictive features categorized into three conceptual groups: loan contract characteristics (e.g., principal amount, repayment tenure, collateral type), borrower attributes (e.g., age, occupational profile, credit history), and institutional factors (e.g., branch location, branch type). Data preprocessing included outlier removal, text categorization, and the extraction of variables such as age and grace period. The models were evaluated under both baseline and optimized (hyperparameter-tuned) settings. The results showed that the machine learning models substantially outperformed the conventional logistic regression model. XGBoost delivered the highest discriminatory power (ROC-AUC = 99.73%), followed closely by RF (99.68%), whereas logistic regression lagged significantly (75.34%). On average, the AUC difference between the machine learning models and logistic regression was approximately 0.243, and statistical tests with 95% confidence intervals confirmed the significance of this gap. Overall, the findings provided strong evidence for the superior reliability of machine learning approaches in forecasting loan default.

Introduction

Although traditional econometric models such as logistic regression have long served as the foundation of credit scoring systems, their reliance on linearity assumptions and error independence limits their ability to capture the complex, nonlinear patterns typical of financial data. These limitations are further compounded by sensitivity to multicollinearity and distributional assumptions that are frequently inconsistent with real-world conditions. The present research aimed to address these shortcomings by conducting a rigorous comparative analysis of predictive methodologies within Iran’s banking sector, a context in which machine learning applications remain relatively underutilized despite widespread global adoption of artificial intelligence in finance. Specifically, the study compared the performance of two ensemble learning techniques (random forest and extreme gradient boosting, or XGBoost) with that of conventional logistic regression in forecasting loan defaults using extensive real-world data from Bank Melli Iran. The methodological advantages of machine learning approaches arise from their ability to model complex nonlinear relationships without requiring predefined functional forms, to automatically capture variable interactions through hierarchical partitioning, to maintain robustness in the presence of outliers and non-normal distributions, and to detect subtle patterns in high-dimensional data that escape parametric detection. By systematically evaluating these capabilities, the current study sought to offer empirical evidence to support financial institutions in adopting more advanced and reliable risk modeling frameworks.

Materials and Methods

The selection of predictive models in this study was informed by theoretical foundations, empirical literature, and practical forecasting capabilities. Three modeling approaches, namely random forest (RF), extreme gradient boosting (XGBoost), and logistic regression (LR), were employed to evaluate their effectiveness in predicting loan defaults. RF is a widely used ensemble learning algorithm that builds multiple decision trees using bootstrap aggregating and random subsets of observations and features. Each tree is trained independently, and final predictions are obtained through majority voting (classification) or averaging (regression). This structure reduces overfitting and improves generalization compared to single decision trees. XGBoost is an advanced gradient boosting algorithm known for its efficiency and high predictive accuracy. It constructs trees sequentially, with each new tree reducing the residual errors of the ensemble through gradient descent optimization. Rooted in the logistic function and formalized in modern choice modeling, logistic regression improves on linear probability models by mapping predictions to the [0,1] interval via a sigmoid transformation. Although valued for their interpretability, conventional econometric models such as logistic regression suffer from a series of limitations, including linearity assumptions, limited interaction detection, multicollinearity sensitivity, and distributional constraints. These constraints potentially compromise predictive performance in complex, nonlinear domains such as credit risk assessment.
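The three prediction rules described in this section can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, coefficients, and tree outputs below are hypothetical, and in practice the trees themselves would be fitted by a library rather than supplied by hand.

```python
import math
from collections import Counter


def sigmoid(z):
    """Map a real-valued score to the [0, 1] interval (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_pd(features, coefficients, intercept):
    """Logistic regression: sigmoid of a linear combination of the features."""
    score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(score)


def forest_vote(tree_predictions):
    """Random forest (classification): majority vote over independent trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def boosted_pd(tree_outputs, learning_rate, base_score=0.0):
    """Gradient boosting: trees are added one after another, each contributing
    a shrunken correction to the running log-odds score."""
    score = base_score
    for output in tree_outputs:
        score += learning_rate * output
    return sigmoid(score)


# Hypothetical inputs, chosen only for illustration.
pd_lr = logistic_pd(features=[1.0, 2.0], coefficients=[0.5, -0.25], intercept=0.0)
label_rf = forest_vote(["good", "bad", "good", "good", "bad"])
pd_xgb = boosted_pd(tree_outputs=[1.0, 0.5, 0.25], learning_rate=0.1)
```

The contrast to note is structural: the logistic model is a single linear score passed through a sigmoid, whereas the two ensembles combine many trees, in parallel by voting (RF) or in sequence by additive correction (boosting).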

Results and Discussion

The machine learning models were evaluated under two configurations: a baseline setting using default parameters and an optimized setting using hyperparameter tuning. Hyperparameters, which are settings external to the model that are not learned from data, strongly influence predictive accuracy, computational efficiency, and generalization. Suboptimal hyperparameter selection can lead to underfitting or overfitting, thereby compromising model performance. Common optimization strategies include grid search, random search, and Bayesian optimization. Empirical evidence shows that random search is often more efficient in high-dimensional spaces (Bergstra & Bengio, 2012). Although default parameters may provide reasonable baseline performance, they rarely yield optimal performance (Probst et al., 2019). Prior research suggests that systematic tuning can increase accuracy by 10–20% (Hutter et al., 2019) and improve generalization (Liao et al., 2022). In this study, hyperparameters were optimized to maximize the area under the curve (AUC), a standard practice in credit risk modeling (Feurer et al., 2015). This approach can reduce prediction errors and enhance model stability in ensemble methods.

The empirical results revealed substantial performance improvements through hyperparameter optimization. For the RF model, accuracy increased from 96% in the untuned configuration to 99% after tuning, with a notable reduction in false negatives and improved precision, albeit with a slight decline in recall for the default class. The optimized XGBoost model (using 375 trees, a maximum depth of 12, and a learning rate of 0.03) achieved the lowest false-negative and false-positive rates, offering an optimal balance between learning capacity and predictive accuracy. In contrast, logistic regression showed limited discriminatory power, with a recall of 0.16 and a ROC-AUC of 0.75, indicating inherent limitations in capturing the complex patterns associated with default events.
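Random search, one of the tuning strategies named above, can be sketched in a few lines of Python. The study does not state which search procedure or ranges it used, so the search space, the function names, and the scoring function below are all assumptions; in particular, toy_validation_auc is an invented stand-in for fitting a model and measuring AUC on held-out data.

```python
import random


def random_search(score_fn, space, n_trials=50, seed=0):
    """Sample hyperparameter settings at random and keep the best-scoring one.

    score_fn: callable taking a dict of hyperparameters and returning a
              validation score to maximise (for example, ROC-AUC).
    space:    dict mapping each hyperparameter name to a sampler that
              takes a random.Random instance and returns one value.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a boosted-tree model.
space = {
    "n_estimators": lambda rng: rng.randint(100, 500),
    "max_depth": lambda rng: rng.randint(3, 15),
    "learning_rate": lambda rng: 10 ** rng.uniform(-2.0, -0.5),  # log-uniform
}


def toy_validation_auc(params):
    """Invented score surface so the example runs without data. Its peak is
    placed at the settings reported in the text purely for readability."""
    penalty = (
        ((params["n_estimators"] - 375) / 400) ** 2
        + ((params["max_depth"] - 12) / 12) ** 2
        + (params["learning_rate"] - 0.03) ** 2
    )
    return 0.99 - 0.1 * penalty


best_params, best_score = random_search(toy_validation_auc, space, n_trials=200)
```

In a real workflow, score_fn would train the model with the sampled settings and return a cross-validated AUC, which is the expensive step that makes the number of trials matter.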

[Figure panels not reproduced: Random Forest Model (With Hyperparameter Tuning); Random Forest Model (Without Hyperparameter Tuning); XGBoost Model (With Hyperparameter Tuning); XGBoost Model (Without Hyperparameter Tuning); Logistic Regression Model. Source: Research Results]
Summary of Model Results

Model    State        Accuracy  Precision (Bad)  Precision (Good)  Recall (Bad)  Recall (Good)  F1-Score (Bad)  F1-Score (Good)  ROC-AUC
RF       Unoptimized  97%       0.94             0.98              0.83          0.99           0.88            0.98             0.935
RF       Optimized    99%       0.97             0.99              0.94          0.99           0.95            0.99             0.9968
XGBoost  Unoptimized  98%       0.96             0.99              0.85          0.99           0.90            0.99             0.9966
XGBoost  Optimized    99%       0.97             0.99              0.88          0.99           0.92            0.99             0.9973
LR       -            96%       0.90             0.96              0.16          0.98           0.27            0.98             0.7534

Source: Research Results
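The default-class ("Bad") columns of the summary table are linked by the standard definitions of precision, recall, and F1, which the following Python sketch makes explicit. The precision, recall, and F1 figures are those reported in the table; the confusion-matrix counts in the final example are hypothetical and serve only to show how accuracy can stay high while recall is low when defaults are rare.

```python
def class_metrics(tp, fp, fn, tn):
    """Accuracy plus precision, recall and F1 for the positive (default) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported default-class figures: (precision, recall, F1).
reported = {
    "RF unoptimized": (0.94, 0.83, 0.88),
    "RF optimized": (0.97, 0.94, 0.95),
    "XGBoost unoptimized": (0.96, 0.85, 0.90),
    "XGBoost optimized": (0.97, 0.88, 0.92),
    "LR": (0.90, 0.16, 0.27),
}

# Recompute F1 from the reported precision and recall, to two decimals.
recomputed = {name: round(f1_from(p, r), 2) for name, (p, r, _) in reported.items()}

# Hypothetical counts (not from the study): 1,000 loans, of which 50 default.
# Only 8 defaults are caught, yet overall accuracy remains above 95%.
toy = class_metrics(tp=8, fp=1, fn=42, tn=949)
```

Recomputing F1 this way reproduces every reported default-class F1 value to two decimals, and the toy counts mirror the logistic regression pattern of high accuracy alongside a recall of 0.16.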

Conclusion

The empirical results of this study demonstrate the superior predictive capabilities of machine learning methods, particularly XGBoost, compared with conventional econometric approaches for estimating the probability of default (PD) in Bank Melli Iran’s loan portfolio. This performance gap primarily arises from the ability of machine learning algorithms to capture nonlinear relationships and latent structural patterns among default determinants, features that linear parametric models are unable to detect. Model performance was evaluated using several metrics, including confusion matrix analysis, total accuracy, and area under the ROC curve (AUC). The findings indicated that machine learning models deliver substantially higher predictive accuracy and improved default detection rates. The optimized XGBoost model achieved outstanding performance (accuracy = 99%, AUC = 0.9973), far surpassing the logistic regression model’s ability to identify default cases (recall = 0.16). This distinct performance disparity strongly supports the research hypothesis regarding the comparative advantage of machine learning in PD estimation. Despite their superior predictive performance, the operational deployment of advanced machine learning techniques in financial institutions remains constrained by two key challenges: the computational complexity of hyperparameter optimization and the interpretability limitations inherent in black-box models. These limitations highlight the practical importance of developing hybrid frameworks that integrate the interpretive transparency of traditional methods with the predictive power of machine learning approaches. This research provided evidence of a paradigm shift in credit risk analytics, moving away from the long-standing reliance on conventional statistical models (such as logistic regression and linear probability models) toward machine learning methodologies. While prior studies using traditional techniques achieved moderate success, their limitations in handling imbalanced distributions and complex interaction effects have become increasingly apparent. The present findings align with international research trends and offer novel empirical evidence from Iran’s banking sector, demonstrating that well-tuned machine learning algorithms can achieve very high accuracy (99%) while identifying most default cases (default-class recall of 0.88 for the optimized XGBoost model, compared with 0.16 for logistic regression).
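ROC-AUC, the headline metric in this study, can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The Python sketch below computes it that way and adds a percentile bootstrap interval for the AUC gap between two models. The abstract reports 95% confidence intervals for this gap but does not describe the procedure, so the bootstrap is an assumed method, and the labels and scores are invented toy data.

```python
import random


def roc_auc(labels, scores):
    """Share of (default, non-default) pairs in which the defaulter gets the
    higher score; tied scores count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_gap(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC(a) - AUC(b) on the same loans."""
    rng = random.Random(seed)
    n = len(labels)
    gaps = []
    while len(gaps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUC is undefined unless both classes are present
            continue
        auc_a = roc_auc(y, [scores_a[i] for i in idx])
        auc_b = roc_auc(y, [scores_b[i] for i in idx])
        gaps.append(auc_a - auc_b)
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data (hypothetical): 1 = default, 0 = repaid.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores_ml = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
scores_lr = [0.9, 0.3, 0.2, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

auc_ml = roc_auc(labels, scores_ml)
auc_lr = roc_auc(labels, scores_lr)
ci_low, ci_high = bootstrap_auc_gap(labels, scores_ml, scores_lr)
```

The pairwise loop is quadratic in the number of loans, so a dataset of the size used here would normally call for a rank-based AUC routine from a statistics library. An interval that excludes zero is what would support a claim that the gap is statistically significant.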

Main Subjects

Financial Economics

References

Akerlof, G.A. (1970). The market for “lemons”: quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500. https://doi.org/10.2307/1879431

Aldrich, J.H. & Nelson, F.D. (1984). Linear probability, logit, and probit models (Quantitative Applications in the Social Sciences No. 07-045). SAGE Publications. https://doi.org/10.4135/9781412984744

Akinjole, A., Shobayo, O., Popoola, J., Okoyeigbo, O. & Ogunleye, B. (2024). Ensemble-based machine learning algorithm for loan default risk prediction. Mathematics, 12(21), 3423. https://doi.org/10.3390/math12213423

Arrow, K.J. (1963). Uncertainty and the welfare economics of medical care. The American Economic Review, 53(5), 941–973. https://doi.org/10.1016/B978-0-12-214850-7.50028-0

Bergstra, J. & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305. https://doi.org/10.5555/2503308.2188395

Bermudez, J.D., Gonzalez-Rivera, G. & Gonzalez, M. (2022). Machine learning approaches to credit risk modeling: A comparative analysis. Journal of Risk and Financial Management, 15(4), 123. https://doi.org/10.3390/jrfm15040123

Berrar, D. (2019). Cross-validation. In Encyclopedia of bioinformatics and computational biology (pp. 542–545). Elsevier. https://doi.org/10.1016/B978-0-12-809633-8.20349-X

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Chen, T. & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785

Chinchor, N. (1992). MUC-4 evaluation metrics. In Proceedings of the 4th Conference on Message Understanding (pp. 22–29). https://doi.org/10.3115/1072064.1072067

Efron, B. & Tibshirani, R.J. (1993). An introduction to the bootstrap. Chapman & Hall/CRC. https://doi.org/10.1007/978-1-4899-4541-9

Feurer, M. & Hutter, F. (2019). Hyperparameter optimization. In: Hutter, F., Kotthoff, L., Vanschoren, J. (eds) Automated Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-05318-5_1

Fishman, G.S. (1973). Statistical analysis for queueing simulations. Management Science, 20(3), 363–369. https://doi.org/10.1287/mnsc.20.3.363

Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451

Green, D.M. & Swets, J.A. (1966). Signal detection theory and psychophysics. Wiley. https://doi.org/10.1086/405615

G’ulomova, B.M.M. qizi. (2023). Bank loan allocation model based on credit risk prediction of SMEs. https://doi.org/10.1109/ictc57116.2023.10154753

Guo, C. (2016). Using machine learning techniques for credit risk modeling: Empirical evidence from China. Journal of Financial Risk Management, 5(3), 1–12. https://doi.org/10.4236/jfrm.2016.53005

Hand, D.J. & Henley, W.E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523–541. https://doi.org/10.1111/j.1467-985X.1997.00078.x

Ho, T.K. (1995). Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (Vol. 1, pp. 278–282). IEEE. https://doi.org/10.1109/ICDAR.1995.598994

Kelly, G.A. (1952). The psychology of personal constructs. Norton. https://doi.org/10.4324/9780203359037

King, M., Zhu, Q. & Wang, T. (2021). Combining behavioral and financial data to improve credit scoring models: Evidence from a commercial bank. Journal of Banking and Finance, 127, 106125. https://doi.org/10.1016/j.jbankfin.2021.106125

Liao, L., Li, H., Shang, W. & Ma, L. (2022). An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks. ACM Transactions on Software Engineering and Methodology, 31(3), 1–40. https://doi.org/10.1145/3506695

Liu, H. (2020). Credit risk assessment with ensemble learning: A study of small and medium enterprises. International Review of Financial Analysis, 71, 101519. https://doi.org/10.1016/j.irfa.2020.101519

Movahedinia, A. & Bahmai, N. (2015). Determining the default of legal entity customers' facilities using improved support vector machine least squares based on particle swarm optimization algorithm. International Conference on New Researches in Management, Economics, and Accounting. http://irdoi.ir/103-440-857-466 [In Persian].

Nuez Mora, J.A., Moncayo, P. & Franco, C. (2023). Loan default prediction: A complete revision of LendingClub. Estudios Gerenciales, 39(169), 1–17. https://doi.org/10.21919/remef.v18i3.886

Peykani, P., Sargolzaei, M., Sanadgol, N., Takalu, A. & Kamyabfar, H. (2023). Application of structural models (Merton and Geske) and machine learning models (random forest and gradient boosted trees) in predicting default risk of listed companies in the Iranian capital market. PLoS ONE, 18(11), e0292081. https://doi.org/10.1371/journal.pone.0292081

Powers, D.M.W. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. Journal of Machine Learning Technologies, 2(1), 37–63. https://doi.org/10.9735/2229-3981

Probst, P., Boulesteix, A. & Bischl, B. (2019). Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research, 20(53), 1–32. https://doi.org/10.48550/arXiv.1802.09596

Rahmani, A. & Esmaeili, G. (2010). The efficiency of neural networks, logistic regression, and discriminant analysis in predicting default. Quantitative Economics (Economic Studies), 7(4), 151-172. https://doi.org/10.22055/jqe.2010.10640 [In Persian].

Robinson, N. & Sindhwani, N. (2024). Loan default prediction using machine learning. In 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (pp. 1–5). IEEE. https://doi.org/10.55041/IJSREM24519

Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355–374. https://doi.org/10.2307/1882010

Stiglitz, J.E. & Weiss, A. (1981). Credit rationing in markets with imperfect information. The American Economic Review, 71(3), 393–410. http://www.jstor.org/stable/1802787

Tang, Y., Liu, Y. & Huang, X. (2019). Detecting moral hazard in loan default using deep learning algorithms. Expert Systems with Applications, 130, 95–103. https://doi.org/10.1016/j.eswa.2019.04.003

Tavakoli, S. & Ashtab, E. (2023). Comparison of the efficiency of machine learning models and statistical models in predicting financial risk. Quarterly Journal of Financial Management Strategy, 11(1), 53-76. https://doi.org/10.22051/jfm.2023.35240.2512 [In Persian].

Uphade, D.B., Muley, A.A. & Chalwadi, S.V. (2024). Identification of most preferable machine learning technique for prediction of bank loan defaulters. Indian Journal of Science and Technology, 17(4), 343-351. https://doi.org/10.17485/IJST/v17i4.2978

van Rijsbergen, C.J. (1979). Information retrieval (2nd ed.). https://doi.org/10.1002/asi.4630300621

Iranian Journal of Economic Research


Volume 30, Issue 103 - Serial Number 103
July 2025
Pages 1-41
