Document Type: Research Paper

Authors

1 Associate Professor of Economics, Allameh Tabataba'i University

2 Master of Economics, Allameh Tabataba'i University

3 Department of Theoretical Economics, Faculty of Economics, Allameh Tabataba'i University

10.22054/ijer.2025.84878.1350

Abstract

This study uses a comparative analytical framework to examine credit default prediction on a dataset of 56,965 loan contracts issued between 2019 and 2024 by the northern branches of Bank Melli Iran. Three modeling approaches were evaluated: traditional logistic regression and two ensemble machine learning methods, Random Forest (RF) and Extreme Gradient Boosting (XGBoost). The analysis uses 29 predictive features grouped into three conceptual categories: loan contract characteristics (e.g., principal amount, repayment tenure, collateral type), borrower attributes (e.g., age, occupational profile, credit history), and institutional factors (e.g., branch location, branch type). Preprocessing included outlier removal, categorization of text fields, and extraction of variables such as age and grace period. Each model was evaluated under both baseline and optimized (hyperparameter-tuned) conditions. The machine learning models significantly outperformed logistic regression: XGBoost achieved the highest discrimination (ROC-AUC = 99.73%), followed closely by RF (99.68%), while logistic regression trailed substantially (75.34%). The average AUC gap between the machine learning models and logistic regression was approximately 0.243, and statistical tests with 95% confidence intervals confirmed that this gap is significant. Overall, the findings indicate that machine learning approaches are consistently superior to logistic regression for predicting loan default.
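The abstract does not say which test produced the 95% confidence intervals for the AUC gap. The sketch below shows one common way to obtain such an interval: a paired bootstrap over loan records, using only the Python standard library. The function names `roc_auc` and `bootstrap_auc_diff`, the number of resamples, and the synthetic scores are illustrative assumptions, not the authors' procedure. DeLong's test is a frequently used alternative.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.

    labels: 1 = default, 0 = non-default (hypothetical encoding).
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_diff(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Point estimate and percentile CI for AUC(a) - AUC(b) on the same records."""
    point = roc_auc(labels, scores_a) - roc_auc(labels, scores_b)
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        # Resample whole records so both models stay paired on the same loans.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(roc_auc(y, a) - roc_auc(y, b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)


if __name__ == "__main__":
    # Synthetic illustration only; these are not the study's data or results.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
    strong = [yi + rng.gauss(0, 0.3) for yi in y]  # well-separated scores
    weak = [yi + rng.gauss(0, 1.5) for yi in y]    # noisier scores
    diff, (lo, hi) = bootstrap_auc_diff(y, strong, weak, n_boot=500)
    print(f"AUC difference = {diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Resampling whole records, rather than each model's scores separately, keeps the two models paired on the same loans. That pairing is what makes the interval describe their difference rather than each AUC in isolation. An interval that excludes zero corresponds to a significant gap at the 5% level.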
