Credit Risk Rating Using Supervised ML and Business Understanding

The risk rating models are designed to help assess the likelihood of default. Assessing risk at the loan level makes it possible to aggregate risk at the portfolio level and to quantify it by loan type, geographic location or region, industry sector, or other variables, including financials.
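As a minimal sketch of the loan-to-portfolio aggregation, assuming each loan already carries an estimated probability of default, the Python below groups loans by any one segment field and reports the loan count and mean probability per segment. The field names (`loan_type`, `region`, `sector`, `pd`) and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical loan records: an estimated probability of default (pd) plus the
# grouping variables mentioned in the text.
loans = [
    {"loan_type": "term", "region": "North", "sector": "Textiles", "pd": 0.04},
    {"loan_type": "term", "region": "South", "sector": "Textiles", "pd": 0.10},
    {"loan_type": "working_capital", "region": "North", "sector": "Retail", "pd": 0.02},
    {"loan_type": "working_capital", "region": "North", "sector": "Retail", "pd": 0.06},
]


def aggregate_pd(loans, key):
    """Return {segment: (loan_count, mean_pd)} for the chosen grouping key."""
    groups = defaultdict(list)
    for loan in loans:
        groups[loan[key]].append(loan["pd"])
    return {seg: (len(pds), sum(pds) / len(pds)) for seg, pds in groups.items()}


by_sector = aggregate_pd(loans, "sector")
by_region = aggregate_pd(loans, "region")
```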

Parameter Selection

Each loan is generally evaluated under four categories of risk parameters: Financial, Business, Promoter, and Management. In total there are many parameters (approx. 500), drawn from both previous and current financial years. Important parameters are selected by testing each parameter's relationship with the Good/Bad loan status. For continuous parameters, the linear relationship is measured with linear correlation; because some parameters are categorical, for those the chi-square test for independence is used instead. The final set of parameters is then selected using appropriate criteria.

Machine Learning Model

The relationship between the parameters and Bad/Good loan status is not linear for all parameters, and we did not find a very strong linear relationship, so we used a tree-based algorithm (Random Forest). The number of observations is low compared to the number of parameters, so, to improve the accuracy of the tree-based approach, we used a random forest model: an ensemble of 10 different trees.
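A minimal scikit-learn sketch of a 10-tree random forest on a small, wide dataset follows. Each tree is trained on a bootstrap sample and considers a random subset of parameters at each split; averaging the trees reduces the variance that makes a single deep tree unstable when observations are scarce. The synthetic data, the labelling rule, and every setting other than `n_estimators=10` are assumptions for illustration. Cross-validated ROC AUC is used because in-sample accuracy would overstate performance on so few loans.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small, wide dataset: few loans, many parameters.
rng = np.random.default_rng(0)
n_loans, n_params = 120, 40
X = rng.normal(size=(n_loans, n_params))
# Hypothetical non-linear rule for Bad (1) vs Good (0), used only so the demo has signal.
y = ((X[:, 0] > 0.5) | (X[:, 1] * X[:, 2] > 0.8)).astype(int)

model = RandomForestClassifier(
    n_estimators=10,      # ensemble of 10 trees, as described in the text
    max_features="sqrt",  # random subset of parameters considered at each split
    random_state=0,       # reproducible bootstrap samples and feature subsets
)

# Stratified 5-fold cross-validation (the default for classifiers when cv is an int).
auc_per_fold = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

model.fit(X, y)
prob_bad = model.predict_proba(X[:5])[:, 1]  # column 1 corresponds to class 1 (Bad)
top_params = np.argsort(model.feature_importances_)[::-1][:5]  # most important columns
```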

Combining ML with Other Parameters

The final model combines the score-based model and the ML model, with a different weight assigned to each. Within the score-based model, the scores and weights for individual parameters are assigned using histogram cuts and business understanding. The weight given to each model reflects that model's performance. The weights and scores are finalized after simulation. The final scores are categorized into 10 levels, which form the Risk Rating Score.
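The combination step might look like the following sketch. Each scorecard parameter is mapped to a score through hypothetical histogram cut points, the scorecard total is blended with the ML model's probability of a Bad loan using placeholder weights, a small grid search stands in for the simulation that fixes the weights, and the blended 0 to 100 score is bucketed into 10 equal-width levels. The parameter names, cut points, scores, weight grid, the AUC-based selection criterion, and the convention that level 1 is the lowest risk are all assumptions.

```python
from bisect import bisect_right

# Hypothetical scorecard: histogram cut points per parameter and the score for
# each resulting bin (len(scores) == len(cuts) + 1; higher score = lower risk).
SCORECARD = {
    "current_ratio": {"cuts": [1.0, 1.5, 2.0], "scores": [0, 10, 20, 30]},
    "years_in_business": {"cuts": [3, 5, 10], "scores": [0, 5, 15, 20]},
}
MAX_POINTS = sum(max(spec["scores"]) for spec in SCORECARD.values())


def scorecard_points(loan):
    """Place each parameter in its histogram bin, sum the bin scores, rescale to 0-100."""
    total = 0
    for name, spec in SCORECARD.items():
        total += spec["scores"][bisect_right(spec["cuts"], loan[name])]
    return 100 * total / MAX_POINTS


def combined_score(loan, prob_bad, w_score=0.6, w_ml=0.4):
    """Weighted blend of the scorecard and the ML model (placeholder weights summing to 1)."""
    ml_points = 100 * (1 - prob_bad)  # a high probability of Bad gives few points
    return w_score * scorecard_points(loan) + w_ml * ml_points


def auc(scores, bad):
    """Share of (Good, Bad) pairs in which the Good loan scores higher; ties count half."""
    good_s = [s for s, b in zip(scores, bad) if b == 0]
    bad_s = [s for s, b in zip(scores, bad) if b == 1]
    wins = sum((g > x) + 0.5 * (g == x) for g in good_s for x in bad_s)
    return wins / (len(good_s) * len(bad_s))


def choose_weight(loans, probs, bad, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Simulate each scorecard weight on historical loans; keep the best-separating one."""
    def separation(w):
        blended = [combined_score(rec, p, w, 1 - w) for rec, p in zip(loans, probs)]
        return auc(blended, bad)
    return max(grid, key=separation)


def risk_rating(score, levels=10):
    """Bucket a 0-100 score into equal-width levels: 1 = lowest risk, 10 = highest."""
    band = min(int(score // (100 / levels)), levels - 1)  # 0 (worst) .. levels-1 (best)
    return levels - band


loan = {"current_ratio": 1.7, "years_in_business": 12}
score = combined_score(loan, prob_bad=0.25)
rating = risk_rating(score)
```

With the sample loan, the scorecard gives 80 points, the ML component gives 75, and the 0.6/0.4 blend of about 78 falls in level 3.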