Credit Risk Assessment

Credit risk assessment is a critical problem that banks face today. It estimates whether a loan applicant is likely to default, so that the bank can decide whether or not to grant the loan. This helps banks minimize potential losses and can increase the volume of credit they extend. The output of a credit risk assessment is usually a predicted Probability of Default for the applicant.

The Probability of Default (PD) is the most commonly used credit scoring measure and is often considered the most accurate one. A defaulter is a borrower who is unlikely to repay the loan amount, or whose loan payment is overdue by more than a certain number of days. Determining the PD is therefore the crucial step in developing a credit scoring model.
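The default definition above can be made concrete with a small sketch. The 90-day threshold, the `Loan` fields and the function names are assumptions made for illustration; the real cutoff depends on the lender's policy. The observed default rate at the end is only a crude portfolio-level estimate, not a per-applicant PD model.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical threshold: the actual number of days is set by the lender.
DAYS_PAST_DUE_THRESHOLD = 90


@dataclass
class Loan:
    loan_id: str
    days_past_due: int       # longest overdue period observed, in days
    unlikely_to_repay: bool  # policy or analyst flag (hypothetical field)


def is_default(loan: Loan, threshold: int = DAYS_PAST_DUE_THRESHOLD) -> bool:
    """A loan counts as defaulted if repayment is unlikely
    or it is overdue by more than `threshold` days."""
    return loan.unlikely_to_repay or loan.days_past_due > threshold


def observed_default_rate(loans: Sequence[Loan]) -> float:
    """Share of defaulted loans in a portfolio."""
    if not loans:
        raise ValueError("no loans supplied")
    return sum(is_default(loan) for loan in loans) / len(loans)
```

In a modelling pipeline, a flag like this would typically serve as the target variable that the PD model learns to predict.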

Data used to build a credit risk model comes from various sources, such as banking, financial, business, risk rating, and customer demographic data. Parameters from these datasets are combined before further analysis is done.

The dataset usually has many missing or imputed data points, which need to be replaced with valid values generated from the available complete data. Various clustering algorithms can be used to perform multiple imputation. The numeric features are normalized before this step. Normalization can be done in various ways, for example by nature of business or by sector, because a parameter behaves differently (has a different distribution) depending on the sector to which the company belongs.
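As a rough illustration of sector-wise normalization, the sketch below computes z-scores within each sector and leaves missing values untouched so they can be imputed afterwards. The row layout, the `sector` and `debt_ratio` names, and the choice of z-scores are assumptions for this example; the clustering-based imputation step itself is not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_sector(rows, feature, sector_key="sector"):
    """Return z-scores of `feature`, computed separately for each sector.

    Missing values (None) are ignored when computing the sector statistics
    and remain None in the output.
    """
    values_by_sector = defaultdict(list)
    for row in rows:
        if row[feature] is not None:
            values_by_sector[row[sector_key]].append(row[feature])

    # Mean and population standard deviation per sector.
    stats = {s: (mean(v), pstdev(v)) for s, v in values_by_sector.items()}

    scores = []
    for row in rows:
        value = row[feature]
        if value is None or row[sector_key] not in stats:
            scores.append(None)
            continue
        mu, sigma = stats[row[sector_key]]
        # A feature that is constant within a sector has no spread; map it to 0.
        scores.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return scores


rows = [
    {"sector": "retail", "debt_ratio": 1.0},
    {"sector": "retail", "debt_ratio": 3.0},
    {"sector": "steel", "debt_ratio": 10.0},
    {"sector": "steel", "debt_ratio": None},
    {"sector": "steel", "debt_ratio": 30.0},
]
z_scores = normalize_by_sector(rows, "debt_ratio")
```

Here a debt ratio of 3.0 in retail and 30.0 in steel end up with the same normalized score, which is the point of normalizing within a sector rather than across the whole dataset.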

We then check the correlation between the parameters to find pairs that give us the same information and are therefore redundant. A random forest is used to rank the variables according to their importance with respect to the target variable. After the features are fixed, we build various models and cross-validate them by checking performance measures such as sensitivity, specificity, good loan reject percentage, and so on.
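The validation measures mentioned above can be computed from a confusion matrix, as in the minimal sketch below. It assumes that label 1 means "default" and 0 means "good loan", and it takes "good loan reject percentage" to mean the share of good loans that the model would wrongly reject; both conventions and all function names are assumptions, not definitions taken from a specific library.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating 1 as 'default' (positive) and 0 as 'good loan'."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Return None instead of failing when a class is absent from the data.
    return numerator / denominator if denominator else None


def credit_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    reject_share = _ratio(fp, tn + fp)
    return {
        # Share of actual defaulters that the model catches.
        "sensitivity": _ratio(tp, tp + fn),
        # Share of good loans that the model correctly accepts.
        "specificity": _ratio(tn, tn + fp),
        # Assumed definition: good loans wrongly rejected, as a percentage.
        "good_loan_reject_pct": None if reject_share is None else 100 * reject_share,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = credit_metrics(y_true, y_pred)
```

Under cross-validation, these measures would be computed on each held-out fold and then averaged, so that a model is not judged on the data it was trained on.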


We select a suitable model after comparing several candidate models. Random Forest or XGBoost algorithms usually work better in these scenarios, where the data does not have a linear relationship with the target or is unbalanced.

