Guide: Credit Risk Assessment (ver.1.05)
1. Introduction
This is a step-by-step guide that walks you through the process of developing
an artificial neural network (ANN) at microCortex.com. Here we show a very
simple way to use the application in an area of renewed interest: credit risk
assessment.
Credit risk assessment is about finding accurate predictors of individual risk
in credit portfolios. You are invited to repeat these steps with the example, a
fictitious data set of mortgage loans to individuals, to get a feeling for the
procedure and for the applicability of the microCortex computer environment.
You will understand how to:
- transfer the data from your spreadsheet to the data input page,
- interpret the quality of the ANN obtained,
- submit a new case for the ANN to answer (use the ANN to predict).
Just follow the steps and... enjoy it!
2. The data
Put yourself in the place of a loan officer at a bank: one of the most
important things you want to know in advance, when lending money, is whether
your client will pay back the loan. In other words, you want to predict your
client's likelihood of repayment.
The data set used here refers to mortgage loans to individuals. The likelihood
of repayment is measured as a simple "Yes, the client will pay back the loan"
or "No, the client will not pay back the loan".
ANNs work by learning from past examples in order to predict the future. In
other words, they learn a generalizable association among the data; this is
what training the ANN means.
This means you have to keep a record of your clients' past behavior, so that
the ANN can learn which client profiles tend to fail to repay and which don't.
This record must have two main categories of data:
The input data (Fig. 1) - the set of values for each criterion you think
can influence the loan repayment.
Example: customer's age, income, number of children. Note that you will also be
able to access the sensitivity analysis (how strongly each variable determines
the repayment).
The output data (Fig. 2) - the set of records with the "answer" for each
time you lent money: "Yes, the client paid back the loan" or "No, the client
didn't pay back the loan".
If you gather this data in a spreadsheet, you get something like what is shown
in the figures:
Your Spreadsheet
Fig. 1 - Input Data in a spreadsheet
Fig. 2 - Output Data in a spreadsheet
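As a rough illustration of the input/output split, the sketch below lays out a few cases as tab-delimited text, the form a spreadsheet produces when you copy cells. The variable names follow those mentioned in this guide; every value is invented for illustration, and the helper `to_tab_delimited` is a hypothetical name, not part of microCortex.

```python
# Hypothetical records: names follow the variables mentioned in this guide,
# but every value below is invented for illustration.
input_names = ["age", "income", "children"]
output_names = ["payed"]

cases = [
    # (age, income, children, payed)
    (34, 42000, 2, 1),
    (51, 18000, 4, 0),
    (27, 31000, 0, 1),
]

def to_tab_delimited(rows):
    """Join each row's values with tabs, one case per line."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)

# The first three fields of each case are inputs, the last one is the output.
input_block = to_tab_delimited([case[:3] for case in cases])
output_block = to_tab_delimited([case[3:] for case in cases])

print("\n".join(input_names))
print(input_block)
print("\n".join(output_names))
print(output_block)
```

Each case keeps the same position in both blocks, so the first input row and the first output row describe the same loan.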
Note: The sample data used to generate the figures is available in the links at
the bottom of this page.
Please remember that this sample data is totally fictitious. It is not our
purpose to give an accurate view of risk management in banking. Real-world
applications of ANNs to credit risk assessment may involve other criteria, and
relationships among the data different from the ones shown here.
An example: "being divorced" is presented here as having a strong influence on
the likelihood of mortgage repayment. Although this may be true, don't take it
as a scientific or even empirical basis for saying this is what happens in the
real world.
One last word about two important issues in your data:
- Use a minimum of 150 observations (cases) to train the ANN. Because ANNs
are trained with past observed data, the more observations you use to train
your ANN, the more accurate and reliable it will be for predictions and
sensitivity analysis (the more you practice riding a bicycle, the better rider
you become!);
- The data you use to train the ANN should be as varied (general) as
possible - try not to use the same kind of client profile throughout (Example:
it is better to train the ANN with clients whose ages are spread randomly from
20 through 90 than to have 50% of them in their 30s - the more different
situations you practice with your bicycle, the more expert you become).
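Both recommendations can be checked before submitting anything. The following is a minimal sketch, assuming the ages are available as a plain list; the function name `check_training_set` and the per-decade grouping are choices made for this example, and the 50% cut-off simply mirrors the example given above.

```python
from collections import Counter

MIN_CASES = 150  # minimum number of observations recommended in this guide

def check_training_set(ages, max_share_per_decade=0.5):
    """Return warnings about training-set size and age spread.

    max_share_per_decade is an assumed cut-off chosen for this sketch.
    """
    warnings = []
    if len(ages) < MIN_CASES:
        warnings.append(
            f"only {len(ages)} cases; at least {MIN_CASES} recommended")
    # Group ages by decade (20-29, 30-39, ...) and flag crowded decades.
    decades = Counter((age // 10) * 10 for age in ages)
    for decade, count in sorted(decades.items()):
        if count / len(ages) >= max_share_per_decade:
            warnings.append(
                f"{count} of {len(ages)} cases are aged {decade}-{decade + 9}")
    return warnings

# Invented example: a tiny set dominated by clients in their 30s.
for warning in check_training_set([31, 35, 38, 62]):
    print(warning)
```

The same kind of check can be repeated for any other input, such as income, where one narrow range might dominate the record.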
3. Submitting the data
If you are an authorised user, you can submit data to train your ANN through the
Data Input link in the navigation bar after you log on.
Data can now be pasted into the appropriate boxes.
First, the Inputs:
Fig. 3 - Pasting input data into browser
... then the Outputs:
Fig. 4 - Pasting output data into browser
Note:
Data is presented in separate spreadsheets for a better understanding of the
process, but obviously it can be placed in one spreadsheet only, as it usually
is.
While entering data, pay attention to the following Submitting Rules
(check Figs. 3 and 4):
- Input and output names are placed in rows - each name
(variable) in its own row.
- Input and output data are placed in columns, one column for
each name, one row for each credit repayment case.
- For both the input and the output data, values in each row must be separated
by at least one tab or space character. In this example we made a
straightforward copy/paste from our spreadsheet, which automatically places the
data correctly - one tab between each column of values. You can do it this way,
pasting from any tab-delimited text file or from an Excel worksheet, or you can
enter data manually, as long as you separate the values in each column with at
least one tab or space.
- The number of rows for the input and the output data must match. In this
example, the number of cases observed for "payed" must be exactly the same as
the number used for "age", "children" and "income".
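The Submitting Rules can be checked locally before pasting. The sketch below is only an illustration of the rules as stated here, not the validation that microCortex itself performs; `parse_block` and `validate_submission` are hypothetical helper names.

```python
def parse_block(text):
    """Split pasted text into rows of values separated by tabs or spaces."""
    # str.split() with no argument splits on any run of whitespace,
    # which matches "at least one tab or space".
    return [line.split() for line in text.splitlines() if line.strip()]

def validate_submission(input_names, input_text, output_names, output_text):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    inputs = parse_block(input_text)
    outputs = parse_block(output_text)
    # Every data row needs exactly one value per variable name.
    for label, names, rows in (("input", input_names, inputs),
                               ("output", output_names, outputs)):
        for number, row in enumerate(rows, start=1):
            if len(row) != len(names):
                problems.append(
                    f"{label} row {number}: "
                    f"{len(row)} values for {len(names)} names")
    # Inputs and outputs must describe the same number of cases.
    if len(inputs) != len(outputs):
        problems.append(
            f"{len(inputs)} input rows but {len(outputs)} output rows")
    return problems

# Invented example with one short row and one missing output.
for problem in validate_submission(
        ["age", "children", "income"], "34\t2\n51\t4\t18000",
        ["payed"], "1"):
    print(problem)
```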
By pressing the "Submit" button, you get a text listing of the submitted data
on the Data Confirmation page. If everything looks OK and the data was entered
correctly, according to the rules described, you will be allowed to proceed by
pressing the "Next" button, and you will be invited to enter a name and a short
description for the ANN to be trained.
After this final operation you get a new screen (white box below) confirming
that the data was submitted. An ID number is assigned so you can retrieve the
ANN and use it later:
Job submitted.
Please wait until you receive an email indicating that your job is finished, and then go to the ANN List page.
If you prefer you can leave this browser open and the click in this link
TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
An e-mail was sent to TestUser@microcortex.com
ID NUMBER: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
You also get an email with a similar message.
4. Retrieving the ANN
After a waiting (training) period, which depends on the complexity of the ANN
being developed and the amount of data submitted, you get an email reporting
that your ANN is ready: "Your Job has finished successfully".
From now on the ANN can be retrieved at any time through the ANN List link
after you log on.
By clicking on the ANN ID, you move to the ANN Analysis page, where you can
have a glance at the quality of the trained ANN:
Fig. 5 - Statistics of the trained ANN: Quality
The quality of the predictions can be inspected by looking at the predicted
versus observed values.
In this example, observed values can only be "0" or "1" - the values
represented by circles in the plot.
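To make the predicted-versus-observed comparison concrete, here is a minimal sketch that counts how often a prediction lands on the same side as the observed 0/1 value. It is not the statistic reported on the ANN Analysis page; the 0.5 cut-off and the sample numbers are assumptions made for this illustration.

```python
def agreement(observed, predicted, threshold=0.5):
    """Fraction of cases whose thresholded prediction equals the observed 0/1."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(
        1 for obs, pred in zip(observed, predicted)
        if (1 if pred >= threshold else 0) == obs
    )
    return hits / len(observed)

observed = [1, 0, 1, 1, 0]
predicted = [0.93, 0.12, 0.48, 0.88, 0.05]  # invented for illustration
print(agreement(observed, predicted))
```

Here the third case is observed as 1 but predicted as 0.48, so it counts as a miss.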
Fig. 6 - Statistics of the trained ANN: Sensitivity
Here you can evaluate which variables most influence your output by considering
the average sensitivity of the output to each of the inputs.
In this example, we can see that "income", "number of children" and "divorced"
have a big influence on "payed", while "relincharge" (relatives in charge) and
"pets" have little or no influence.
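One common way to picture an average sensitivity is to nudge each input a little and record how much the output moves, averaged over all cases. The sketch below does this with finite differences on a hand-written stand-in function; `toy_model`, its coefficients, the sample cases, and the step size (10% of each input's range) are all assumptions for illustration, and the method microCortex actually uses is not described in this guide.

```python
import math

def toy_model(case):
    """Stand-in for a trained ANN: a hand-written logistic function."""
    income, children, pets = case
    # Invented coefficients; pets deliberately has no effect.
    score = 0.0001 * income - 0.8 * children + 0.0 * pets - 2.0
    return 1.0 / (1.0 + math.exp(-score))

def average_sensitivity(model, cases, range_fraction=0.1):
    """Mean absolute change in output when each input is nudged in turn."""
    n_inputs = len(cases[0])
    # Step for each input: a fraction of its observed range across the cases.
    steps = [
        range_fraction * (max(c[i] for c in cases) - min(c[i] for c in cases))
        for i in range(n_inputs)
    ]
    totals = [0.0] * n_inputs
    for case in cases:
        base = model(case)
        for i in range(n_inputs):
            nudged = list(case)
            nudged[i] += steps[i]
            totals[i] += abs(model(nudged) - base)
    return [total / len(cases) for total in totals]

# Invented cases: (income, children, pets)
cases = [(15000, 0, 1), (30000, 2, 0), (60000, 4, 3), (45000, 1, 2)]
for name, value in zip(["income", "children", "pets"],
                       average_sensitivity(toy_model, cases)):
    print(name, value)
```

Scaling each step by the input's range keeps variables measured in very different units, such as income and number of children, comparable.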
For those more familiar with statistical analysis, the ANN Analysis page gives
a good impression of the statistical quality of the trained ANN. If that is
your case, check the details on statistical results.
5. Making predictions
Fig. 7 - Predictions button
After pressing the "Predictions" button you are asked for "input questions".
As an example, the following 4 input sets can be submitted to request ANN
predictions for the output - "Will they pay or not?":
Fig. 8 - Predictions Input page
The input questions are lines of data values, one column for each variable, in
the same format used at the beginning for the input data. Once again, you can
simply copy and paste values from a tab-delimited text file or any spreadsheet.
The only restriction is that each line must have as many input variables
(columns) as the input data set used to train the ANN.
The corresponding 4 output predictions are generated below the submission box:
Here you get the predictions for your inputs.
Values close to 1 can be taken as 1, which means "yes".
Check the novelty of your question. Values clearly higher than 1 indicate
values in your question that are very different from anything used to train the
ANN. Accordingly, the table highlights the 4th input line as having a certain
degree of novelty - no case with someone aged 90 (or near) was used to train
this ANN.
Fig. 9 - Statistical analysis of the prediction
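Reading the prediction and novelty values can be sketched as a small helper. The 0.5 cut-off for "close to 1" and the novelty limit of exactly 1.0 are assumptions made for this example (the guide only says "clearly higher than 1"), and all numbers are invented rather than taken from Fig. 9.

```python
def interpret(prediction, novelty, cutoff=0.5, novelty_limit=1.0):
    """Return the yes/no answer and whether the input looks novel."""
    answer = "yes" if prediction >= cutoff else "no"
    caution = novelty > novelty_limit
    return answer, caution

# Invented (prediction, novelty) pairs, one per input line.
results = [(0.97, 0.4), (0.03, 0.6), (0.91, 0.8), (0.88, 2.3)]
for line, (prediction, novelty) in enumerate(results, start=1):
    answer, caution = interpret(prediction, novelty)
    note = " (novel input: treat with caution)" if caution else ""
    print(f"line {line}: {answer}{note}")
```

A flagged line still gets an answer; the flag only signals that the ANN saw nothing similar during training, so the answer deserves less trust.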
If you feel comfortable with ANN terms, you may check the section for advanced
users in the "Walk Through Guide" for more details on analysing the trained ANN.
6. Download Data
Data in one file (txt)
Data in one file (html)