
Guide: How to Analyse the ANN (ver.1.02)

1. Introduction

You will understand how to:
  • read the regression graph (predicted vs. observed values)
  • read the sensitivity graph (sensitivity of the output to each input)
This is demonstrated using the diagnosis of resistant bacteria in humans.

2. Generation of the Data-Set

The data is entirely artificial. However, we tried to base it on a model that reflects reality to some extent. Since one important point discussed here concerns the model underlying the data, we do not reveal too much at this stage.

The input data (the independent variables) categories are:

The output data (the dependent variables) categories are:
(scaled between 0 and 1000)

Things we have to reveal: The data-set consists of 873 cases.

Yes, there is more to say. No, we won't reveal it now; please read on...

3. Analysis of the ANN

Output analysis
In the upper left field of the "ANN Analysis/Trained ANN Statistics" page you find statistical key values for the ANN presented as text, as shown in the figure on the left. A more complete report is available by clicking on the link "text format report".
If you have more than one output, as in this example, all the information shown refers to the selected output (the first one is selected by default). The "Predicted vs. Observed Regression Graph" has the name of the selected output in its title.
If you request the text report, you have to force a reload in your browser to see the current report rather than the one for the previously selected output.

So here you have:

To get a new analysis, select the output you want to analyse and then press "New Output Analysis".

4. Regression Analysis

Elements of the "Predicted vs. Observed Regression Graph". Let's look at three different "pictures". We start with the "poor distribution of data":
Figure: Predicted vs. Observed, bad distribution
The clustering around 0 and around 1 is not necessarily bad if you wanted a "yes/no" answer. However, you should submit a data-set which has an almost equal number of values at 0 and at 1. Since this is not the case here, the quality of the ANN is probably not acceptable. Although the regression is not far from the ideal line (dashed), you should consider making a new selection of your data and resubmitting it.
 
Figure: Predicted vs. Observed, bad ANN
This kind of graph is not suitable for frequency analysis; at most, it tells you that you need to carry one out. You might have enough values around zero, but you cannot tell from this graph.
The point is that too many values lie too far away from the regression line.

Again, don't trust this ANN.
 
Figures: Predicted vs. Observed, good ANN (two graphs)

Both graphs point to a good-quality ANN:

  1. No visible difference between the ideal and the observed regression line.
  2. Apparently good distribution of the values over the range of interest.
  3. Small number of values far away from the ideal regression line.
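
The first and third criteria can also be checked numerically if the observed and predicted values are available as plain lists. The following is only a minimal sketch: the function names, the sample values and the tolerance of 100 are assumptions made for illustration and are not part of the ANN tool.

```python
from statistics import mean

def regression_line(observed, predicted):
    """Least-squares slope and intercept of predicted as a function of observed."""
    mx, my = mean(observed), mean(predicted)
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    return slope, my - slope * mx

def fraction_far_from_ideal(observed, predicted, tolerance):
    """Share of cases whose prediction deviates from the ideal line (predicted = observed)."""
    far = sum(1 for x, y in zip(observed, predicted) if abs(y - x) > tolerance)
    return far / len(observed)

# Made-up values on the 0 - 1000 scale, for illustration only.
observed = [0, 100, 250, 400, 500, 650, 800, 900, 1000]
predicted = [20, 90, 260, 390, 520, 640, 790, 910, 980]

slope, intercept = regression_line(observed, predicted)
far = fraction_far_from_ideal(observed, predicted, tolerance=100)
print(f"slope={slope:.3f} intercept={intercept:.1f} far={far:.0%}")
```

A slope close to 1, an intercept close to 0 and a small share of distant points correspond to what the good graphs show visually. The second criterion (distribution of the values) needs a frequency analysis and cannot be read from these numbers.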

5. Sensitivity Analysis

Figure: Specific Sensitivity, bad ANN
What is immediately striking is that input number 3 (Last treatment duration) is THE factor that influences the output (Specific Resistance). The other inputs have little influence.
In the real world, you would stop here. But since we know how the input data was generated, we can tell more about the ANN. And since this is for educational purposes, this is the right thing to do.

The last three columns are random numbers, hence they have no influence on the output. So inputs that show a similar level of influence on the output should be considered irrelevant.
So either the formula used to generate the data is bad, or the ANN, or both. Taking the regression analysis into account, at least the ANN has to be considered bad. But looking at a frequency analysis of the output data, there is also a problem there: not enough values around "1000".
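
The rule "an input that is no more influential than the random columns is irrelevant" can be written down as a simple threshold test. This is a sketch under stated assumptions: the sensitivity values are invented, the random columns are assumed to be inputs 9 to 11, and the safety margin of 1.5 is a hypothetical choice, not something the ANN tool prescribes.

```python
def relevant_inputs(sensitivities, noise_inputs, margin=1.5):
    """Return the inputs whose sensitivity clearly exceeds the noise level.

    sensitivities: dict mapping input number -> sensitivity value
    noise_inputs:  input numbers known to be pure random numbers
    margin:        hypothetical safety factor above the largest noise value
    """
    noise_level = max(sensitivities[i] for i in noise_inputs)
    threshold = margin * noise_level
    return sorted(i for i, s in sensitivities.items()
                  if i not in noise_inputs and s > threshold)

# Invented sensitivities; inputs 9-11 stand in for the random columns.
sensitivities = {1: 0.05, 2: 0.05, 3: 0.62, 4: 0.04, 5: 0.03,
                 6: 0.05, 7: 0.04, 8: 0.05, 9: 0.04, 10: 0.03, 11: 0.04}

relevant = relevant_inputs(sensitivities, noise_inputs={9, 10, 11})
print(relevant)
```

With these invented numbers only input 3 stands out from the noise, which mirrors the picture of the bad ANN described above.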
 
Figure: Unspecific Sensitivity, bad ANN
For the unspecific resistance, the big picture is the same. Inputs 7 (Age) and 8 (Time since last hospitalization) rise a little further above the noise, but not convincingly.
 
In the real world, you would have selected the data to give a balanced distribution of the output values, with respect to the type of prediction you want.
In this case, this would be a distribution balanced between 0 and 1000, and not something like 10% of the values between 0 and 100 and 80% between 900 and 1000, or vice versa.
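
A frequency analysis of this kind takes only a few lines. The sketch below assumes the output values are available as a list scaled between 0 and 1000; the function name, the choice of ten bins and the sample values are illustrative assumptions.

```python
def frequency_table(values, lower=0, upper=1000, bins=10):
    """Share of values falling into each of `bins` equally wide intervals."""
    width = (upper - lower) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the upper bound lands in the last bin.
        index = min(max(int((v - lower) / width), 0), bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]

# Made-up, badly balanced output values: 10% low, 80% high.
skewed = [50, 500, 950, 950, 950, 950, 950, 950, 950, 950]

shares = frequency_table(skewed)
for i, share in enumerate(shares):
    print(f"{i * 100:4d} - {(i + 1) * 100:4d}: {share:.0%}")
```

If one bin holds most of the cases, as the last one does here, the data should be reselected before training.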

Now to ANNs with more power:
Figure: Specific Sensitivity, good ANN
Please disregard the blue bars for now; we will explain them later on.
Inputs 1 - 4 and 6 are relevant for the output (now: specific resistance); the others are irrelevant.
Input 3 is dominant.
 
Figure: Unspecific Sensitivity, good ANN
Now we look at the unspecific resistance.
There are differences in the pattern. However, input 3 is dominant here too. But let's look at the differences.

The relevant inputs are now 2, 3 and 6 to 8. Inputs 1 and 4 lost relevance; 7 and 8 gained relevance.
Looking at the noise, we can suspect that there is a significant difference in the way specific and unspecific resistance are acquired.
 
Now to the two different bars:

Figure: Specific Sensitivity, good ANN
The figure on the left tells us that input 1 has an above-average influence on the selected output (Specific Resistance). The red bar is considerably higher than the blue one.
 
Figure: Unspecific Sensitivity, good ANN
Compared to the "Specific Resistance", input 1 has lost its influence and has to be considered irrelevant for the "unspecific resistance". So we are able to point out different key factors for the two types of resistance, and this is helpful in creating a model of what is important for avoiding antibiotic resistance.