
Walk Through Guide (ver.1.04)

 

Index


1. Introduction

2. The data

2.1. The input data

2.2. The output data

3. Submitting the data to ANNOptimizer

4. Retrieving the ANN

5. Making predictions

6. Advanced Users

 

1. Introduction

This document is a step-by-step user manual that walks the reader through the process of developing an artificial neural network (ANN) with the microCortex.com on-line application. The sample data used to generate it is available through the links at the bottom of this page. The reader is invited to repeat these steps with the example data to get a feeling for the procedure and for the interpretation of the results obtained. For a quick introduction to ANN, please refer to the "Very Quick Guide to Neural Networks", or, for a more in-depth exposition, please refer to "Machine Learning, Neural and Statistical Classification", which can be accessed from the home page above.

 

2. The data

The data was generated by picking a clear signal and degrading it. If the ANN is in fact distinguishing signal from noise we should get the original signal back at the end of this walkthrough.

The ANN requires two sets of data, the Input, x, and the Output, y, data. A successful ANN will identify an association such that y=ANN(x). Therefore, you can think of x as the set of independent variables and y as the set of dependent variables in your ANN model.

 

2.1. The input data

The input data consists of three parameters, x1, x2 and x3:

x1 is an arithmetic progression (example of a clear input signal).

x2 is obtained by randomly degrading x1 (example of a somewhat unreliable input signal).

x3 is obtained by randomly degrading x2 even further (example of a barely relevant input parameter).

The values obtained are listed in Table 2.1.
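The construction of the three inputs can be sketched as follows. This is only an illustration of the idea: the series length, the noise widths and the use of uniform noise are assumptions, since the guide does not state how the sample data was actually degraded.

```python
import random

def degrade(signal, noise, rng):
    """Return a copy of `signal` with uniform random noise in [-noise, noise] added."""
    return [v + rng.uniform(-noise, noise) for v in signal]

rng = random.Random(0)                    # fixed seed so the run is repeatable
x1 = [float(i) for i in range(1, 51)]     # arithmetic progression: the clear signal
x2 = degrade(x1, 5.0, rng)                # assumed noise width: somewhat unreliable
x3 = degrade(x2, 20.0, rng)               # degraded further: barely relevant

# Print the first rows in the tab-separated layout used for submission.
for row in list(zip(x1, x2, x3))[:3]:
    print("\t".join(f"{v:.2f}" for v in row))
```

Because x3 is derived from x2 rather than directly from x1, it carries the noise of both degradation steps.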

 

2.2. The output data

The output data was generated as sigmoid functions of x1. Two sigmoid signals were generated, y1 and y2, of different frequencies and amplitudes.
Important note: The actual scale for either input or output variables is not relevant since all data is automatically normalized between 0 and 1 prior to processing. The y1 and y2 sigmoid signals were also randomly degraded. The values obtained are listed in Table 2.2 and the Cartesian plot of (y1,y2)=f(x1) is represented in Figure 2.2.

Table 2.1 - Values for the three input parameters with progressive degrees of random degradation (see text)
Table 2.2 - Values for the two output sigmoid variables (see text)
Figure 2.1 - Cartesian plot of input parameters listed in Table 2.1.
Figure 2.2 - Cartesian plot of output parameters, y1 and y2, listed in Table 2.2, as a function of x1. y1 and y2 are degraded sigmoid functions of x1.
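The note on normalization can be illustrated with a small sketch of min-max scaling. The function name and the sample values are made up for the illustration, and the on-line tool may implement the normalization differently; the point is only that a change of units leaves the normalized values unchanged.

```python
import math

def normalize(values):
    """Linearly map values so that the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]

column = [-35.0, 0.0, 10.0, 42.0]                     # made-up values, arbitrary scale
scaled = normalize(column)
rescaled = normalize([1000 * v + 7 for v in column])  # same data in different units

# Both versions give the same normalized column.
print(all(math.isclose(a, b) for a, b in zip(scaled, rescaled)))
```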

 

3. Submitting the data

If you are an authorized user, you can access the on-line tool through the Data Input link in the navigation bar after you log in, and submit the x and y data to train an ANN to learn a generalisable association.

The data in Tables 2.1 and 2.2 can now be pasted into the appropriate boxes. The values in each row must be separated by spaces or tab characters. Therefore, pasting from a tab-delimited text file or from an Excel worksheet is a straightforward copy/paste operation. The variable names can be any word or collection of words, one name per row. To separate rows, just press Enter after each variable's full name. Again, pasting variable names from a tab-delimited text file or from a spreadsheet (e.g. Excel, Quattro Pro) is a straightforward copy/paste operation, provided that they appear as lines in a single column in the source document.

Further below on the same page is the entry window for outputs:

By pressing the Submit button, you get a text listing of the data submitted. If everything looks OK, press "Submit" again.
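The paste format described in this section (one row per line, values separated by spaces or tabs, one variable name per line) can be checked locally before submission. The sketch below is a hypothetical helper, not part of the on-line tool, and the sample values are made up.

```python
def parse_rows(text):
    """Split pasted text into rows of floats; values are separated by spaces or tabs."""
    rows = []
    for line in text.splitlines():
        if line.strip():                      # skip blank lines
            rows.append([float(tok) for tok in line.split()])
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing numbers of values: {sorted(widths)}")
    return rows

def parse_names(text):
    """One variable name per line; a name may consist of several words."""
    return [line.strip() for line in text.splitlines() if line.strip()]

inputs = parse_rows("1\t1.4\t3.2\n2 2.9 0.5\n")   # tabs and spaces both accepted
names = parse_names("x1\nx2\nx3 third input\n")

# There should be one name per data column.
print(len(names) == len(inputs[0]))
```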

 

After this final operation you get a new screen confirming that the data was submitted. An ID number is assigned so that you can retrieve the job and use it later:

Job submitted.
Please wait until you receive an email indicating that your job is finished,and then go to the ANN List page
If you prefer you can leave this browser open and the click in this link
TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

An e-mail was sent to TestUser@microcortex.com
ID NUMBER: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

 

You also get an email with the same message:

From: HTTP Server ID <http@microcortex.com>
To: TestUser@microcortex.com
Subject: ANN JOB TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

Your Job has started.
Your ANN ID is: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

At this point all you have to do is wait for another email message to arrive saying that your ANN was successfully trained.

 

4. Retrieving the ANN

After a waiting period that depends on the complexity of the ANN being developed and the amount of data submitted, you get a message saying that the ANN is ready. In the present example:

From: HTTP Server ID <http@microcortex.com>
Message-Id: <200012181231.MAA10322@gate.microcortex.com>
To: TestUser @ microcortex.com
Subject: ANN JOB TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

Your Job has finished sucessfully.
Your ANN ID is: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
977142476
Mon Dec 18 12:27:56 WET 2000
191.24user 0.64system 3:17.59elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4030major+2642minor)pagefaults 0swaps
MD5 checksum of trained network: b302e7acd8292c4e81ea24d13a0d2ce0
977142674
Mon Dec 18 12:31:14 WET 2000
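The two bare integers in the message above appear to be Unix timestamps (seconds since 1970-01-01 UTC) for the start and end of the job. That reading is an inference rather than something the message states, but it is consistent with the dates printed next to them, given that WET coincides with UTC in December. A short check:

```python
from datetime import datetime, timezone

start_stamp, end_stamp = 977142476, 977142674   # the two integers in the message

start = datetime.fromtimestamp(start_stamp, tz=timezone.utc)
end = datetime.fromtimestamp(end_stamp, tz=timezone.utc)
elapsed = end - start

print(start.isoformat())              # 2000-12-18T12:27:56+00:00
print(end.isoformat())                # 2000-12-18T12:31:14+00:00
print(int(elapsed.total_seconds()))   # 198
```

The 198 seconds between the two stamps agree with the elapsed time of 3:17.59 reported in the same message.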

From now on the ANN can be retrieved, at any time, by going to the ANN List link after you log in.

By clicking on the ANN ID, you will move to the Ann Analysis page.

You can get the statistics for the other output variable simply by selecting the name for y2 and pressing "New output analysis". This can be done repeatedly and at any time later since the neural network is kept in a database with its unique, randomly assigned ID number.

The quality of the predictions can be inspected by looking at the predicted versus obtained values:

The training and testing data points for the median neural network are distinguished in the plot. To avoid biasing the cross-validation procedure, the testing subset is repeatedly harvested and the median performer is selected. The best median performer (see "Very Quick Guide to ANN") determines the optimal number of hidden nodes, reported in the box on the right.

The closer the linear regression line (full) is to the identity line (dashed), the better the predictions. In addition, the proximity of the data points to the regressed line is quantified by the standard Pearson's correlation coefficient, r2. It is interesting to note that, applied to ANN predictions, this is in fact a non-linear and non-monotonic correlation coefficient, a correlation measure not available in conventional statistics. The conventional non-linear correlation coefficients are restricted to monotonic datasets, whereas the r2 reported for the ANN predictions is free from any such restriction.

Continuing the analogy with conventional linear regression, the next index should be a measure similar to the regression coefficients. This statistic cannot be applied directly to an ANN due to the absence of a linear discriminant function. However, the concept can be extended by considering the average sensitivity of each output to each of the inputs:

We find that the ANN correctly identified the first input as the most reliable basis for predictions, followed by the second, while the third input variable was found to be almost negligible.

Important note: The ANN training procedure is such that all available information can be captured. The noisiness of an input variable will not by itself prevent the underlying signal from being used. The sensitivity analysis is as important as the regression analysis, as it quantifies the importance of each input variable for the prediction. This is particularly useful for reducing the number of variables that need to be monitored, and it can also serve as the basis for mechanistic explanations of the association between input and output variables.

Important note: Each output is predicted independently. In fact, a separate ANN is developed for each output, using all inputs at a time. Therefore, there is no need to separate different sets of dependent variables according to their interdependencies, which can be recovered by cluster or factor analysis of their sensitivities.
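One way to picture an "average sensitivity" is as the mean absolute partial derivative of the output with respect to each input, taken over the data set. The sketch below estimates this by central finite differences. It is an assumption about what such a measure could look like, not a description of how microCortex.com computes it, and the `model` function is a hypothetical stand-in for a trained ANN.

```python
import math

def model(x):
    """Hypothetical stand-in for a trained ANN: strong on x[0], weak on x[1], blind to x[2]."""
    return math.tanh(0.8 * x[0] + 0.1 * x[1] + 0.0 * x[2])

def average_sensitivity(f, data, step=1e-4):
    """Mean absolute partial derivative of f with respect to each input, over `data`."""
    n_inputs = len(data[0])
    totals = [0.0] * n_inputs
    for x in data:
        for j in range(n_inputs):
            up = list(x)
            up[j] += step
            down = list(x)
            down[j] -= step
            totals[j] += abs(f(up) - f(down)) / (2 * step)   # central difference
    return [t / len(data) for t in totals]

data = [[i / 10, 1 - i / 10, 0.5] for i in range(11)]   # made-up normalized inputs
sens = average_sensitivity(model, data)
print(sens)
```

In this toy case the ranking of the three sensitivities mirrors the ranking of the inputs described in the text: the first input dominates, the second contributes a little, and the third contributes nothing.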

 

5. Making predictions

After pressing the Predictions button you are asked for "input questions". The input questions are lines of data values, one column for each variable, in the same format used at the beginning for the input and output data. The only restriction is that they must have as many input variables (columns) as the input data set used to train the ANN. As an example, the following 4 input sets can be submitted to request ANN predictions for the values of the corresponding outputs:

The corresponding 4 output predictions, one for each of the output variables, are generated below the submission box:

The confidence interval for the predictions can be roughly estimated by using the confidence intervals of the residuals in the predicted versus observed comparison (plot in the previous section):

which means that the lower p=0.025 boundary for the first output variable is the predicted value minus 16.1244, and the upper boundary for that variable is the predicted value plus 21.3233. The results are also displayed with the confidence intervals (CI) taken into account.
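The arithmetic can be sketched as follows. The offsets -16.1244 and +21.3233 are the ones quoted above; the prediction value of 10.0 is made up, and `residual_bounds` is a hypothetical helper showing one way such offsets could be obtained from the residuals (empirical 2.5% and 97.5% quantiles), which may differ from the method used by the on-line tool.

```python
import statistics

def residual_bounds(observed, predicted):
    """Empirical 2.5% and 97.5% quantiles of the residuals (observed - predicted)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    cuts = statistics.quantiles(residuals, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

def interval(prediction, lower, upper):
    """Shift a prediction by the lower (negative) and upper residual bounds."""
    return prediction + lower, prediction + upper

# Offsets quoted in the text for the first output variable; 10.0 is a made-up prediction.
low, high = interval(10.0, -16.1244, 21.3233)
print(round(low, 4), round(high, 4))   # -6.1244 31.3233
```

Note that the interval is not symmetric around the prediction, because the lower and upper residual bounds differ in size.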

It may happen that the input data being submitted is very different from anything used to develop the ANN. If that is the case, the predictions, the sensitivities (below) and the confidence intervals should all be used with caution as they may not be representative:

Indexes clearly higher than 1 indicate an input vector very different from anything used to train the ANN. Accordingly, the index highlights [100 200 300] as being novel, and as such its predictions should be mistrusted.

All information provided through graphic interfaces is also available as a text file, so that you can use your favorite graphics or statistical packages for a more advanced analysis and for producing publication-quality plots. To get the text-formatted report, look for the inconspicuous link on both the "Trained Artificial Neural Network Statistics" and the "Predicting with trained ANN" pages.

6. Advanced users

Advanced users who want the algebraic expression for the ANN, to incorporate it in their own applications, were also considered. The text report for the "Trained Artificial Neural Network Statistics" page includes one sub-section for each output detailing the values for the weights, biases, input scaling and output scaling.

This sub-section, of which there is one in each output section, looks something like:

i.e. the second input is linearly scaled (the values vary for each input) from [-1.95  52.31] to [0  1]. The values for the hidden layer, h, are obtained from an input vector x by the expression h=tanh(w1*x). The second layer of weights, w2, uses a sigmoid transfer function and the unscaled outputs, yu, are obtained by yu=1/(1+exp(-w2*h)). The final scaled output, y, is obtained by linear transformation of the interval [0  1] into, for this example, [-35.73  42.61]. As indicated in the report, the values of the biases make up the last column of the weight matrix. These values are activated in the computation by adding an element 1 to each input vector x.
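The expressions above can be assembled into a small forward pass, sketched below. The two ranges [-1.95, 52.31] and [-35.73, 42.61] come from the text; all weights, the other input ranges and the sample input are made up for illustration. Appending a 1 to the hidden vector h as well as to x, so that the last column of w2 also acts as a bias, is an assumption, since the text only mentions the input vector.

```python
import math

def scale(v, lo, hi):
    """Linearly map v from [lo, hi] to [0, 1]."""
    return (v - lo) / (hi - lo)

def unscale(u, lo, hi):
    """Linearly map u from [0, 1] back to [lo, hi]."""
    return lo + u * (hi - lo)

def matvec(w, v):
    """Product of a weight matrix (list of rows) and a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def ann_predict(x, in_ranges, w1, w2, out_range):
    xs = [scale(v, lo, hi) for v, (lo, hi) in zip(x, in_ranges)] + [1.0]  # bias element
    h = [math.tanh(a) for a in matvec(w1, xs)] + [1.0]   # assumed bias element for w2
    yu = 1.0 / (1.0 + math.exp(-matvec(w2, h)[0]))       # sigmoid, single output
    return unscale(yu, *out_range)

in_ranges = [(0.0, 50.0), (-1.95, 52.31), (-20.0, 70.0)]  # 1st and 3rd ranges made up
w1 = [[2.0, 0.5, 0.0, -1.0],      # hypothetical: 2 hidden nodes, 3 inputs + bias column
      [-1.5, 0.2, 0.1, 0.5]]
w2 = [[1.2, -0.7, 0.1]]           # hypothetical: 1 output, 2 hidden nodes + bias column
out_range = (-35.73, 42.61)

y = ann_predict([25.0, 20.0, 30.0], in_ranges, w1, w2, out_range)
print(y)
```

Since a separate ANN is developed for each output, w2 has a single row here; a second output would use its own w1, w2 and output range taken from its own sub-section of the report.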

 


Sample Data