Index
1. Introduction
2. The data
2.1. The input data
2.2. The output data
3. Submitting the data to ANNOptimizer
4. Retrieving the ANN
5. Making predictions
6. Advanced users
1. Introduction
This document is a step-by-step user manual that walks the reader through
the process of developing an artificial neural network (ANN) with the microCortex.com
online application. The sample data used to generate it is available through the links at the
bottom of this page. The reader is invited to repeat
these steps with the example data to get a feeling for the procedure and for
the interpretation of the results obtained. For a quick introduction to ANNs,
please refer to the "Very Quick Guide to Neural Networks" or, for a more
in-depth exposition, to "Machine Learning, Neural and Statistical
Classification", which can be accessed from the home page.
2. The data
The data was generated by picking a clear signal and degrading it. If the
ANN is in fact distinguishing signal from noise we should get the original
signal back at the end of this walkthrough.
The ANN requires two sets of data, the Input, x, and the
Output, y, data. A successful ANN will identify an association
such that y=ANN(x). Therefore, you can think of x
as the set of independent variables and y as the set of dependent
variables in your ANN model.
2.1. The input data
The input data consists of three parameters, x1, x2 and x3:
x1 is an arithmetic progression (example of a clear input signal).
x2 is obtained by randomly degrading x1 (example of a somewhat unreliable input signal).
x3 is obtained by randomly degrading x2 even further (example of a barely relevant input parameter).
The values obtained are listed in Table 2.1.
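A minimal sketch of how three inputs of this kind could be generated. The number of points, the step of the progression, the noise amplitudes and the use of uniform noise are all assumptions made for illustration; they are not the settings used to produce Table 2.1.

```python
import random

def make_inputs(n=50, start=1.0, step=1.0, noise2=5.0, noise3=15.0, seed=0):
    """Return (x1, x2, x3); every default value here is illustrative only."""
    rng = random.Random(seed)
    # x1: arithmetic progression (clear input signal)
    x1 = [start + i * step for i in range(n)]
    # x2: x1 degraded with uniform random noise (somewhat unreliable signal)
    x2 = [v + rng.uniform(-noise2, noise2) for v in x1]
    # x3: x2 degraded even further (barely relevant parameter)
    x3 = [v + rng.uniform(-noise3, noise3) for v in x2]
    return x1, x2, x3

x1, x2, x3 = make_inputs()
```

Because x3 is built from x2 rather than from x1, its noise accumulates on top of the noise already present in x2.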
2.2. The output data
The output data was generated as sigmoid functions of x1. Two sigmoid
signals were generated, y1 and y2, of different
frequencies and amplitudes.
Important note: The actual scale of
either input or output variables is not relevant, since all data is
automatically normalized between 0 and 1 prior to processing. The
y1 and y2 sigmoid signals were also randomly
degraded. The values obtained are listed in Table 2.2 and the Cartesian plot of
(y1,y2)=f(x1) is shown in Figure 2.2.
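The note on scale can be illustrated with a short sketch of a linear rescaling to the interval [0, 1]. The function name and the handling of constant columns are assumptions; the application's own normalization code is not shown in this manual.

```python
def normalize(values):
    """Linearly rescale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# The same made-up signal expressed in two different units.
metres = [1.0, 2.0, 4.0]
millimetres = [1000.0, 2000.0, 4000.0]

scaled_m = normalize(metres)
scaled_mm = normalize(millimetres)
```

Both lists end up with the same normalized values, which is why the original units of a variable do not matter.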
Table 2.1  Values for the three input parameters with progressive degrees of random degradation (see text)

Table 2.2  Values for the two output sigmoid variables (see text)





Figure 2.1  Cartesian plot of input parameters listed in Table 2.1.

Figure 2.2  Cartesian plot of output parameters, y1 and
y2, listed in Table 2.2, as a function of x1.
y1 and y2 are degraded sigmoid functions of
x1.
3. Submitting the data to ANNOptimizer
If you are an authorized user, you can access the online tool through the Data Input link in the navigation bar after you log in, and submit the x and y data to train an ANN to learn a generalisable association.
The data in Tables 2.1 and 2.2 can now be pasted into the corresponding boxes. The values in each row must be separated by spaces or tab characters, so pasting from a tab-delimited text file or from an Excel worksheet is a straightforward copy/paste operation. The variable names can be any word or collection of words, arranged one name per row. To separate rows, press Enter after each full variable name. Again, pasting variable names from a tab-delimited text file or from a spreadsheet (e.g. Excel, Quattro Pro) is a straightforward copy/paste operation, provided that they appear as lines in a single column in the source document.
Further below on the same page is the entry window for the outputs.
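If the data lives in a program rather than in a spreadsheet, text in the expected layout can be produced with a few lines of code. This is only a sketch: the function names are hypothetical and the numbers are placeholders, not values from Tables 2.1 or 2.2.

```python
def to_paste_block(rows):
    """Join the values of each row with tabs, one row per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows)

def to_name_block(names):
    """Put one variable name on each line."""
    return "\n".join(names)

# Placeholder values, for illustration only.
rows = [(1, 2.5, 3.1), (2, 1.7, 6.4)]
data_text = to_paste_block(rows)
names_text = to_name_block(["x1", "x2", "x3"])

print(data_text)
print(names_text)
```

The printed text can then be copied into the data box and the variable-name box respectively.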
By pressing the Submit button, you get a text listing of the data submitted. If everything looks OK, press "Submit" again.
After this final operation you get a new screen confirming that the data was submitted, and an ID number is assigned so that you can retrieve it and use it later:
Job submitted.
Please wait until you receive an email indicating that your job is finished,and then go to the ANN List page
If you prefer you can leave this browser open and the click in this link
TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
An email was sent to TestUser@microcortex.com
ID NUMBER: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

You also get an email with the same message:
From: HTTP Server ID <http@microcortex.com>
To: TestUser@microcortex.com
Subject: ANN JOB TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
Your Job has started.
Your ANN ID is: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612

At this point all you have to do is wait for another email message to arrive saying that your ANN was successfully trained.
4. Retrieving the ANN
After a waiting period that depends on the complexity of the ANN being developed and the amount of data submitted, you get a message saying that the ANN is ready. In the present example:
From: HTTP Server ID <http@microcortex.com>
MessageId: <200012181231.MAA10322@gate.microcortex.com>
To: TestUser @ microcortex.com
Subject: ANN JOB TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
Your Job has finished sucessfully.
Your ANN ID is: TestUser_977142474_7069_bee3133fe707c783f61bddcbe6c42612
977142476
Mon Dec 18 12:27:56 WET 2000
191.24user 0.64system 3:17.59elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4030major+2642minor)pagefaults 0swaps
MD5 checksum of trained network: b302e7acd8292c4e81ea24d13a0d2ce0
977142674
Mon Dec 18 12:31:14 WET 2000

From now on the ANN can be retrieved, at any time, by going to the ANN List link after you log in.
By clicking on the ANN ID, you will move to the ANN Analysis page.
You can get the statistics for the other output variable simply by selecting the name for y2 and pressing "New output analysis". This can be done repeatedly and at any time later since the neural network is kept in a database with its unique, randomly assigned ID number.
The quality of the predictions can be inspected by looking at the predicted versus obtained values:
The training and testing datapoints for the median neural network are distinguished in the plot. In order to avoid biasing the cross-validation procedure, the testing subset is repeatedly harvested and the median performer is selected. The best median performer (see "Very Quick Guide to ANN") determines the optimal number of hidden nodes, which is reported in the box on the right.
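The predicted-versus-observed comparison can also be summarized numerically with a least-squares line and a correlation measure. The sketch below is not the application's code: it assumes the observed values are on the horizontal axis, reads r2 as the squared Pearson coefficient, and uses made-up numbers.

```python
def linear_fit(observed, predicted):
    """Least-squares line predicted = intercept + slope*observed, plus r2."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2

# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
slope, intercept, r2 = linear_fit(observed, predicted)
```

A slope near 1 with an intercept near 0 means the regression line lies close to the identity line; r2 near 1 means the points lie close to the regression line.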
The closer the linear regression line (full) is to the identity line (dashed), the better the predictions. In addition, the proximity of the datapoints to the regression line is quantified by the standard Pearson correlation coefficient, r2. It is interesting to note that this is in fact a nonlinear and non-monotonic correlation coefficient, a correlation measure not available in conventional statistics, where nonlinear correlation coefficients are restricted to monotonic datasets. In contrast, the r2 reported for the ANN predictions is free from any such restriction.
Continuing the analogy with conventional linear regression, the next index should be a measure similar to the regression coefficients. Such a statistic cannot be applied directly to an ANN because there is no linear discriminant function. However, the concept can be extended by considering the average sensitivity of each output to each of the inputs:
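One way to estimate an average sensitivity of this kind is by finite differences: nudge one input at a time and average the size of the change in the output over the dataset. This is a sketch under stated assumptions, not necessarily how the application computes its sensitivities; toy_model is a hypothetical stand-in for a trained ANN and the data points are made up.

```python
def average_sensitivity(model, inputs, delta=1e-4):
    """Mean absolute finite-difference slope of the output for each input."""
    n_vars = len(inputs[0])
    totals = [0.0] * n_vars
    for x in inputs:
        base = model(x)
        for j in range(n_vars):
            bumped = list(x)
            bumped[j] += delta  # perturb only input j
            totals[j] += abs(model(bumped) - base) / delta
    return [t / len(inputs) for t in totals]

def toy_model(x):
    # Hypothetical model: strongly driven by x[0], weakly by x[1],
    # and not at all by x[2].
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

points = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [0.9, 0.1, 0.4]]
sens = average_sensitivity(toy_model, points)
```

For this toy model the three sensitivities come out in decreasing order, mirroring the kind of ranking described for x1, x2 and x3.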
We find that the ANN correctly identified the first input as the most reliable basis for predictions, followed by the second, while the third input variable was found to be almost negligible.
Important note: The ANN training procedure is such that all available information can be captured. The noisiness of an input variable will not, by itself, prevent the underlying signal from being used.
The sensitivity analysis is as important as the regression analysis, as it quantifies the importance of each input variable for the prediction. This is particularly useful for reducing the number of variables that need to be monitored, and it can also be used as the basis for mechanistic explanations of the association between input and output variables.
Important note: Each output is predicted independently. In fact, a separate ANN is developed for each output, using all inputs at a time. Therefore, there is no need to separate different sets of dependent variables according to their interdependencies, which can be recovered by cluster or factor analysis of their sensitivities.
5. Making predictions
After pressing the Predictions button you are asked for "input questions".
The input questions are lines of data values, one column for each variable, in
the same format used in the beginning for the input and output data. Apart from
having as many input variables (columns) as the input dataset used to train
the ANN there is no other restriction. As an example, the following 4 input
sets can be submitted to request ANN predictions for the values of the
corresponding outputs:
The corresponding 4 output predictions, one for each of the output variables, are generated below the submission box:
The confidence interval for the predictions can be roughly estimated by using the confidence intervals of the residuals in the predicted versus observed comparison (plot in the previous section):
This means that the lower p=0.025 boundary for the first output variable is the predicted value minus 16.1244, and the upper boundary is the predicted value plus 21.3233. The results are also displayed with the confidence intervals (CI) taken into account.
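The arithmetic behind these boundaries is simple enough to reproduce by hand or in code. In the sketch below the two offsets are the ones quoted above for the first output variable; the function name and the predicted value of 100.0 are made up for illustration.

```python
def prediction_interval(predicted, lower_offset, upper_offset):
    """Return the (lower, upper) boundaries around a predicted value."""
    return predicted - lower_offset, predicted + upper_offset

# Offsets quoted in the text for the first output variable.
LOWER_OFFSET = 16.1244
UPPER_OFFSET = 21.3233

# 100.0 is a made-up prediction used only to show the arithmetic.
low, high = prediction_interval(100.0, LOWER_OFFSET, UPPER_OFFSET)
```

Note that the interval is not symmetric around the prediction, because the two offsets differ.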
It may happen that the input data being submitted is very different from anything used to develop the ANN. If that is the case, the predictions, the sensitivities (below) and the confidence intervals should all be used with caution, as they may not be representative:
Index values clearly higher than 1 indicate an input vector very different from anything used to train the ANN. Accordingly, the index highlights [100 200 300] as being novel, and as such its predictions should not be trusted.
All information provided through the graphic interfaces is also available as a text file, so that you can use your favorite graphics or statistical packages to proceed to a more advanced analysis and to produce publication-quality plots. To get the text-formatted report, look for the inconspicuous link on both the "Trained Artificial Neural Network Statistics" and the "Predicting with trained ANN" pages.
6. Advanced users
Advanced users who
want to get the algebraic expression for the ANN to incorporate in their own
applications were also considered. The text report for the "Trained
Artificial Neural Network Statistics" page includes one subsection for
each output detailing the values for the weights, biases, input scaling and
output scaling.
This subsection, of which there is one in each output section, looks something like:
i.e. the second
input is linearly scaled (the values vary for each input) from [1.95
52.31] to [0 1]. The values for the hidden layer, h, are obtained
from an input vector x by the expression h=tanh(w1*x).
The second layer of weights, w2, uses a sigmoid transfer function
and the unscaled outputs, yu, are obtained by yu=1/(1+exp(-w2*h)).
The final scaled output, y, is obtained by linear transformation
of the interval [0 1] into, for this example, [35.73 42.61]. As
indicated in the report, the values of the biases make up the last column of the
weight matrix. These values are activated in the computation by appending an element
1 to each input vector x.
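The expressions above can be assembled into a short function. This is a sketch under several assumptions rather than the application's own code: the sigmoid is taken in its standard logistic form, 1/(1+exp(-z)); a bias element 1 is appended to the hidden vector h as well as to the input vector; and all weight values and the first and third input ranges are made up. Only the range [1.95 52.31] for the second input and the output range [35.73 42.61] come from the text.

```python
import math

def scale(value, lo, hi):
    """Map value linearly from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def unscale(value, lo, hi):
    """Map value linearly from [0, 1] back to [lo, hi]."""
    return lo + value * (hi - lo)

def ann_predict(x, in_ranges, w1, w2, out_range):
    """Forward pass for one output; biases are the last column of w1 and w2."""
    # Scale each input to [0, 1], then append 1 to activate the biases.
    xs = [scale(v, lo, hi) for v, (lo, hi) in zip(x, in_ranges)] + [1.0]
    # Hidden layer: h = tanh(w1 * x), one row of w1 per hidden node.
    h = [math.tanh(sum(w * v for w, v in zip(row, xs))) for row in w1]
    # Output layer: yu = 1 / (1 + exp(-w2 * h)), with 1 appended to h (assumed).
    z = sum(w * v for w, v in zip(w2, h + [1.0]))
    yu = 1.0 / (1.0 + math.exp(-z))
    # Rescale the unscaled output to the original units.
    return unscale(yu, out_range[0], out_range[1])

# Hypothetical ranges and weights; only the second range is from the text.
in_ranges = [(0.0, 50.0), (1.95, 52.31), (0.0, 60.0)]
w1 = [[0.8, 0.3, 0.05, -0.6],   # hidden node 1: three weights + bias
      [-0.4, 0.2, 0.01, 0.1]]   # hidden node 2: three weights + bias
w2 = [1.5, -0.7, 0.2]           # two hidden weights + bias
out_range = (35.73, 42.61)

y = ann_predict([10.0, 12.0, 20.0], in_ranges, w1, w2, out_range)
```

Since a separate ANN is developed for each output, w2 is a single row here; a second output would use its own w1, w2 and output range taken from its own subsection of the report.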
Sample Data