For each experiment a log file named <filestem>_<seed>.log is created. The log file records what run_exp has been doing. If run_exp is invoked several times for the same filestem and seed in the same output directory, each new log is appended to the end of any existing one, unless the option -o (overwrite) has been given to the run_exp command. The log contains more information from the run_exp command if the -d (debug) option was given, and also includes debugging information from the interface scripts called if the option -lad was given.
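For illustration, an experiment could be launched from Python as sketched below. The filestem mydata is a placeholder; only the -f, -d, and -o options described in this section are used, and run_exp is assumed to be on the PATH.

```python
import subprocess

# Illustrative invocation only; the filestem "mydata" is a placeholder.
# -f gives the input filestem, -d enables debug output, and -o overwrites
# any existing log instead of appending to it.
subprocess.run(["run_exp", "-f", "mydata", "-d", "-o"], check=True)
```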
The .results
file contains a group of variables that
describe the experiment and database, and another group of variables
that contain information for each combination of algorithm, fold, and
repetition.
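The variables listed below can be read back programmatically. A minimal parsing sketch in Python, assuming one variable per line as whitespace-separated tokens with the value as the last token; the actual layout of the .results file may differ, so check a generated file before relying on this.

```python
def read_results(path):
    """Parse a .results file into a dictionary.

    Assumes one variable per line, e.g. "Error 1 2 c4.5 0.1234";
    the key is the tuple of all tokens except the last, the value
    is the last token.  This format is an assumption.
    """
    results = {}
    with open(path) as fh:
        for line in fh:
            tokens = line.split()
            if len(tokens) >= 2:
                results[tuple(tokens[:-1])] = tokens[-1]
    return results
```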
File
: The full path and filestem to the database processed.
Filestem
: The filestem without any path as used in the output
files. If a suffix is added to the output file names (e.g. when
a preprocessing algorithm is used), that suffix will be included here.
In other words, this is the part of the filestem that will be used
in the output files before the _<seed> part.
InFilestem
: The filestem without any path as specified
for the -f
command line option of run_exp. This will
never contain any suffixes.
ModelType
: Either classification or regression.
Start
: The start date and time of the experiment, in standard UNIX date format.
User
: The login name of the user running the experiment.
Host
: The (short) hostname of the machine on which the
experiment was run.
OS
: The (short) name of the operating system.
System
: More detailed information about the operating system, version, and architecture.
CPUlimit
: The CPU time limit specified for this experiment; the
value 0 means no limit.
Seed
: The random seed used for the randomization of the
crossvalidation folds.
Version run_exp
: The program version of run_exp.
Samplespec: n/n
: The values given (or the default values) for
the -samp and -hsamp options of run_exp.
Preprocessing
: The name of the preprocessing algorithm, or empty if no preprocessing was used.
DBSize
: The number of records in the .data file of the
input database.
DBdataMD5
: The MD5 key of the .data file.
This can be used to check whether exactly the same file has been used for
different experiments (see the MD5 sketch after this list).
DBnameMD5
: The MD5 key of the .names file.
Type_data, N_continuous_attr, N_discrete_attr, N_total_discrete_vals, Avg_discrete_vals, Log_discrete_combinations, Avg_discrete_combinations, N_classes
: These values are the output of the parse_names program and are explained
in Section 5.7.1.
Learner
: For each learning algorithm there is one line with
this key, giving the name of the learning algorithm.
Learner_Parameters <learner>
: For each learner, a line
giving all the parameters as specified on the run_exp
command line.
DCT_Totaltime
: The total CPU time measured for the DCT algorithm,
if it was run.
Evalmethod
: The evaluation method used, one of
xval, holdout, cstho, or loov.
Evalparms
: The parameters used for the method, separated
by commas. In addition, for each method, there is a special set of
keywords that individually give the values for the evaluation parameters,
e.g. for xval: XVAL_folds and XVAL_repeat.
DBSizeTrain <r> <f>
: The actual size of the training data
for repetition <r> and fold <f>.
DBSizeTest <r> <f>
: The actual test size per repetition/fold.
Error <r> <f> <alg>
: The holdout error (error of the
learned model on the test set) for algorithm <alg>
for that repetition/fold. This is the error as reported by the interface
script, not as measured by the run_stats
script from the target/prediction files.
Resubsterror <r> <f> <alg>
: The resubstitution error (error
of the model on the training set), if reported by the interface script.
Size <r> <f> <alg>
: The model size, if and as
reported by the interface
script.
Testtime <r> <f> <alg>
: The time needed for the testing step
in CPU seconds, as reported by the interface script.
Traintime <r> <f> <alg>
: The time needed for the training step
in CPU seconds, as reported by the interface script.
Totaltime <r> <f> <alg>
: The time needed for both the training
and testing steps,
in CPU seconds, as reported by the interface script. For some learning
algorithms it may not be easy to obtain individual training and
testing times, since they carry out both steps in one program run. In
that case only the Totaltime
value will differ from the missing
value indicator.
Status <alg>
: The final status of the experiment for this
algorithm, estimated from the output of the interface scripts.
It is ok if everything worked well, timeout if
the CPU time limit was exceeded, unknown if the status could not
be determined, or nok if something went wrong (e.g. the algorithm
crashed).
Error <alg>
: The final average error as calculated
from the individual errors reported by the interface script.
Resubsterror <alg>
: The final average resubstitution error.
Size <alg>
: The final average model size.
Testtime <alg>
: The final average testing time.
Traintime <alg>
: The final average training time.
Totaltime <alg>
: The final average total time.
Stop
: The date and time when the experiment was finished,
in standard UNIX date format.
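For example, the MD5 values recorded above (DBdataMD5, DBnameMD5) can be checked against a local copy of the database with a short Python snippet; the path mydata.data is a placeholder.

```python
import hashlib

def md5_of(path):
    """MD5 digest of a file, comparable to DBdataMD5/DBnameMD5."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large .data files need not fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of("mydata.data"))  # placeholder path
```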
The .stats file contains all the measures that are calculated from
the .pred and .targets files by the run_stats
program (run_stats is called automatically
at the end of run_exp
unless explicitly suppressed).
The variables in the .stats
file
for classification-type experiments:
Error <alg> <rep> <fold>
: The error of the model learned by
algorithm <alg>
from
the training set and evaluated on the test set,
for repetition <rep> and fold <fold>.
Error <alg>
: The error averaged over all classifications
from all folds and repetitions. Note that this will differ from the
average of the per-fold/repetition errors above if fold sizes are not
the same for all folds.
StdDevOfError <alg>
: The standard deviation of the errors
for all folds/repetitions.
StdErrOfError <alg>
: The standard error of the errors for
all folds/repetitions (i.e. the standard deviation divided by the square root
of the number of errors).
Correct-Wrong <alg1> <alg2>
: The number of cases where
the classification was correct for algorithm <alg1>
and wrong for algorithm <alg2>.
Wrong-Correct <alg1> <alg2>
: The number of cases where
the classification was wrong for algorithm <alg1>
and correct for algorithm <alg2>.
pvalMcNemar <alg1> <alg2>
: The p-value of the McNemar
test for identical distributions of the wrong/correct and correct/wrong
counts (a computation sketch follows this list).
pvalPairedTTest <alg1> <alg2>
: The p-value of a
paired t-test for the errors.
p-val_McNemar <alg1> <alg2>
: OBSOLETE and only kept for
backward compatibility!
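The standard error and the McNemar p-value above can be recomputed from the per-fold errors and the two discordant counts. A minimal Python sketch follows; it assumes the common chi-square approximation with continuity correction for McNemar and the sample (n-1) standard deviation, either of which run_stats may define differently.

```python
import math

def std_err(errors):
    """Standard error: standard deviation divided by sqrt(n).

    Whether run_stats uses the sample (n-1) or population (n) standard
    deviation is an assumption; the sample form is used here.
    """
    n = len(errors)
    if n < 2:
        return 0.0
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return math.sqrt(var / n)

def mcnemar_pvalue(correct_wrong, wrong_correct):
    """McNemar p-value from the two discordant counts.

    Uses the chi-square approximation with continuity correction;
    run_stats may use a different variant (e.g. the exact binomial test).
    """
    b, c = correct_wrong, wrong_correct
    if b + c == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))
```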
The variables in the .stats
file
for regression-type experiments (a sketch recomputing these measures follows the list):
ErrorSSE
: Sum of squared errors
ErrorMSE
: Mean squared error
ErrorRMSE
: Root mean squared error
ErrorNMSE
: Normalized mean squared error
ErrorMAD
: Mean absolute deviation
ErrorNMAD
: Normalized mean absolute deviation
RSquare
: Correlation coefficient between targets and predictions
p-MeanDiffZero <alg1> <alg2>
: The p-value of the test for equal means
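The following Python sketch recomputes these measures from lists of targets and predictions. The normalizations for NMSE and NMAD are assumptions (target variance and mean absolute deviation of the targets, respectively), and RSquare is computed as the Pearson correlation, as described above; run_stats may define these differently.

```python
import math

def regression_measures(targets, preds):
    """Recompute the regression measures from targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(preds) / n
    sse = sum((t - p) ** 2 for t, p in zip(targets, preds))
    mse = sse / n
    # Normalizations below are assumptions, not confirmed run_stats behavior.
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    mad = sum(abs(t - p) for t, p in zip(targets, preds)) / n
    mad_t = sum(abs(t - mean_t) for t in targets) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(targets, preds)) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in preds) / n)
    r = cov / (math.sqrt(var_t) * sd_p)
    return {
        "ErrorSSE": sse,
        "ErrorMSE": mse,
        "ErrorRMSE": math.sqrt(mse),
        "ErrorNMSE": mse / var_t,
        "ErrorMAD": mad,
        "ErrorNMAD": mad / mad_t,
        "RSquare": r,
    }
```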
The DCT program and its output are documented in [DCT doc].
For each fold of the crossvalidation, a file containing only
the targets of the test file for this fold is stored
in the results directory. The name of this file
is of the form <filestem>_<seed>_<fold>.targets.
These files are needed by the run_stats
program
to calculate error estimates and similar measures.
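A minimal reading sketch in Python, assuming one target value per line (the actual file layout may differ):

```python
import os

def read_targets(results_dir, filestem, seed, fold):
    """Read the stored targets for one fold, one value per line (assumed)."""
    path = os.path.join(results_dir, f"{filestem}_{seed}_{fold}.targets")
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]
```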
For each combination of learning algorithm and crossvalidation fold,
a file containing only
the predictions of this learning algorithm for the test file is stored
in the results directory. The name of this file
is of the form <filestem>_<seed>_<fold>_<alg>.pred.
These files are needed by the run_stats
program
to calculate error estimates and similar measures.
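Conceptually, run_stats pairs each .pred file with the matching .targets file. A hedged Python sketch computing the classification error for one fold, again assuming one value per line and equally long files:

```python
import os

def fold_error(results_dir, filestem, seed, fold, alg):
    """Fraction of test cases where prediction and target disagree.

    The per-line file layout is an assumption; run_stats itself may
    handle the files differently.
    """
    base = os.path.join(results_dir, f"{filestem}_{seed}_{fold}")
    with open(base + ".targets") as fh:
        targets = [line.strip() for line in fh if line.strip()]
    with open(f"{base}_{alg}.pred") as fh:
        preds = [line.strip() for line in fh if line.strip()]
    wrong = sum(t != p for t, p in zip(targets, preds))
    return wrong / len(targets)
```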