For each experiment a log file named <filestem>_<seed>.log is created. The log file records what run_exp has been doing. If run_exp is invoked several times for the same filestem and seed in the same output directory, each new log is appended to the end of any existing one, unless the option -o (overwrite) has been given to the run_exp command. The log contains more information from the run_exp command if the -d (debug) option was given, and also includes debugging information from the interface scripts called if the option -lad was given.
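For illustration, an experiment could be launched from Python as sketched below. The filestem mydata is a placeholder; only the -f, -d, and -o options described in this section are used, and run_exp is assumed to be on the PATH.

```python
import subprocess

# Illustrative invocation only; the filestem "mydata" is a placeholder.
# -f gives the input filestem, -d enables debug output, and -o overwrites
# any existing log instead of appending to it.
subprocess.run(["run_exp", "-f", "mydata", "-d", "-o"], check=True)
```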
The .results
file contains a group of variables that
describe the experiment and database, and another group of variables
that contain information for each combination of algorithm, fold, and
repetition.
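The variables listed below can be read back programmatically. A minimal parsing sketch in Python, assuming one variable per line as whitespace-separated tokens with the value as the last token; the actual layout of the .results file may differ, so check a generated file before relying on this.

```python
def read_results(path):
    """Parse a .results file into a dictionary.

    Assumes one variable per line, e.g. "Error 1 2 c4.5 0.1234";
    the key is the tuple of all tokens except the last, the value
    is the last token.  This format is an assumption.
    """
    results = {}
    with open(path) as fh:
        for line in fh:
            tokens = line.split()
            if len(tokens) >= 2:
                results[tuple(tokens[:-1])] = tokens[-1]
    return results
```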
File
: The full path and filestem to the database processed.
Filestem
: The filestem without any path as used in the output
files. If a suffix is added to the output file names (e.g. when
a preprocessing algorithm is used), that suffix will be included here.
In other words, this is the part of the filestem that will be used
in the output files before the _<seed> part.
InFilestem
: The filestem without any path as specified
for the -f
command line option of run_exp. This will
never contain any suffixes.
ModelType
: Either classification or regression.
Start
: The start date and time of the experiment, in standard UNIX date format.
User
: The login name of the user running the experiment.
Host
: The (short) hostname of the machine on which the
experiment was run.
OS
: The (short) name of the operating system.
System
: More detailed information about the operating system, version, and architecture.
CPUlimit
: The CPU time limit specified for this experiment; the
value 0 means no limit.
Seed
: The random seed used for the randomization of the
crossvalidation folds.
Version run_exp
: The program version of run_exp.
Samplespec: n/n
: The values given (or the default values) for
the -samp and -hsamp options of run_exp.
Preprocessing
: The name of the preprocessing algorithm, or empty if no preprocessing was used.
DBSize
: The number of records in the .data file of the
input database.
DBdataMD5
: The MD5 key of the .data file.
This can be used to check whether exactly the same file has been used for
different experiments (see the MD5 sketch after this list).
DBnameMD5
: The MD5 key of the .names file.
Type_data, N_continuous_attr, N_discrete_attr, N_total_discrete_vals, Avg_discrete_vals, Log_discrete_combinations, Avg_discrete_combinations, N_classes
: These values are the output of the parse_names program and are explained
in Section 5.7.1.
Learner
: For each learning algorithm there is one line with
this key, giving the name of the learning algorithm.
Learner_Parameters <learner>
: For each learner, a line
giving all the parameters as specified on the run_exp
command line.
DCT_Totaltime
: The total CPU time measured for the DCT algorithm,
if it was run.
Evalmethod
: The evaluation method used, one of
xval, holdout, cstho, or loov.
Evalparms
: The parameters used for the method, separated
by commas. In addition, for each method, there is a special set of
keywords that individually give the values for the evaluation parameters,
e.g. for xval: XVAL_folds and XVAL_repeat.
DBSizeTrain <r> <f>
: The actual size of the training data
for repetition <r> and fold <f>.
DBSizeTest <r> <f>
: The actual test size per repetition/fold.
Error <r> <f> <alg>
: The holdout error (error of the
learned model on the test set) for algorithm <alg>
for that repetition/fold. This is the error as reported by the interface
script, not as measured by the run_stats
script from the target/prediction files.
Resubsterror <r> <f> <alg>
: The resubstitution error (error
of the model on the training set), if reported by the interface script.
Size <r> <f> <alg>
: The model size, if and as
reported by the interface
script.
Testtime <r> <f> <alg>
: The time needed for the testing step
in CPU seconds, as reported by the interface script.
Traintime <r> <f> <alg>
: The time needed for the training step
in CPU seconds, as reported by the interface script.
Totaltime <r> <f> <alg>
: The time needed for both the training
and testing steps,
in CPU seconds, as reported by the interface script. For some learning
algorithms it may not be easy to obtain individual training and
testing times, since they carry out both steps in one program run. In
that case only the Totaltime
value will differ from the missing
value indicator.
Status <alg>
: The final status of the experiment for this
algorithm, estimated from the output of the interface scripts.
It is ok if everything worked well, timeout if
the CPU time limit was exceeded, unknown if the status could not
be determined, or nok if something went wrong (e.g. the algorithm
crashed).
Error <alg>
: The final average error as calculated
from the individual errors reported by the interface script.
Resubsterror <alg>
: The final average resubstitution error.
Size <alg>
: The final average model size.
Testtime <alg>
: The final average testing time.
Traintime <alg>
: The final average training time.
Totaltime <alg>
: The final average total time.
Stop
: The date and time when the experiment was finished,
in standard UNIX date format.
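For example, the MD5 values recorded above (DBdataMD5, DBnameMD5) can be checked against a local copy of the database with a short Python snippet; the path mydata.data is a placeholder.

```python
import hashlib

def md5_of(path):
    """MD5 digest of a file, comparable to DBdataMD5/DBnameMD5."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large .data files need not fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of("mydata.data"))  # placeholder path
```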
The .stats file contains all the measures that are calculated from
the .pred and .targets files by the run_stats
program (run_stats is called automatically
at the end of run_exp
unless explicitly suppressed).
The variables in the .stats
file
for classification-type experiments:
Error <alg> <rep> <fold>
: The error of the model learned by
algorithm <alg>
from
the training set and evaluated on the test set,
for repetition <rep> and fold <fold>.
Error <alg>
: The error averaged over all classifications
from all folds and repetitions. Note that this will differ from the
average of the per-fold/repetition errors above if fold sizes are not
the same for all folds.
StdDevOfError <alg>
: The standard deviation of the errors
for all folds/repetitions.
StdErrOfError <alg>
: The standard error of the errors for
all folds/repetitions (i.e. the standard deviation divided by the square root
of the number of errors).
Correct-Wrong <alg1> <alg2>
: The number of cases where
the classification was correct for algorithm <alg1>
and wrong for algorithm <alg2>.
Wrong-Correct <alg1> <alg2>
: The number of cases where
the classification was wrong for algorithm <alg1>
and correct for algorithm <alg2>.
pvalMcNemar <alg1> <alg2>
: The p-value of the McNemar
test for identical distributions of the wrong/correct and correct/wrong
counts (a computation sketch follows this list).
pvalPairedTTest <alg1> <alg2>
: The p-value of a
paired t-test for the errors.
p-val_McNemar <alg1> <alg2>
: OBSOLETE and only kept for
backward compatibility!
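The standard error and the McNemar p-value above can be recomputed from the per-fold errors and the two discordant counts. A minimal Python sketch follows; it assumes the common chi-square approximation with continuity correction for McNemar and the sample (n-1) standard deviation, either of which run_stats may define differently.

```python
import math

def std_err(errors):
    """Standard error: standard deviation divided by sqrt(n).

    Whether run_stats uses the sample (n-1) or population (n) standard
    deviation is an assumption; the sample form is used here.
    """
    n = len(errors)
    if n < 2:
        return 0.0
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return math.sqrt(var / n)

def mcnemar_pvalue(correct_wrong, wrong_correct):
    """McNemar p-value from the two discordant counts.

    Uses the chi-square approximation with continuity correction;
    run_stats may use a different variant (e.g. the exact binomial test).
    """
    b, c = correct_wrong, wrong_correct
    if b + c == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))
```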
The variables in the .stats
file
for regression-type experiments (a sketch recomputing these measures follows the list):
ErrorSSE
: Sum of squared errors
ErrorMSE
: Mean squared error
ErrorRMSE
: Root mean squared error
ErrorNMSE
: Normalized mean squared error
ErrorMAD
: Mean absolute deviation
ErrorNMAD
: Normalized mean absolute deviation
RSquare
: Correlation coefficient between targets and predictions
p-MeanDiffZero <alg1> <alg2>
: The p-value of the test for equal means
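The following Python sketch recomputes these measures from lists of targets and predictions. The normalizations for NMSE and NMAD are assumptions (target variance and mean absolute deviation of the targets, respectively), and RSquare is computed as the Pearson correlation, as described above; run_stats may define these differently.

```python
import math

def regression_measures(targets, preds):
    """Recompute the regression measures from targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(preds) / n
    sse = sum((t - p) ** 2 for t, p in zip(targets, preds))
    mse = sse / n
    # Normalizations below are assumptions, not confirmed run_stats behavior.
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    mad = sum(abs(t - p) for t, p in zip(targets, preds)) / n
    mad_t = sum(abs(t - mean_t) for t in targets) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(targets, preds)) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in preds) / n)
    r = cov / (math.sqrt(var_t) * sd_p)
    return {
        "ErrorSSE": sse,
        "ErrorMSE": mse,
        "ErrorRMSE": math.sqrt(mse),
        "ErrorNMSE": mse / var_t,
        "ErrorMAD": mad,
        "ErrorNMAD": mad / mad_t,
        "RSquare": r,
    }
```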
The DCT program and its output are documented in [DCT doc].
For each fold of the crossvalidation, a file containing only
the targets of the test file for this fold is stored
in the results directory. The name of this file
is of the form <filestem>_<seed>_<fold>.targets.
These files are needed by the run_stats
program
to calculate error estimates and similar measures.
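A minimal reading sketch in Python, assuming one target value per line (the actual file layout may differ):

```python
import os

def read_targets(results_dir, filestem, seed, fold):
    """Read the stored targets for one fold, one value per line (assumed)."""
    path = os.path.join(results_dir, f"{filestem}_{seed}_{fold}.targets")
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]
```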
For each combination of learning algorithm and crossvalidation fold,
a file containing only
the predictions of this learning algorithm for the test file is stored
in the results directory. The name of this file
is of the form <filestem>_<seed>_<fold>_<alg>.pred.
These files are needed by the run_stats
program
to calculate error estimates and similar measures.
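Conceptually, run_stats pairs each .pred file with the matching .targets file. A hedged Python sketch computing the classification error for one fold, again assuming one value per line and equally long files:

```python
import os

def fold_error(results_dir, filestem, seed, fold, alg):
    """Fraction of test cases where prediction and target disagree.

    The per-line file layout is an assumption; run_stats itself may
    handle the files differently.
    """
    base = os.path.join(results_dir, f"{filestem}_{seed}_{fold}")
    with open(base + ".targets") as fh:
        targets = [line.strip() for line in fh if line.strip()]
    with open(f"{base}_{alg}.pred") as fh:
        preds = [line.strip() for line in fh if line.strip()]
    wrong = sum(t != p for t, p in zip(targets, preds))
    return wrong / len(targets)
```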