Running Experiments
First make sure the data is in standard format (see Section 4).
The main experimentation program by default does a quick check,
but you should use the checking program check_database.pl on the full
database. Depending on the format your data is originally in,
the steps to convert it into METAL format might be very different.
Here are some hints about what kind of conversion might be necessary:
- It might be necessary to convert files from DOS to UNIX format.
- The database should be available in a format that is as close
as possible to ``CSV'' (comma-separated values) format. Many programs
that export CSV format will put non-numeric values in quotes; these
have to be removed for METAL format.
- Be careful when removing special characters that were originally
used in non-numeric values but are not allowed in the METAL format:
removing them must not cause several different values to be mapped
to one value!
- Missing values are often coded as ``empty strings''.
Missing values must be coded as question marks for METAL format,
both for numeric and non-numeric fields.
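The conversion hints above can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of METAL-MLEE: the function names (to_metal_rows, clean_value, find_collisions) are hypothetical, and the set of forbidden characters is a placeholder, since this section does not list which characters the METAL format disallows. Always run check_database.pl on the converted file afterwards.

```python
import csv
import io

def to_metal_rows(text, missing="?"):
    """Normalize CSV text: UNIX line endings, no quotes, '?' for missing values."""
    # Convert DOS (CRLF) and bare CR line endings to UNIX (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    rows = []
    # csv.reader strips the quotes many exporters put around non-numeric values.
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        # Empty strings are treated as missing values and coded as '?'.
        rows.append([f.strip() if f.strip() else missing for f in fields])
    return rows

def clean_value(value, forbidden=" ,\"'"):
    """Remove characters ASSUMED to be disallowed (placeholder set, adjust as needed)."""
    return "".join(ch for ch in value if ch not in forbidden)

def find_collisions(values, clean=clean_value):
    """Return cleaned values that several distinct original values map to."""
    seen = {}
    for v in set(values):
        seen.setdefault(clean(v), set()).add(v)
    return {c: sorted(origs) for c, origs in seen.items() if len(origs) > 1}
```

Running find_collisions over each non-numeric column before writing the cleaned file shows whether removing characters would merge distinct values. A quoted field that itself contains a comma needs separate handling before the rows are joined with commas again.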
METAL-MLEE lets you choose rather freely how to run the necessary
experiments: run different algorithms on different machines, run
different databases on different machines,
run different algorithms on the same machine but at different times etc.
You should consider the following points when planning the experiments:
- For each experiment you should have a separate output directory.
If you run different algorithms for the same database at different times
on the same machine, you can simply reuse the output directory: the
new target/prediction files will be added to the directory, and the
.results and .log files will have the new data appended
(unless the option -o for run_exp is specified, which
will overwrite the old .results and .log files).
- If you run experiments on the same file system, take care that
different experiments do not use identical files, to prevent data loss.
run_exp uses temporary file names for some files to prevent this,
but output files might still be identical.
- If you run some algorithms for a database on machine A and
other algorithms on machine B it is advisable to use different
output directories for these runs and then merge the created files.
Results must be merged by copying together the generated .pred,
.target, and .dct files and concatenating
all .results files into the final .results file and
all .log files into the final .log file.
The script exp_append_results will do this for a
source and a destination directory: the source directory must contain
a subdirectory for each filestem for which an experiment was run.
The destination directory will contain a subdirectory for each filestem.
Merging is done by repeatedly copying partial results for several
filestems from different source directories to the same
destination directory.
- Running different algorithms on different machines will make it
harder to compare CPU time measurements, even for the same filestem.
- The simplest way to carry out an experiment is to run
all algorithms for a filestem on the same machine in a single run
of run_exp.
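The merge step described in the list above (copy the .pred, .target, and .dct files, append the .results and .log files) could look roughly like the following sketch. It is not the implementation of exp_append_results, which should be preferred in practice. The function names are hypothetical, and the sketch assumes that each output file carries one of these extensions as its final suffix.

```python
import shutil
from pathlib import Path

COPY_SUFFIXES = {".pred", ".target", ".dct"}   # copied as they are
APPEND_SUFFIXES = {".results", ".log"}         # concatenated

def merge_filestem_dir(src, dst):
    """Merge the output files of one filestem from src into dst."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(src.iterdir()):
        if not f.is_file():
            continue
        target = dst / f.name
        if f.suffix in APPEND_SUFFIXES:
            # Append to the existing file (created if it does not exist yet).
            with open(target, "ab") as out:
                out.write(f.read_bytes())
        elif f.suffix in COPY_SUFFIXES:
            # Refuse to overwrite, so identical file names cannot lose data.
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.copy2(f, target)

def merge_runs(src_root, dst_root):
    """Merge every filestem subdirectory of src_root into dst_root."""
    for sub in sorted(Path(src_root).iterdir()):
        if sub.is_dir():
            merge_filestem_dir(sub, Path(dst_root) / sub.name)
```

Calling merge_runs once per source directory, always with the same destination, mirrors the repeated merging described above. Raising an error on a name clash is a deliberate choice here: it exposes the case where two runs produced identically named output files.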
2002-10-17