
Running Experiments

First make sure the data is in the standard format (see Section 4). By default the main experimentation program performs only a quick check, so you should also run the checking program check_database.pl on the full database. The steps needed to convert your data into METAL format depend heavily on the format the data is originally in.

Here are some hints on what kinds of conversion might be necessary:

It might be necessary to convert files from DOS to UNIX format.
The database should be available in a format that is as close as possible to "CSV" (comma-separated values) format. Many programs that export CSV format put non-numeric values in quotes; these quotes have to be removed for METAL format.
When removing special characters that occur in non-numeric values but are not allowed in METAL format, take care that several different values do not get mapped to the same value!
Missing values are often coded as "empty strings". For METAL format, missing values must be coded as question marks, both in numeric and in non-numeric fields.
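
The hints above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of METAL-MLEE: the function names to_metal_like and strip_chars_checked are hypothetical, the set of disallowed characters has to be taken from the format description in Section 4, and the result should still be verified with check_database.pl.

```python
import csv
import io


def to_metal_like(text, delimiter=","):
    """Normalize exported CSV text along the lines of the hints above.

    Hypothetical helper, not part of METAL-MLEE.
    """
    # DOS (and old Mac) line endings -> UNIX line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    out_rows = []
    # csv.reader drops the quotes that exporters put around values.
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        cleaned = []
        for field in row:
            field = field.strip()
            if delimiter in field:
                # A quoted value containing the delimiter cannot simply be
                # unquoted; it needs manual treatment.
                raise ValueError(f"value contains delimiter: {field!r}")
            # Empty strings denote missing values; code them as "?".
            cleaned.append(field or "?")
        out_rows.append(cleaned)
    return "\n".join(delimiter.join(r) for r in out_rows) + "\n"


def strip_chars_checked(values, disallowed):
    """Return the values that collapse together once `disallowed`
    characters are removed (empty dict means removal is safe).

    `disallowed` is an assumption supplied by the caller.
    """
    table = str.maketrans("", "", disallowed)
    mapping = {}
    for value in values:
        mapping.setdefault(value.translate(table), set()).add(value)
    return {k: sorted(v) for k, v in mapping.items() if len(v) > 1}
```

Calling strip_chars_checked on the distinct values of a non-numeric field before removing characters shows which values would be merged, so they can be renamed first.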

METAL-MLEE lets you choose rather freely how to run the necessary experiments: you can run different algorithms on different machines, run different databases on different machines, run different algorithms on the same machine at different times, and so on.

You should consider the following points when planning the experiments:

Each experiment should have its own output directory. If you run different algorithms for the same database at different times on the same machine, you can simply reuse the output directory: the new target/prediction files will be added to the directory, and the new data will be appended to the .results and .log files (unless the option -o for run_exp is specified, which will overwrite the old .results and .log files).
If you run experiments on the same file system, take care that different experiments do not use identical files, to prevent data loss. run_exp uses temporary file names for some files to prevent this, but the names of output files might still coincide.
If you run some algorithms for a database on machine A and other algorithms on machine B, it is advisable to use different output directories for these runs and then merge the created files. Results must be merged by copying together the generated .pred, .target, and .dct files and by concatenating all .results files into the final .results file and all .log files into the final .log file. The script exp_append_results will do this for a source and a destination directory: the source directory must contain a subdirectory for each filestem for which an experiment was run, and the destination directory will contain a subdirectory for each filestem. Results are combined by repeatedly copying partial results for several filestems from different source directories to the same destination directory.
Running different algorithms on different machines will make it harder to compare CPU time measurements, even for the same filestem.
The simplest way to carry out an experiment is to run all algorithms for a filestem on the same machine in a single run of run_exp.
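
The merge rule described above (copy the .pred, .target, and .dct files; concatenate the .results and .log files) can be illustrated with a short Python sketch. It is not the exp_append_results script and makes no claim about how that script works internally. The function names are hypothetical, and it assumes that files can be recognized by their extension alone and that prediction files from different runs have different names.

```python
import shutil
from pathlib import Path

COPY_SUFFIXES = {".pred", ".target", ".dct"}  # copied as they are
APPEND_SUFFIXES = {".results", ".log"}        # concatenated


def merge_filestem_dir(src, dst):
    """Merge the output directory of one filestem into `dst`."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        target = dst / path.name
        if path.suffix in APPEND_SUFFIXES:
            # Append to the destination file, creating it if necessary.
            with open(target, "ab") as out, open(path, "rb") as inp:
                shutil.copyfileobj(inp, out)
        elif path.suffix in COPY_SUFFIXES:
            if target.exists():
                # Identical names would mean data loss, so stop instead.
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.copy2(path, target)


def merge_runs(src_root, dst_root):
    """Merge every filestem subdirectory of `src_root` into `dst_root`."""
    for sub in sorted(Path(src_root).iterdir()):
        if sub.is_dir():
            merge_filestem_dir(sub, Path(dst_root) / sub.name)
```

Calling merge_runs once per source directory, always with the same destination, mirrors the repeated copying described above. Refusing to overwrite an existing file is a deliberate choice in this sketch, in line with the warning about identical output files.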


2002-10-17