


Adapting METAL-MLEE

If you want to use METAL-MLEE with learning algorithms other than those for which interface scripts (see 5.2) already exist, you need to create interface scripts for them.

In a similar way, you can also add additional preprocessing algorithms.

Adapting METAL-MLEE to additional algorithms essentially consists in adding the necessary interface programs. The best way to do this is to copy and adapt an existing interface program for a similar algorithm. The interface programs are written in Perl, so some knowledge of Perl is necessary to create a new interface program.

For each type of algorithm, there is a heavily commented template file that can be used as a basis for a new interface program.

In order to run a certain list of interface scripts automatically (instead of specifying them with the -l option of the run_exp script), edit the config.pm script and change the lists of script names given there for the default classification, default regression, default classification data-measurement, and default regression data-measurement algorithms.
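As a rough illustration, such default lists might look roughly as sketched below. The variable names here are purely illustrative, not the ones actually used by METAL-MLEE; check config.pm itself and edit only the lists that are already defined there.

  # Hypothetical sketch of the default script-name lists in config.pm.
  # The actual variable names in your copy of config.pm may differ;
  # only edit the lists that are already present in the file.
  package config;

  our @default_cla_scripts = qw(run_cla_algo1 run_cla_newalgo);  # classification
  our @default_rla_scripts = qw(run_rla_algo1 run_rla_newalgo);  # regression
  our @default_cla_dct     = qw(run_dct_classification);         # classification data measurement
  our @default_rla_dct     = qw(run_dct_regression);             # regression data measurement

  1;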

Adding Learning Algorithm Interface Scripts

In order for a learning algorithm to be usable with METAL-MLEE, the following conditions must be fulfilled:

For classification learning algorithms, refer to the template run_cla_TEMPLATE; for regression learning algorithms, refer to run_rla_TEMPLATE.

Figure 1: The run_cla_TEMPLATE file (listing not reproduced here).

Figure 1 shows the run_cla_TEMPLATE file with all comments removed and line numbers added. To adapt the file to some learning algorithm, copy it to a file run_cla_xxx (for a classification algorithm) where xxx is the name of the learning algorithm. Follow the advice given in the comments in the template file to program the interface file for your learning algorithm.
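As a rough sketch, a very reduced classification interface script could take the shape below. The learner command mylearner, its options, and the argument handling are placeholders only; the authoritative argument and output conventions are the ones documented in run_cla_TEMPLATE.

  #!/usr/bin/perl -w
  # Minimal sketch of a classification interface script (run_cla_xxx).
  # The learner command "mylearner" and its options are placeholders;
  # follow run_cla_TEMPLATE for the real argument and output conventions.
  use strict;

  my ($trainfile, $testfile, $outstem) = @ARGV;
  die "usage: run_cla_xxx <trainfile> <testfile> <outstem>\n" unless defined $outstem;

  # Train on the training file and produce predictions for the test file.
  system("mylearner", "-train", $trainfile,
         "-test", $testfile, "-predictions", "$outstem.pred") == 0
      or die "mylearner failed: $?\n";

  # Here the predictions in "$outstem.pred" would be converted into the
  # output format that run_exp expects, as described in run_cla_TEMPLATE.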

Here are a few notes on adapting the template:

Adding Preprocessing Algorithms

Preprocessing algorithms change the dataset before the learning algorithm is applied. For each run_exp experimentation run, you can optionally specify one preprocessing algorithm; it will be called for each fold of the cross-validation. In order for run_exp to be able to call the preprocessing algorithm, an interface script must be provided.

Preprocessing algorithms, like learning algorithms, have to process the training and testing sets for each fold separately. A typical interface script will contain two calls to the preprocessing program, one for the training file and one for the test file.
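A much-reduced sketch of such a script is shown below. The preprocessor command mypreproc and the argument conventions are placeholders; copy one of the preprocessing scripts included in the package for the real conventions.

  #!/usr/bin/perl -w
  # Sketch of a preprocessing interface script.  The command "mypreproc"
  # and the argument conventions are placeholders; an existing script from
  # the package defines the real ones.
  use strict;

  my ($train_in, $test_in, $train_out, $test_out) = @ARGV;
  die "usage: <train_in> <test_in> <train_out> <test_out>\n" unless defined $test_out;

  # One call for the training file ...
  system("mypreproc", $train_in, $train_out) == 0
      or die "preprocessing of the training file failed: $?\n";

  # ... and one for the test file, applying exactly the same transformation.
  system("mypreproc", $test_in, $test_out) == 0
      or die "preprocessing of the test file failed: $?\n";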

Note: The preprocessing algorithm should never use information from the class labels in the test set! The preprocessing algorithm should always carry out exactly the same preprocessing transformation on the test set as on the training set. If the preprocessing algorithm adapts itself to the input dataset, you must take care that this does not happen when the test set is processed! For example, a class-aware discretization algorithm should discretize the numeric attributes in the test set in exactly the same way as it discretizes the attributes in the training set, instead of calculating new discretization intervals based on the specific information in the test set.

This is important for a practical reason: otherwise the content or format of the generated training and test files could be incompatible. More importantly, it matters for a theoretical reason: anything else would be cheating, since it uses information from the test set that should be regarded as completely unavailable to the estimation procedure.
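The following stand-alone sketch (not taken from METAL-MLEE) illustrates the intended behaviour for the discretization example: the cut points are computed once from the training values and then applied unchanged to the test values.

  #!/usr/bin/perl -w
  # Illustrative sketch only: cut points are derived from the training
  # values and reused unchanged for the test values, never recomputed.
  use strict;

  sub cut_points {             # equal-width cut points from training values
      my ($nbins, @values) = @_;
      my ($min, $max) = (sort { $a <=> $b } @values)[0, -1];
      my $width = ($max - $min) / $nbins;
      return map { $min + $_ * $width } 1 .. $nbins - 1;
  }

  sub discretize {             # map a value to a bin using fixed cut points
      my ($value, @cuts) = @_;
      my $bin = 0;
      $bin++ while $bin < @cuts && $value > $cuts[$bin];
      return $bin;
  }

  my @train = (1.2, 3.4, 2.2, 5.0, 4.1);
  my @cuts  = cut_points(3, @train);                      # learned from the training set only
  print discretize($_, @cuts), "\n" for 0.5, 2.9, 6.0;    # same intervals applied to test values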

As with the interface scripts for learning algorithms, use one of the scripts included in the package as a template.

