This plugin provides an extended version of the original GATE ListGazetteer. In addition to the features of the original, built-in version of the List Gazetteer, this version provides features for more powerful matching of partial words:
- Annotations LookupPrefix and LookupSuffix are generated for the parts of a word that come before or after the part that is matched with an entry from the gazetteer. Their features majorType, minorType, and ontology are the same as for the corresponding Lookup annotation.
- Lookup annotations include the additional boolean features atEnd and atBeginning to indicate whether the match is at the end or the beginning of a word.
- A string feature contains the actual text that corresponds to the annotation. This can be useful in JAPE rules, for example to match exceptions.
- For Lookup and Lookup_prefix annotations, the additional features firstcharUpper (true or false) and firstcharCategory (the integer value corresponding to the Unicode category of the character) are generated.
- The annotations @DOCBEGIN and @DOCEND are generated. They are located before the start and after the end of the text and have zero length. Currently, no JAPE transducer that is included with GATE can make use of zero-length annotations, but we are working on a modified version that can. These annotations can simply be ignored for now.
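As a rough illustration of the features listed above, the following Java sketch shows how a gazetteer entry matched inside a word splits that word into a prefix, the matched string, and a suffix, and how the flags could be derived from the match position. It is not the plugin's code and does not use the GATE API: the class PartialMatchSketch and its method find are hypothetical names, and computing firstcharCategory with Character.getType is an assumption about what "the integer value corresponding to the Unicode category" means in practice.

```java
// Illustrative sketch only: not the plugin's code, no GATE classes involved.
final class PartialMatchSketch {
    final String prefix;          // text before the match (cf. LookupPrefix)
    final String string;          // matched text (cf. the string feature)
    final String suffix;          // text after the match (cf. LookupSuffix)
    final boolean atBeginning;    // match starts at the start of the word
    final boolean atEnd;          // match ends at the end of the word
    final boolean firstcharUpper; // first matched character is upper case
    final int firstcharCategory;  // ASSUMPTION: value of Character.getType

    private PartialMatchSketch(String word, int start, int end) {
        prefix = word.substring(0, start);
        string = word.substring(start, end);
        suffix = word.substring(end);
        atBeginning = start == 0;
        atEnd = end == word.length();
        char first = string.charAt(0);
        firstcharUpper = Character.isUpperCase(first);
        firstcharCategory = Character.getType(first);
    }

    /** Returns the first occurrence of entry inside word, or null if there is none. */
    static PartialMatchSketch find(String word, String entry) {
        if (entry.isEmpty()) {
            return null; // an empty entry cannot be matched meaningfully
        }
        int start = word.indexOf(entry);
        return start < 0 ? null : new PartialMatchSketch(word, start, start + entry.length());
    }

    public static void main(String[] args) {
        PartialMatchSketch m = find("Unterhaus", "haus");
        System.out.println(m.prefix + " | " + m.string + " | " + m.suffix
                + " atBeginning=" + m.atBeginning + " atEnd=" + m.atEnd);
    }
}
```

In this sketch the entry "haus" inside the word "Unterhaus" leaves the prefix "Unter" and an empty suffix, so atEnd is true and atBeginning is false. The real gazetteer works on annotation offsets in a document rather than on isolated strings.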
NOTE: All parameters except the URL of the input file are now runtime parameters and are therefore not visible when you create the resource. They can, however, be changed at any time once you include the resource in a pipeline. This makes it possible to change them without re-creating the processing resource.
In addition, the modified code uses a little trick to avoid the overhead of checking matches separately at the end of the document.
Current version: 2006-10-18
You can download the plugin as
Javadoc documentation of the plugin
INSTALLATION: Both the gzipped and the ZIP file contain a precompiled version, built with Sun JDK 1.5.0_06-b05 under Linux. This should work with other operating systems and Java versions; if it does not, the package can be recompiled in the standard way with a simple ant command.
Simply unpack the archive, then within GATE go to File->Manage Creole Plugins, press the "Add new CREOLE repository" button and select the directory you have just created.
After the plugin has been loaded this way, you should find the new processing resource "OFAI List Gazetteer" in the "New" menu for processing resources.
NOTE: versions after 2006-10-18 have most parameters defined as runtime parameters. This means that you have to change them in the pipeline, not when you first create the gazetteer object in the GUI. This makes it much easier to modify the parameters for an existing gazetteer object (previously, the old object had to be discarded and a new one had to be created for any parameter change).