OFAI

Extended List Gazetteer Plugin

Description

NOTE: This plugin is out of date. You can find a newer version of the Extended Gazetteer PR in the StringAnnotation plugin.

 

This plugin provides an extended version of the original GATE ANNIE Gazetteer (DefaultGazetteer). In addition to the features of the original, built-in version of the List Gazetteer, this version provides features for more powerful matching of partial words:

  • A word can be defined in one of four ways: 1) everything that is not whitespace, 2) everything that is not a letter, 3) everything that is not a digit, or 4) everything that is neither a letter nor a digit.
  • In addition, you can specify extra characters to be treated either as part of words or as whitespace.
  • The program will create additional annotations LookupPrefix and LookupSuffix for parts of a word that are before or after the part that is matched with an entry from the gazetteer. The features majorType, minorType, and ontology are the same as for the corresponding Lookup annotation.
  • The Lookup annotations include the additional boolean features atEnd and atBeginning to indicate whether the match is at the end/beginning of a word.
  • All generated annotations can include the string feature, which contains the actual text that corresponds to the annotation. This can be useful in JAPE rules, for example to match exceptions.
  • For the Lookup and Lookup_prefix annotations, the additional features firstcharUpper (true or false) and firstcharCategory (the integer value corresponding to the Unicode category of the character) are generated.
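
The features listed above can be illustrated with a small standalone sketch. It does not use the GATE API and is not the plugin's actual code: the class name PartialMatchSketch, the method describe, and the use of a plain map in place of annotations are all assumptions made for illustration. It also assumes that the Unicode category integer is the value returned by Java's Character.getType, which the description above does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics the features described above for ONE word and
// ONE gazetteer entry. All names here are hypothetical, not plugin API.
public class PartialMatchSketch {

    /**
     * Returns a map of feature names to values for the first occurrence of
     * entry inside word, or an empty map if there is no match.
     */
    public static Map<String, Object> describe(String word, String entry) {
        Map<String, Object> features = new LinkedHashMap<>();
        if (entry.isEmpty()) {
            return features; // nothing to match
        }
        int start = word.indexOf(entry);
        if (start < 0) {
            return features; // entry does not occur in this word
        }
        String prefix = word.substring(0, start);
        String suffix = word.substring(start + entry.length());

        // Features of the (hypothetical) Lookup annotation.
        features.put("string", entry);
        features.put("atBeginning", prefix.isEmpty());
        features.put("atEnd", suffix.isEmpty());
        char first = entry.charAt(0);
        features.put("firstcharUpper", Character.isUpperCase(first));
        // Assumption: the category is the int from Character.getType.
        features.put("firstcharCategory", Character.getType(first));

        // Text that would be covered by LookupPrefix / LookupSuffix.
        if (!prefix.isEmpty()) {
            features.put("LookupPrefix", prefix);
        }
        if (!suffix.isEmpty()) {
            features.put("LookupSuffix", suffix);
        }
        return features;
    }

    public static void main(String[] args) {
        // "bahn" matched inside "Nordbahnhof": prefix "Nord", suffix "hof".
        System.out.println(describe("Nordbahnhof", "bahn"));
    }
}
```

In this sketch a match in the middle of a word yields both a prefix and a suffix, and atBeginning and atEnd are both false; a match covering the whole word yields neither, and both flags are true.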

NOTE: Unlike the GATE ANNIE Gazetteer, all parameters except the URL of the input file, the encoding, and the feature separator are now runtime parameters, and are therefore not visible when you create the resource. They can, however, be changed at any time once you include the resource in a pipeline, so there is no need to re-create the processing resource.

Current version: 1.3

You can download the plugin as

INSTALLATION: Both the gzipped and the ZIP file contain a version precompiled with Sun JDK 1.6 under Linux. This should also work with other operating systems or Java versions; if it does not, the package can be recompiled in the standard way with a simple ant command.

Simply unpack the archive, then within GATE go to File->Manage Creole Plugins, press the "Add new CREOLE repository" button and select the directory you have just created.

After the plugin has been loaded this way, you should find the new processing resource "Extended List Gazetteer" in the "New" menu for processing resources.

License

This plugin is available under the GNU Lesser General Public License.

See also: Other GATE Plugins and Resources