Architecture and Effective Development of a High-Quality Part-of-Speech Tagger

The general aim of the project is to improve the quality of Part-of-Speech (PoS) tagging by developing a tagger that combines the statistical approach with the Constraint Grammar based approach in such a way that
  • the strengths of each approach are emphasized
  • the weaknesses of each approach are compensated for by the other.

Apart from these theoretical aims, the project also calls for a practical demonstration that validates the developed methodology, together with an evaluation of the practical results achieved.

This amounts to the following three main objectives of the project, which are at the same time its three contributions to the field of PoS tagging:

1. proposing and advocating a novel tagger architecture that combines the statistical and the Constraint Grammar based tagging schemes into a tagging system with higher accuracy than either of its components taken alone;

2. developing a systematic method for writing the rules of a Constraint Grammar tagger, together with a novel and more powerful method of applying them;

3. implementing and evaluating a combined tagger for German, employing the TnT tagger by T. Brants as the statistical component and the newly developed Constraint Grammar tagger for German, and using the NEGRA corpus as the evaluation standard.
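
The objectives above do not spell out how the two components interact. As a purely illustrative sketch of one conceivable arrangement (an assumption, not the project's actual architecture), a statistical component could propose scored candidate tags for each token, and hand-written constraints could then discard readings that are impossible in context before the best surviving candidate is chosen. The function names, the toy rule, and the scores below are all hypothetical; only the tag names ART, NN and VVFIN are taken from the STTS tagset used in the NEGRA corpus.

```python
from typing import Callable, Dict, List, Set

# Hypothetical data model: for every token, the statistical component
# supplies a mapping from candidate tag to score (invented numbers here).
Candidates = List[Dict[str, float]]
Rule = Callable[[List[str], Candidates, int], Set[str]]


def no_finite_verb_after_article(
    tokens: List[str], candidates: Candidates, i: int
) -> Set[str]:
    """Toy constraint: directly after an unambiguous article (ART),
    the finite-verb reading (VVFIN) is discarded."""
    if i > 0 and set(candidates[i - 1]) == {"ART"}:
        return {"VVFIN"}
    return set()


def combine(
    tokens: List[str], candidates: Candidates, rules: List[Rule]
) -> List[str]:
    """Pick, for each token, the best-scored tag not banned by any rule."""
    result = []
    for i in range(len(tokens)):
        banned: Set[str] = set()
        for rule in rules:
            banned |= rule(tokens, candidates, i)
        surviving = {t: s for t, s in candidates[i].items() if t not in banned}
        # Never remove the last remaining reading of a token.
        if not surviving:
            surviving = candidates[i]
        result.append(max(surviving, key=surviving.get))
    return result


# Invented example: the statistical scores alone would prefer VVFIN.
tokens = ["das", "Essen"]
candidates = [{"ART": 1.0}, {"VVFIN": 0.6, "NN": 0.4}]

print(combine(tokens, candidates, []))                              # statistical choice only
print(combine(tokens, candidates, [no_finite_verb_after_article]))  # with the constraint
```

In this toy setting the constraint overrides a statistically preferred but contextually impossible reading, while the scores still decide among the readings the rules leave open. The fallback that keeps a token's candidates when every reading would be banned is a design choice of this sketch, made so that no token ends up without a tag.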

Duration: 2003 - 2006
Sponsor: Austrian Science Foundation (FWF)
Researchers: Karel Oliva, Stefan Klatt, Alexandra Klein, Friedrich Neubarth, Harald Trost