TALK
****
Oesterreichisches Forschungsinstitut fuer Artificial Intelligence (OeFAI)
Schottengasse 3, A-1010 Wien
Tel.: +43-1-53361120, Fax: +43-1-5336112-77, Email: sec@oefai.at
-------------------------------------------------------------------------
Dr. Karel Oliva and Mag. Pavel Kveton
Oesterreichisches Forschungsinstitut
fuer Artificial Intelligence, Wien
A LINGUISTIC BASIS OF CORRECTLY TAGGED POS CORPORA
In this talk, we shall first review two notions from the area of
statistical (i.e. purely "quantitative") language processing,
"representativity of a corpus" and "bigram", and we shall try to
give them a linguistic ("qualitative") interpretation. Based on
these considerations, we shall develop a practical technique for
detecting errors in a part-of-speech tagged corpus. Further,
we shall generalize the approach in two orthogonal directions: from
bigrams to n-grams (for any natural n) and from error detection to
genuine tagging. In the last section, we shall illustrate the
resulting error-detection method on the NEGRA corpus of German and
discuss the general implications of this linguistics-based framework
for statistical taggers.
Time: Thursday, 21 March 2002, 18:30 sharp
Venue: Oesterreichisches Forschungsinstitut
fuer Artificial Intelligence
Schottengasse 3, 1010 Wien.
OESTERREICHISCHES FORSCHUNGSINSTITUT
FUER ARTIFICIAL INTELLIGENCE
o.Univ.-Prof. Dr. Robert Trappl