Dr. K. Oliva and Mag. P. Kveton, OFAI, Vienna

21 March 2002 Lecture

                                LECTURE
                                *******

Oesterreichisches Forschungsinstitut fuer Artificial Intelligence (OeFAI)
                      Schottengasse 3, A-1010 Wien
 Tel.: +43-1-53361120,  Fax: +43-1-5336112-77,  Email: sec@oefai.at
-------------------------------------------------------------------------
  Dr. Karel Oliva and Mag. Pavel Kveton
  Oesterreichisches Forschungsinstitut
  fuer Artificial Intelligence, Vienna


           A LINGUISTIC BASIS OF CORRECTLY TAGGED POS CORPORA    


  In this talk, we shall first review two notions from statistical
  (i.e. purely quantitative) language processing, "representativity
  of a corpus" and "bigram", and try to give them a linguistic
  ("qualitative") interpretation. Building on these considerations,
  we shall develop a practical technique for detecting errors in a
  part-of-speech tagged corpus. We shall then generalize the approach
  in two orthogonal directions: from bigrams to n-grams (for any
  natural number n) and from error detection to genuine tagging. In
  the final part, we shall illustrate the error-detection method on
  the NEGRA corpus of German and discuss the general implications of
  the linguistics-based framework for statistical taggers.
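
  As a rough illustration of bigram-based error detection in a tagged
  corpus, the sketch below flags every pair of adjacent tags that
  appears in a list of tag bigrams assumed to be linguistically
  impossible. It is not the speakers' implementation: the function
  names, the toy corpus and the single "forbidden" bigram
  (ART followed by VVFIN, in STTS-style tags) are assumptions made
  here for the example only.

```python
from collections import Counter


def tag_bigrams(sentence):
    """Yield (position, (tag, next_tag)) for adjacent tokens of one sentence."""
    tags = [tag for _word, tag in sentence]
    for i in range(len(tags) - 1):
        yield i, (tags[i], tags[i + 1])


def collect_bigrams(corpus):
    """Count every tag bigram that occurs in the corpus."""
    return Counter(bg for sent in corpus for _pos, bg in tag_bigrams(sent))


def flag_suspects(corpus, forbidden):
    """Return (sentence index, token index, bigram) for each forbidden bigram."""
    hits = []
    for s_idx, sent in enumerate(corpus):
        for i, bg in tag_bigrams(sent):
            if bg in forbidden:
                hits.append((s_idx, i, bg))
    return hits


# Toy corpus of (word, tag) pairs; the second sentence is deliberately
# mis-tagged for illustration ("die" tagged as an article before a verb).
corpus = [
    [("der", "ART"), ("Mann", "NN"), ("lacht", "VVFIN")],
    [("die", "ART"), ("singen", "VVFIN")],
]

# Hypothetical list of impossible bigrams (an assumption of this sketch).
forbidden = {("ART", "VVFIN")}

suspects = flag_suspects(corpus, forbidden)
```

  Each hit only marks a position for manual inspection; deciding which
  of the two tags is wrong, and how the list of impossible bigrams is
  obtained, is outside the scope of this sketch.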


Time:   Thursday, 21 March 2002, 18:30 sharp
Venue:  Oesterreichisches Forschungsinstitut
        fuer Artificial Intelligence    
        Schottengasse 3, 1010 Wien.


OESTERREICHISCHES FORSCHUNGSINSTITUT
FUER ARTIFICIAL INTELLIGENCE

o.Univ.-Prof. Dr. Robert Trappl