=====================================================================
DOCUMENTATION for the SPEEDURCONT-CORPUS
=====================================================================

(C) Friedrich Neubarth, Hannes Pirker
Austrian Research Institute for AI (OFAI)

Last Update: 12 Feb 2007

====================================================================
Lexical Content and Naming Conventions:
====================================================================

There is one file per sentence.
Files are named 'sNNN.XXX.wav' and 'sNNN.XXX.xml',
where NNN is the sentence number and XXX represents the
type/lexical content of the file.

Values for XXX and their meanings are:

--
no
--
"Nordwind und Sonne": A short story containing phonetically
balanced text.

--
bu
--
"Buttergschichte": A short story containing phonetically
balanced text.

--------------------
zg1, zg2, ..., zg22
--------------------
"Zeitungsartikel": 22 short articles, more or less randomly
selected from Austrian newspapers.

-------------------
sa1, sa2, ... , sa5
-------------------
"Einzelsaetze": 298 isolated 'standard' sentences taken from
various sources (e.g. Phondat, Marburg-Saetze,...).
They are grouped according to their original sources:

  sa1:  s001 - s100
  sa2:  s101 - s120
  sa3:  s121 - s183
  sa4:  s184 - s253
  sa5:  s254 - s298

--------------------
fa1, fa2, ... , fa5
--------------------
"Frage-Antwort Paare": Highly uniform pairs of questions and
answers (strictly speaking, only the *answers*), used for the
controlled induction of different focus conditions (broad vs.
narrow focus).
They are grouped in bundles of 50 sentences each:

  fa1:  s001 - s050
  fa2:  s051 - s100
  ...
  fa5:  s201 - s250

====================================================================
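
The naming convention above can be checked mechanically. The following
Python sketch is not part of the corpus distribution; it only illustrates
one way to split a file name into its parts and to look up the group a
sentence number belongs to. All names in it (FILENAME_RE, CorpusFile,
parse_name, sa_group, fa_group) are hypothetical. It assumes that NNN is
always written with three digits, as in the ranges listed above, and that
the fa groups not spelled out in the list follow the same 50-sentence
pattern.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: 's' + three digits + '.' + content code + '.' + extension.
# Content codes are the ones listed in this document: no, bu, zgN, saN, faN.
FILENAME_RE = re.compile(r"^s(\d{3})\.(no|bu|zg\d{1,2}|sa\d|fa\d)\.(wav|xml)$")


class CorpusFile(NamedTuple):
    sentence: int    # NNN, the sentence number
    content: str     # XXX, the type/lexical content code
    extension: str   # 'wav' or 'xml'


# Sentence ranges of the "Einzelsaetze" groups, copied from the list above.
SA_RANGES = {
    "sa1": (1, 100),
    "sa2": (101, 120),
    "sa3": (121, 183),
    "sa4": (184, 253),
    "sa5": (254, 298),
}


def parse_name(name: str) -> Optional[CorpusFile]:
    """Split a file name into its parts; return None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return CorpusFile(int(match.group(1)), match.group(2), match.group(3))


def sa_group(sentence: int) -> Optional[str]:
    """Return the sa group whose range contains the sentence number."""
    for group, (first, last) in SA_RANGES.items():
        if first <= sentence <= last:
            return group
    return None


def fa_group(sentence: int) -> Optional[str]:
    """Return the fa group, assuming uniform bundles of 50 sentences."""
    if 1 <= sentence <= 250:
        return "fa{}".format((sentence - 1) // 50 + 1)
    return None
```

A call such as parse_name("s101.sa2.wav") yields the sentence number, the
content code and the extension as separate fields; sa_group and fa_group
can then be used to cross-check that the code in a file name agrees with
the ranges documented here.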