SPARC - Semantic Phonetic Automatic Reconstruction of Dictations

The SPARC project aims to integrate semantic knowledge bases into automatic speech recognition systems for dictation applications. Speech recognition systems, which take spoken text as input and convert it into written text, have long since reached the point where they can be deployed commercially. An important application of speech recognition is automating document creation in institutions with a large dictation volume. This type of application poses a challenge for text processing because of its potentially large vocabulary. In dialog or command-and-control systems, 'semantics' is represented by the underlying databases or the set of possible system actions. Dictation systems, by contrast, have to handle texts with much broader content, even if the domain is usually limited. To create documents from spoken texts, speech recognition systems usually rely only on an acoustic model and a language model that represents co-occurrence statistics of words. Based on this knowledge, a transcription of the spoken text is produced.
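As a rough illustration of how an acoustic model and a word co-occurrence language model can be combined, the following minimal sketch picks the best of several candidate transcriptions. It is not the SPARC implementation: the function names, the add-one smoothed bigram model, the tiny training corpus and the acoustic scores are all assumptions made up for this example.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count bigrams and their left contexts in a whitespace-tokenised corpus."""
    contexts = Counter()
    bigrams = Counter()
    vocabulary = set()
    for sentence in sentences:
        words = sentence.split()
        vocabulary.update(words)
        tokens = ["<s>"] + words  # "<s>" marks the sentence start
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocabulary


def lm_log_prob(sentence, contexts, bigrams, vocabulary):
    """Log probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = contexts[prev] + len(vocabulary)
        total += math.log(numerator / denominator)
    return total


def best_hypothesis(hypotheses, contexts, bigrams, vocabulary, lm_weight=1.0):
    """Return the text whose acoustic score plus weighted LM score is highest."""
    def combined(hypothesis):
        text, acoustic_log_score = hypothesis
        lm_score = lm_log_prob(text, contexts, bigrams, vocabulary)
        return acoustic_log_score + lm_weight * lm_score
    return max(hypotheses, key=combined)[0]


# Toy in-domain corpus (invented for illustration).
corpus = [
    "the patient has a fever",
    "the patient has no pain",
    "the patient was discharged",
]
contexts, bigrams, vocabulary = train_bigram_counts(corpus)

# Candidate transcriptions with invented acoustic log scores.
# The acoustically preferred candidate is the less plausible word sequence.
hypotheses = [
    ("the patient has a fever", -12.0),
    ("the patient has a feather", -11.8),
]

best = best_hypothesis(hypotheses, contexts, bigrams, vocabulary)
acoustic_only = best_hypothesis(hypotheses, contexts, bigrams, vocabulary, lm_weight=0.0)
print(best)
print(acoustic_only)
```

In this toy setup the language model outweighs the small acoustic advantage of the implausible candidate. Note that the model only knows which words tend to follow each other; it has no explicit representation of what the text means.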

To fully exploit the potential of language technology for automated dictation, systems must move beyond simple transcription of spoken utterances towards creating documents that conform to the formal and informal requirements of specific text types. By making use of explicit semantic information, our project will contribute to this new dimension in automatic speech recognition technology for dictation systems. The improvements expected from integrating semantic knowledge concern document quality, word error rate and usability.

Duration: 2005 - 2006
Sponsor: FIT-IT Programme (Semantic Systems)
Researchers: Alexandra Klein, Johannes Matiasek, Martin Huber, Jeremy Jancsary
Partners: Philips Speech Recognition Systems, Technical University Graz