MultiModWebAccess: A Multimodal Speech Interface for Accessing Web Pages

In everyday life, humans communicate with their environment through language, both spoken and written, supported by signs and gestures; human-computer interaction still lags far behind. Graphical user interfaces (GUIs) are currently the de facto standard; the logical next step is to move toward richer and more natural interaction by integrating communication via language. A prominent example of this need is the World Wide Web (WWW), which is of growing importance in everyday life. Developers of WWW pages may combine text, audio, image, and video in their presentations to address the user, thus fully exploiting the multimedia possibilities of the web, whereas the users' means of responding are far more limited, being restricted mainly to point-and-click operations.

Complex types of interaction, however, cannot be handled by mouse clicks and typed simple phrases alone. Adding better language capabilities can bridge the gap between navigation and interaction in a communicative setting. Language-based queries also offer the advantage of reaching through the hypertext structure directly to the required (textual) information. This frees the user from dependence on the document structure offered by the content provider, which is advantageous because users' and content providers' intentions often differ. At the same time, the user is not restricted to any predefined wording of the query.

This project aimed to show new ways of integrating speech and language with classical methods of accessing the World Wide Web. The shortcomings and advantages of different combinations of access methods were investigated. As a testbed application, a system providing access to German-language newspapers available online was developed.

The project contributed to three focal areas of research. First, empirically founded pre-design studies provided important insights into the role of speech in a multimodal system for accessing web pages. Second, the research yielded insights into mechanisms for analyzing spoken utterances in the context of a multimodal environment: while recognition of spontaneous speech usually results in high word error rates, background knowledge derived from the state of the interface was expected to help in selecting the intended utterance. Finally, a prototype WWW access system with text and speech input was developed; it allows queries that address browsing functionality, structure, and content. In combination with the empirical research undertaken, this system provides insights into usability, thereby giving important cues for the design of systems featuring multimodal interaction.
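
The idea of using interface state to select the intended utterance can be illustrated with a small sketch. It is not the project's actual mechanism, which is not described here; it assumes that the recognizer returns a list of candidate transcriptions with confidence scores and that the interface state is available as the texts currently visible on the page (for example, link labels). The names Hypothesis, context_overlap, and rerank, as well as the weight parameter, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str      # one candidate transcription from the recognizer
    score: float   # recognizer confidence, assumed to lie in [0, 1]


def context_overlap(text: str, context_words: set[str]) -> float:
    """Fraction of the hypothesis words that also occur in the interface state."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in context_words for w in words) / len(words)


def rerank(hypotheses: list[Hypothesis],
           visible_texts: list[str],
           weight: float = 0.5) -> list[Hypothesis]:
    """Sort hypotheses by a weighted mix of recognizer score and context overlap."""
    # Assumption: the interface state is reduced to a bag of visible words.
    context_words = {w for t in visible_texts for w in t.lower().split()}

    def combined(h: Hypothesis) -> float:
        return (1 - weight) * h.score + weight * context_overlap(h.text, context_words)

    return sorted(hypotheses, key=combined, reverse=True)


# Made-up example: the recognizer slightly prefers a misrecognition,
# but the page currently shows a link labelled "Sport".
candidates = [Hypothesis("zeige sort", 0.60), Hypothesis("zeige sport", 0.55)]
best = rerank(candidates, ["Sport", "Politik", "Wirtschaft heute"])[0]
print(best.text)
```

In this made-up example the second candidate wins because one of its two words matches a visible link label. A real system would need a richer model of interface state than word overlap, and the weighting between the two sources would have to be tuned empirically.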

Duration: 1999 - 2003
Sponsor: Austrian Science Foundation (FWF)
Researchers: Alexandra Klein, Michel Généreux, Harald Trost