Statistical parametric speech synthesis has significantly improved the quality and flexibility of speech synthesis systems. Its uses in dialect interpolation, singing synthesis, and other application areas are the subject of "Synthesizing Dialects, Faces, Singing Voices, Songbirds, and Famous Dead Actors", a talk by Michael Pucher of the Austrian Research Institute for Artificial Intelligence (OFAI). The talk is part of OFAI's 2023 Winter/Spring Lecture Series.
Members of the public are cordially invited to attend the talk in person (OFAI, Freyung 6/6/7, 1010 Vienna) or via Zoom on Wednesday, 1 March at 18:30 CET (UTC+1):
Meeting ID: 842 8244 2460
Talk abstract: Over the last decades, statistical parametric speech synthesis has significantly improved the quality and flexibility of speech synthesis systems. This development started with hidden Markov models (HMMs), and a further major step in acoustic modeling and vocoding came with deep neural networks (DNNs). In this talk I will present a range of applications of statistical parametric speech synthesis that we have investigated. In acoustic speech synthesis, I will show how dialect interpolation can be realized, allowing the generation of in-between language varieties. In audio-visual speech synthesis, joint audio-visual modeling and visual control will be presented. In singing synthesis, I will describe our work towards an opera-style singing synthesis system trained on high-quality opera singing data. A model for the synthesis of songbirds will also be presented that can control bird songs via symbolic input sequences. Finally, I will present a DNN-based synthesizer of a famous Austrian actor that we built from audiobook data and that was used in a theater play. I will conclude my talk with an outlook on the future of speech synthesis technologies and on the remaining technical and possible societal challenges.
Speaker biography: Michael Pucher is Senior Researcher at the Austrian Research Institute for Artificial Intelligence (OFAI) and Senior Speech Technologist at Recognosco, Vienna, Austria. He obtained his doctoral degree (Dr.techn.) in Electrical and Information Engineering from Graz University of Technology in 2007. In 2017 he received the venia docendi in Speech Communication from Graz University of Technology with a habilitation thesis on Speech Processing for Multimodal and Adaptive Systems. His research interests are acoustic modeling for speech recognition, semantic language modeling, speech synthesis for language varieties, persona design for speech-based systems, multimodal and spoken dialog systems, audio-visual speech synthesis, synthesis of singing, synthesis of animal vocalisations, digital phonetics, and sociophonetics. He has also made significant contributions in the area of speaker verification spoofing, where he showed how adaptive synthesizers can spoof a speaker verification system.