2010-2012
Audiominer - Mathematical Signal Analysis and Modeling for Manipulation of Sound Objects
A project sponsored by the Vienna Science and Technology Fund (WWTF)
Project lead: Monika Dörfler, NuHAG (Numerical Harmonic Analysis Group), Faculty of Mathematics, University of Vienna
Partners: Arthur Flexer, OFAI; Simon Dixon, Queen Mary, University of London; Bruno Torrésani, Centre de Mathématique et d'Informatique, Laboratoire d'analyse topologie probabilité
General description
An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects
(e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical
advances in mathematical signal processing indicate the possibility of a decisive improvement in identifying
and extracting sound objects directly from the time-frequency plane. These mined sound objects can then
automatically be organised into perceptually meaningful and easy-to-navigate sound libraries using the
latest innovations in MIR-based music similarity. Together, these will form the core of a powerful new toolkit
for audio manipulation with widespread applicability in fields like Sound Design, Computational Auditory
Scene Analysis, Artefact Reduction and Audio Database Organisation.
Scientific Abstract
An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects
(e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical
advances in mathematical signal processing indicate the possibility of a decisive improvement in identifying
and extracting sound objects directly from the time-frequency plane. Thanks to the latest innovations in
MIR-based music analysis, the mined sound objects can automatically be organised into perceptually
meaningful and easy-to-navigate sound libraries. Research along the following five topics will focus applied
mathematics and MIR on the task of signal analysis and modelling of sound objects:
- New adaptive time-frequency and time-scale methods allow for higher resolution of components of interest. In contrast to standard approaches, which use uniform resolution across the whole time-frequency plane, adaptive methods provide resolution as detailed as the characteristics of local components require.
- Mathematical models promoting sparsity have the potential to further sharpen the resolution of classical as well as adaptive time-frequency representations. Different mixed norms, used as constraints on the expansion coefficients, make it possible to enforce directional sparsity.
- MIR-based pattern recognition and comparison methods will identify mineable sound objects in the time-frequency representation, thereby telling the mathematical methods exactly where extraction of objects should take place.
- Understanding the properties of the linear operators generated by the process of time-frequency masking will help to minimise the occurrence of artefacts during the extraction of sound objects.
- Computation of similarity between sound objects will establish quantitative relationships between them and give structure to libraries of sound objects. Visualisation in low-dimensional maps will enable easy and intuitive navigation through these structured sound spaces.
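
To make the idea of time-frequency masking from the list above more concrete, the following is a minimal sketch and not the project's actual method. It assumes a plain short-time Fourier transform with a Hann window and uniform resolution, a least-squares overlap-add inverse, and a hand-chosen binary mask; in the project, the mask would instead be informed by the pattern recognition step. All function names, the toy signal, and the parameters (window length 512, hop size 128, 1000 Hz cut-off) are illustrative assumptions.

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n]) for s in starts])

def istft(coeffs, win, hop, length):
    """Least-squares overlap-add inverse of stft()."""
    n = len(win)
    y = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(coeffs):
        s = k * hop
        y[s:s + n] += win * np.fft.irfft(frame, n)
        norm[s:s + n] += win ** 2
    # Samples not covered by any frame (signal edges) are left at zero.
    return y / np.maximum(norm, 1e-12)

def apply_mask(x, mask, win, hop):
    """Time-frequency masking: analysis, pointwise multiplication, synthesis."""
    return istft(mask * stft(x, win, hop), win, hop, len(x))

# Toy mixture (assumed for illustration): two sinusoids standing in for
# two "sound objects" that occupy different frequency regions.
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 500 * t)
high = np.sin(2 * np.pi * 2000 * t)
mix = low + high

n, hop = 512, 128
win = np.hanning(n)
freqs = np.fft.rfftfreq(n, d=1 / fs)
num_frames = len(range(0, len(mix) - n + 1, hop))

# Hand-chosen binary mask: keep everything below 1000 Hz in every frame.
mask = np.tile((freqs < 1000).astype(float), (num_frames, 1))

extracted = apply_mask(mix, mask, win, hop)
identity = apply_mask(mix, np.ones_like(mask), win, hop)

# Compare away from the signal edges, where frames do not fully overlap.
interior = slice(n, -n)
err = np.max(np.abs(extracted[interior] - low[interior]))
```

The map from `mix` to `extracted` is linear for a fixed mask, which is the kind of operator the fourth topic refers to. A hard 0/1 mask is the simplest choice; when sound objects overlap in the time-frequency plane, such a mask tends to produce audible artefacts, which is why the properties of these operators matter.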
