OFAI

2010–2012

Audiominer - Mathematical Signal Analysis and Modeling for Manipulation of Sound Objects

A project sponsored by the Vienna Science and Technology Fund (WWTF)

Project lead: Monika Dörfler, NuHAG (Numerical Harmonic Analysis Group), Faculty of Mathematics, University of Vienna
Partners: Arthur Flexer, OFAI; Simon Dixon, Queen Mary, University of London; Bruno Torrésani, Centre de Mathématique et d'Informatique, Laboratoire d'Analyse, Topologie, Probabilités (LATP)

General description

An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects (e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical advances in mathematical signal processing suggest that a decisive improvement is possible in identifying and extracting sound objects directly from the time-frequency plane. The mined sound objects can then be organised automatically into perceptually meaningful, easy-to-navigate sound libraries using the latest innovations in MIR-based music similarity. Together, these will form the core of a powerful new toolkit for audio manipulation with widespread applicability in fields such as Sound Design, Computational Auditory Scene Analysis, Artefact Reduction and Audio Database Organisation.

Scientific Abstract

An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects (e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical advances in mathematical signal processing suggest that a decisive improvement is possible in identifying and extracting sound objects directly from the time-frequency plane. Thanks to the latest innovations in MIR-based music analysis, the mined sound objects can be organised automatically into perceptually meaningful, easy-to-navigate sound libraries. Research on the following five topics will bring applied mathematics and MIR to bear on the signal analysis and modelling of sound objects:

  • New adaptive time-frequency and time-scale methods allow for higher resolution of the components of interest. Unlike standard approaches, which use a uniform resolution across the whole time-frequency plane, adaptive methods provide resolution as fine as the characteristics of local components require.
  • Mathematical models promoting sparsity have the potential to further sharpen the resolution of both classical and adaptive time-frequency representations. Using different mixed norms as constraints on the expansion coefficients makes it possible to enforce directional sparsity.
  • MIR-based pattern recognition and comparison methods will identify mineable sound objects in the time-frequency representation, thereby telling the mathematical methods exactly where objects should be extracted.
  • Understanding the properties of the linear operators generated by time-frequency masking will help to minimise the occurrence of artefacts during the extraction of sound objects.
  • Computing the similarity between sound objects will establish quantitative relationships between them and provide the structure for libraries of sound objects. Visualisation in low-dimensional maps will enable easy and intuitive navigation through these structured sound spaces.
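
The mixed-norm idea in the second topic can be illustrated with a minimal sketch. It assumes NumPy and a coefficient matrix indexed as (frequency, time), such as the output of a short-time Fourier or Gabor transform. The helper names `group_shrink` and `mixed_norm_21` are hypothetical. The operator shown is the generic group-lasso (ℓ2,1) shrinkage, not necessarily the exact formulation used in this project.

The choice of grouping axis sets the direction of sparsity. Groups along time favour persistent tonal components, while groups along frequency favour broadband transients. The returned weights form a time-frequency mask in [0, 1], which is the kind of multiplier whose operator properties the fourth topic addresses.

```python
import numpy as np

# Hypothetical helpers: generic group-lasso (l2,1) shrinkage, not the project's own code.


def mixed_norm_21(coeffs, axis):
    """l2,1 mixed norm: l2 norm inside each group, l1 (sum) across groups.

    Groups are the 1-D slices of `coeffs` taken along `axis`.
    """
    return float(np.linalg.norm(coeffs, axis=axis).sum())


def group_shrink(coeffs, lam, axis):
    """Group-lasso shrinkage (proximal operator of lam * l2,1 norm).

    Every group along `axis` is multiplied by max(0, 1 - lam / ||group||_2):
    groups with small energy are set to zero, strong groups are kept but
    shrunk. Returns (shrunk coefficients, real-valued mask in [0, 1]).
    """
    norms = np.linalg.norm(coeffs, axis=axis, keepdims=True)
    # Replace zero norms by 1 to avoid dividing by zero; those groups get weight 0.
    safe_norms = np.where(norms > 0, norms, 1.0)
    weights = np.where(norms > 0, np.maximum(0.0, 1.0 - lam / safe_norms), 0.0)
    mask = np.broadcast_to(weights, coeffs.shape)
    return coeffs * mask, mask


# Toy coefficient matrix indexed as (frequency bin, time frame).
n_freq, n_time = 32, 64
X = np.zeros((n_freq, n_time))
X[10, :] = 1.0  # "tonal" component: one frequency bin, present in every frame
X[:, 40] = 1.0  # "transient" component: one frame, spread over all bins

# Groups = frequency rows (slices along time): keeps the tonal line.
tonal, tonal_mask = group_shrink(X, lam=2.0, axis=1)
# Groups = time columns (slices along frequency): keeps the transient.
transient, transient_mask = group_shrink(X, lam=2.0, axis=0)

print("l2,1 norm, groups along time:", mixed_norm_21(X, axis=1))
print("l2,1 norm, groups along frequency:", mixed_norm_21(X, axis=0))
print("nonzeros kept:", np.count_nonzero(tonal), np.count_nonzero(transient))
```

With lam=2.0, grouping along time keeps only the tonal row. Its norm is 8, so each entry is scaled to 0.75. Grouping along frequency keeps only the transient column, scaled by 1 - 2/sqrt(32). Every other group has norm 1 and is set to zero.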
The central task of this project is to carry theoretical results in mathematical signal processing over to real-world audio signals in order to solve an important problem in Music Information Processing. By bridging disciplines and extending results from theory to application, the project aims to create a new quality of sound manipulation with considerable market potential. Our vision is a software system able to "reverse engineer" complex audio recordings containing multiple, possibly overlapping events into their constituents along both the time and frequency axes. This will form the core of a powerful new tool for audio manipulation with widespread applicability in fields such as Sound Design, Computational Auditory Scene Analysis, Artefact Reduction and Audio Database Organisation.

Publications

Dörfler, M.: Quilted frames - a new concept for adaptive representation, Advances in Applied Mathematics, to appear, DOI: 10.1016/j.aam.2011.02.007, 2011.

Dörfler, M., Gröchenig, K.: Time-frequency partitions and characterizations of modulation spaces with localization operators, Journal of Functional Analysis, Vol. 260, No. 7, pp. 1903-1924, 2011, ZBL:1210.42049.

Dörfler, M., Torrésani, B.: Representation of operators by sampling in the time-frequency domain, Sampling Theory in Signal and Image Processing, Vol. 10, No. 1-2, pp. 171-190, 2011.

Dörfler, M., Velasco, G., Flexer, A., Klien, V.: Sparse Regression in Time-Frequency Representations of Complex Audio, Proceedings of the 7th Sound and Music Computing Conference (SMC'10), Barcelona, Spain, 2010. Also available as TR-2010-08.

Flexer, A., Gasser, M., Schnitzer, D.: Limitations of interactive music recommendation based on audio content, Proceedings of the 5th Audio Mostly Conference: A Conference on Interaction with Sound, pp. 96-102, 2010. Also available as TR-2010-11.

Flexer, A., Schnitzer, D., Gasser, M., Pohle, T.: Combining features reduces hubness in audio similarity, Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR'10), 2010. Also available as TR-2010-06.

Gasser, M., Flexer, A., Grill, T.: On Computing Morphological Similarity of Audio Signals, Proceedings of the 8th Sound and Music Computing Conference (SMC'11), Padova, Italy, 2011. Also available as TR-2010-14.

Gasser, M., Flexer, A., Schnitzer, D.: Hubs and Orphans - an Explorative Approach, Proceedings of the 7th Sound and Music Computing Conference (SMC'10), Barcelona, Spain, 2010. Also available as TR-2010-07.

Majdak, P., Balazs, P., Kreuzer, W., Dörfler, M.: Increasing the Signal-to-Noise Ratio in System Identification with Exponential Sweeps by Thresholding in the Time-Frequency Domain, Proceedings of ICASSP 2011, 2011.

Schnitzer, D., Flexer, A., Widmer, G., Gasser, M.: Islands of Gaussians: The Self Organizing Map and Gaussian Music Similarity Features, Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR'10), 2010. Also available as TR-2010-10.

Siedenburg, K., Dörfler, M.: Structured sparsity for audio signals, Proceedings of DAFX'11, Paris, 2011.

Velasco, G., Holighaus, N., Dörfler, M., Grill, T.: Constructing an invertible constant-Q transform with non-stationary Gabor frames, Proceedings of DAFX'11, Paris, 2011.

Software

Theory and Implementation of Nonstationary Gabor Frames

Structured Sparsity for Audio Signals

WoMen at work

Arthur Flexer at ISMIR 2010 (photo by Sebastian Stober)


Ewa Matusiak, Gino Velasco, Monika Dörfler, Nicki Holighaus at STROBL 2011

Thomas Grill at the Pure Data conference, Weimar, 2011

Kai Siedenburg and Monika Dörfler at DAFX 2011 (http://dafx11.ircam.fr), Paris