OFAI

2010–2012

Audiominer - Mathematical Signal Analysis and Modeling for Manipulation of Sound Objects

A project sponsored by the Vienna Science and Technology Fund (WWTF)

Project lead: Monika Dörfler, NuHAG (Numerical Harmonic Analysis Group), Faculty of Mathematics, University of Vienna
Partners: Arthur Flexer, OFAI; Simon Dixon, Queen Mary, University of London; Bruno Torrésani, Centre de Mathématique et d'Informatique, Laboratoire d'analyse topologie probabilité

General description

An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects (e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical advances in mathematical signal processing indicate the possibility of a decisive improvement in identifying and extracting sound objects directly from the time-frequency plane. These mined sound objects can then automatically be organized into perceptually meaningful and easy-to-navigate sound libraries using the latest innovations in MIR-based music similarity. Together this will form the core of a powerful new toolkit for audio manipulation with widespread applicability in fields like Sound Design, Computational Auditory Scene Analysis, Artefact Reduction or Audio Database Organisation.

Scientific Abstract

An important open research question in Music Information Retrieval (MIR) is the extraction of sound objects (e.g. a guitar chord, the beat of a bass drum, the bark of a dog) from polyphonic audio. Recent theoretical advances in mathematical signal processing indicate the possibility of a decisive improvement in identifying and extracting sound objects directly from the time-frequency plane. Thanks to the latest innovations in MIR-based music analysis, the mined sound objects can automatically be organised into perceptually meaningful and easy-to-navigate sound libraries. Research along the following five topics will focus applied mathematics and MIR on the task of signal analysis and modelling of sound objects:

  • New adaptive time-frequency and time-scale methods allow for higher resolution of components of interest. In contrast to standard approaches with uniform resolution over the whole time-frequency plane, adaptive methods provide resolution as detailed as required by the characteristics of local components.
  • Mathematical models promoting sparsity have the potential to further sharpen the resolution of classical as well as adaptive time-frequency representations. Different mixed norms used as constraints on the expansion coefficients make it possible to enforce directional sparsity.
  • MIR-based pattern recognition and comparison methods will identify mineable sound objects in the time-frequency representation, thereby telling the mathematical methods exactly where objects should be extracted.
  • Understanding the properties of the linear operators generated by the process of time-frequency masking will help to minimise the occurrence of artefacts during the extraction of sound objects.
  • Computing similarity between sound objects will establish quantitative relationships between them and give structure to libraries of sound objects. Visualisation in low-dimensional maps will enable easy and intuitive navigation through these structured sound spaces.

The central task of this project is the transfer of theoretical results in mathematical signal processing to the real world of audio signals in order to solve an important problem in Music Information Processing. By bridging disciplines and extending results from the theoretical to the applied, a new quality of sound manipulation with considerable market potential will be created. Our vision is a software system able to "reverse engineer" complex audio recordings with multiple, possibly overlapping events into their constituents along both the time and frequency axes. This will form the core of a powerful new tool for audio manipulation with widespread applicability in fields like Sound Design, Computational Auditory Scene Analysis, Artefact Reduction or Audio Database Organisation.
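
The second topic above, mixed norms that enforce directional sparsity, can be illustrated with a small sketch. It is not the project's implementation: the function name `group_soft_threshold`, the toy coefficient matrix and the threshold value are assumptions chosen for illustration. The sketch applies group soft-thresholding (the proximal operator of an l2,1 mixed norm) to a matrix of time-frequency coefficients, grouping either along time, which favours tonal components, or along frequency, which favours transients.

```python
import numpy as np

def group_soft_threshold(coeffs, lam, axis):
    """Shrink whole groups of coefficients taken along `axis`.

    Each group is scaled by max(1 - lam / ||group||_2, 0), so groups with
    little joint energy vanish entirely while strong groups are kept.
    (Hypothetical helper, not taken from the project software.)
    """
    norms = np.linalg.norm(coeffs, axis=axis, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return coeffs * scale

# Toy coefficient matrix: rows are frequency bins, columns are time frames.
rng = np.random.default_rng(0)
coeffs = 0.1 * rng.standard_normal((8, 16))
coeffs[3, :] += 2.0   # tonal component: one bin active in every frame
coeffs[:, 10] += 2.0  # transient: every bin active in a single frame

# Grouping along time (each row is one group) keeps the tonal component.
tonal = group_soft_threshold(coeffs, lam=3.0, axis=1)
# Grouping along frequency (each column is one group) keeps the transient.
transient = group_soft_threshold(coeffs, lam=3.0, axis=0)

print("rows kept:   ", np.flatnonzero(np.abs(tonal).sum(axis=1)))
print("columns kept:", np.flatnonzero(np.abs(transient).sum(axis=0)))
```

In this toy setting the same coefficients yield two different sparse layers depending only on the direction in which the norm groups them. This is the sense in which the choice of mixed norm steers sparsity along the time or the frequency axis.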

Publications

Balazs P., Dörfler M., Kowalski M. and Torrésani B.: Adapted and adaptive linear time-frequency representations: a synthesis point of view, IEEE Signal Processing Magazine 30, 6, 20-31, 2013.

Balazs P., Dörfler M., Jaillet F., Holighaus N., Velasco G.: Theory, implementation and applications of nonstationary Gabor frames, J. Comput. Appl. Math. 236, 6, 1481-1496, 2011.

Dörfler M.: Allocating, detecting and mining sound structures: An overview of technical tools, in Proceedings of Artificial Intelligence Applications and Innovations, IFIP Advances in Information and Communication Technology 382, 470-479, Springer, Boston, 2012.

Dörfler M.: Quilted frames - a new concept for adaptive representation, Advances in Applied Mathematics, to appear, DOI:10.1016/j.aam.2011.02.007, 2011.

Dörfler M., Gröchenig K.: Time-frequency partitions and characterizations of modulation spaces with localization operators, J. Funct. Anal. 260, 7, 1903-1924, ZBL:1210.42049, 2011.

Dörfler M., Matusiak E.: Identifying novelty and sound objects in texture sounds by sparse adaptation of Gabor coefficients, in Proceedings of the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, 2013.

Dörfler M., Matusiak E.: Tracing Sound Objects in Audio Textures, in Proceedings of SampTA'13, 2013.

Dörfler M., Romero J.: Frames adapted to a phase-space cover, Constr. Approx., 2012.

Dörfler M., Torrésani B.: Representation of operators by sampling in the time-frequency domain, Sampling Theory in Signal and Image Processing 10, 1-2, 171-190, 2011.

Dörfler M., Velasco G., Flexer A., Klien V.: Sparse Regression in Time-Frequency Representations of Complex Audio, Proceedings of the 7th Sound and Music Computing Conference (SMC'10), Barcelona, Spain, 2010. also available as: TR-2010-08.

Evangelista G., Dörfler M., Matusiak E.: Arbitrary Phase Vocoders by means of Warping, Musica/Tecnologia 7, 2013.

Evangelista G., Dörfler M., Matusiak E.: Phase vocoders with arbitrary frequency band selection, in Proceedings of the 9th Sound and Music Computing Conference, July 11-14, Copenhagen, 2012.

Flexer A., Gasser M., Schnitzer D.: Limitations of interactive music recommendation based on audio content, Proceedings of the 5th Audio Mostly Conference: A Conference on Interaction with Sound, pp. 96-102, 2010. also available as: TR-2010-11.

Flexer A., Schnitzer D., Gasser M., Pohle T.: Combining features reduces hubness in audio similarity, Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR'10), 2010. also available as: TR-2010-06.

Gasser M., Flexer A., Grill T.: On Computing Morphological Similarity of Audio Signals, Proceedings of the 8th Sound and Music Computing Conference (SMC'11), Padova, Italy, 2011. also available as: TR-2010-14.

Gasser M., Flexer A., Schnitzer D.: Hubs and Orphans - an Explorative Approach, Proceedings of the 7th Sound and Music Computing Conference (SMC'10), Barcelona, Spain, 2010. also available as: TR-2010-07.

Grill T., Flexer A., Cunningham S.: Identification of perceptual qualities in textural sounds using the repertory grid method, in Proceedings of the 6th Audio Mostly Conference, Coimbra, Portugal, 2011. also available as: TR-2011-08.

Holighaus N., Dörfler M., Velasco G., Grill T.: A framework for invertible, real-time constant-Q transforms, IEEE Trans. Audio Speech Lang. Process. 21, 4, 775-785, 2013.

Holzapfel A., Velasco G., Holighaus N., Dörfler M., Flexer A.: Advantages of nonstationary Gabor transforms in beat tracking, Proceedings of the First International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies (MIRUM'11), Arizona, USA, 2011. also available as: TR-2011-19.

Holzapfel A., Flexer A., Widmer G.: Improving tempo-sensitive and tempo-robust descriptors for rhythmic similarity, in Proceedings of the 8th Sound and Music Computing Conference (SMC'11), Padova, Italy, 2011. also available as: TR-2011-07.

Klien V., Grill T., Flexer A.: On automated annotation of acousmatic music, Journal of New Music Research, DOI:10.1080/09298215.2011.618226. also available as: TR-2011-06.

Kowalski M., Siedenburg K., Dörfler M.: Social Sparsity! Neighborhood Systems Enrich Structured Shrinkage Operators, IEEE Trans. Signal Process. 61, 10, 2498-2511, 2013.

Majdak P., Balazs P., Kreuzer W., Dörfler M.: Increasing the Signal-to-Noise Ratio in System Identification with Exponential Sweeps by Thresholding in the Time-Frequency Domain, Proceedings of ICASSP 2011, 2011.

Matusiak E., Eldar Y.C.: Sub-Nyquist sampling of short pulses, Proceedings of IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2011), 2011.

Matusiak E., Eldar Y.C.: Xampling of unknown pulses, Proceedings of the 2011 Workshop on Sampling Theory and Applications (SampTA'11), 2011.

Schnitzer D., Flexer A., Schedl M., Widmer G.: Using Mutual Proximity to Improve Content-Based Audio Similarity, in Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR'11), Miami, Florida, USA, 2011. also available as: TR-2011-14.

Schnitzer D., Flexer A., Widmer G., Gasser M.: Islands of Gaussians: The Self Organizing Map and Gaussian Music Similarity Features, Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR'10), 2010. also available as: TR-2010-10.

Siedenburg K., Dörfler M.: Persistent Time-Frequency Shrinkage for Audio Denoising, J. Audio Eng. Soc. 61, 1/2, 2013.

Siedenburg K., Dörfler M.: Audio Denoising by Generalized Time-Frequency Thresholding, Proceedings of the AES 45th Conference on Applications of Time-Frequency Processing, Helsinki, Finland, 2012.

Siedenburg K., Dörfler M.: Structured sparsity for audio signals, Proceedings of DAFX'11, Paris, 2011.

Siedenburg K., Dörfler M., Kowalski M.: Audio Inpainting with Social Sparsity, in Proceedings of Spars2013, Lausanne, Switzerland, 2013.

Velasco G., Holighaus N., Dörfler M., Grill T.: Constructing an invertible constant-Q transform with non-stationary Gabor frames, Proceedings of DAFX'11, Paris, 2011.

Software

Theory and Implementation of Nonstationary Gabor Frames

Structured Sparsity for Audio Signals

WoMen at work

Arthur Flexer at ISMIR 2010 (photo by Sebastian Stober)


Ewa Matusiak, Gino Velasco, Monika Dörfler, Nicki Holighaus at STROBL 2011

Thomas Grill at the Pure Data conference, Weimar, 2011

Kai Siedenburg and Monika Dörfler at DAFX 2011 (http://dafx11.ircam.fr), Paris