OFAI-TR-2005-18 (61 kB gzipped PostScript file, 91 kB PDF file)

Statistical Evaluation of Music Information Retrieval Experiments

Arthur Flexer

This work concerns the necessity of statistical evaluation of Music Information Retrieval (MIR) experiments. This necessity is motivated by applying fundamental notions of statistical hypothesis testing to MIR research. Minimum requirements for statistical evaluation are developed, and the appropriate statistical techniques are introduced and exemplified in a genre classification context. Articles from the MIR literature are examined and criticized for their lack of statistical evaluation.

Keywords: Music Information Retrieval, Evaluation, Statistical Testing, Sampling Methods

Citation: Flexer A.: Statistical Evaluation of Music Information Retrieval Experiments. Technical Report, Österreichisches Forschungsinstitut für Artificial Intelligence, Wien, TR-2005-18, 2005