A New Modular Architecture for Data Mining

This project aims to develop a new, more flexible model of data mining and an operational software architecture embodying this model. Data mining, also known as Knowledge Discovery in Databases (KDD), is a field at the intersection of Artificial Intelligence (specifically Machine Learning), statistics, and databases that deals with the development of methods for detecting patterns and regularities (in short: "knowledge") in large amounts of data. A large number of successful practical applications of data mining techniques in business, industry, and science have recently been reported.

Nevertheless, the application of KDD or data mining techniques to real-world problems is still a very tedious process that requires a high degree of expertise and effort on the part of the user. Successful KDD is a matter of carefully configuring and tuning a complex machinery of machine learning and statistical algorithms. Current data mining tools (and the currently popular general KDD process models) do not support this process well, because they allow control and customization only at a very coarse level of abstraction. On the other hand, while fine-grained control may lead to improved results, it also opens up a much larger space of options and choices for the user to deal with. In summary, the problems that motivate this research project are two-fold: (1) a lack of flexibility in current methods and tools, and (2) a lack of guidance through the highly complex and interactive process of data mining.

This project aims to provide solutions to these problems by developing a new modular framework of data mining that makes it possible to reason about the data mining process and its algorithms at a new level of detail. In a large-scale effort, we will first try to develop a general 'vocabulary' of common functional building blocks and shared concepts of KDD algorithms. We will then characterize and 'reconstruct' data mining algorithms and the KDD process model in terms of these concepts. We will also use this functional vocabulary as the basis for extended systematic investigations into novel data mining algorithms derived by combining and integrating individual methods and strategies. The ultimate goal of the project is to arrive at a data mining architecture that can be easily tailored to the particular needs of a given application. To this end, we will develop methods for flexibly (and partly automatically) assembling and customizing data mining algorithms to the problem at hand, again using the abstract functional building blocks identified earlier. Finally, we will investigate various methods for guiding the user through the huge space of possible methods and method combinations.

Directions to be investigated include autonomous exploratory experimentation by the system and the application of meta-learning methods to the results of large-scale experiments. In short, the main objectives of this project are: to develop a modular framework for KDD and data mining algorithms at a new level of granularity; to develop methods for flexible specification and realization of task-specific algorithms and algorithm combinations; and to help the user choose and assemble the best possible method or method combination for the task at hand. An additional result of the project will be a fully implemented modular software architecture for data mining experiments and applications. This environment will allow the specification and assembly of new algorithms and composite strategies from common abstract functionalities. The architecture will also facilitate a tighter, more fine-grained integration of different methods than is possible in current KDD tools. This will be demonstrated in a large array of experiments with data collections and databases from diverse application areas.
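
To make the idea of assembling algorithms from shared functional building blocks more concrete, the following is a minimal illustrative sketch in Python. It is not the project's architecture: the block names (`partition_by`, `majority_accuracy`, `purity_count`, `assemble_attribute_selector`) and the toy data are assumptions made purely for illustration. It shows how one learning step, choosing the most informative attribute, might be built from a partitioning block and an interchangeable evaluation block.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch only.
Example = Tuple[Dict[str, str], str]   # (attribute -> value, class label)
Partition = Dict[str, List[str]]       # attribute value -> class labels


def partition_by(examples: List[Example], attribute: str) -> Partition:
    """Building block: group class labels by the value of one attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for attrs, label in examples:
        groups[attrs[attribute]].append(label)
    return dict(groups)


def majority_accuracy(partition: Partition) -> float:
    """Evaluation block: accuracy of predicting the majority class per group."""
    total = sum(len(labels) for labels in partition.values())
    correct = sum(Counter(labels).most_common(1)[0][1]
                  for labels in partition.values())
    return correct / total


def purity_count(partition: Partition) -> float:
    """Alternative evaluation block: number of groups with a single class."""
    return float(sum(1 for labels in partition.values()
                     if len(set(labels)) == 1))


def assemble_attribute_selector(
    evaluate: Callable[[Partition], float],
) -> Callable[[List[Example], List[str]], str]:
    """Assemble a selector from the partitioning block and a chosen evaluator."""
    def select(examples: List[Example], attributes: List[str]) -> str:
        return max(attributes,
                   key=lambda a: evaluate(partition_by(examples, a)))
    return select


# Toy data, invented for this sketch.
EXAMPLES: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]

selector = assemble_attribute_selector(majority_accuracy)
best = selector(EXAMPLES, ["outlook", "windy"])
```

Passing `purity_count` instead of `majority_accuracy` yields a different selection strategy without touching the partitioning or selection code, which is the kind of fine-grained recombination the abstract describes.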

Research staff

  • Gerhard Widmer

Sponsor

Austrian Science Fund (FWF)

Einzelprojekte – № P 12645

Key facts