OFAI

FACILE: Fast and Accurate Categorisation of Information by Language Engineering

The FACILE project aimed to develop a system for the precise and specific categorisation of texts from finance and business news. The intended users were financial and commercial institutions with a vital interest in up-to-date business information from online news agencies and periodicals. An important consideration in FACILE was its use across country and language borders. The ability to process texts in various languages and to derive factual information in a language-independent, formatted form should allow for the rapid dissemination of information across borders.

From a technical point of view, the system makes use of two complementary strategies. A shallow analysis based on pattern matching methods is designed to categorise texts correctly and to provide basic information about the texts processed. This form of analysis is being implemented for texts in English, German, Italian, and Spanish. The advantage of this strategy is that it can easily be ported to additional languages. For English and Italian texts, a deep analysis is carried out in addition, based on state-of-the-art methods from the field of computational linguistics. This form of analysis allows for a better interpretation of texts, albeit at the expense of a considerably higher development effort.
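
The shallow strategy can be illustrated with a minimal sketch. It is not FACILE's implementation: the pattern table, the category names, and the function categorise are all hypothetical and chosen only to show how language-specific patterns can map onto shared, language-independent category labels.

```python
import re

# Hypothetical category patterns, keyed by language code.
# These are illustrative only and are not FACILE's actual rules.
PATTERNS = {
    "en": {
        "merger": re.compile(r"\b(merger|acquisition|takeover)\b", re.IGNORECASE),
        "earnings": re.compile(r"\b(profit|earnings|dividend)\b", re.IGNORECASE),
    },
    "de": {
        "merger": re.compile(r"\b(fusion|übernahme)\b", re.IGNORECASE),
        "earnings": re.compile(r"\b(gewinn|dividende)\b", re.IGNORECASE),
    },
}


def categorise(text, lang):
    """Return the sorted category labels whose pattern matches the text."""
    rules = PATTERNS.get(lang, {})  # unknown languages yield no categories
    return sorted(cat for cat, pat in rules.items() if pat.search(text))


print(categorise("The takeover was announced today.", "en"))
print(categorise("Die Übernahme der Bank erhöht den Gewinn.", "de"))
```

In a sketch of this kind, porting to a further language amounts to adding another entry to the pattern table, while the category labels returned stay the same for every language.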

The consortium comprised partners from Germany, the UK, Italy, Austria, and Spain. Special attention was given to involving future users right from the start of the project. OFAI's role in the consortium was to contribute to the development of the preprocessor, which is responsible for tokenization, morphological analysis and proper name recognition, and to its adaptation to German. Moreover, OFAI was responsible for supporting the German user, sis, in creating the necessary patterns for the German shallow analysis.


Duration: 36 months, 1996 - 1998
Sponsor: European Commission (Language Engineering Sector of the Telematics Application Programme)
Researchers: Johannes Matiasek, Harald Trost
Partners: Quinary (Milan), IRST (Trento), UMIST (Manchester), SEMA (Madrid), sis (Berlin), Caja Segovia, Italrating (Milan)