SEMPRE - Semantically Aware Profiling for Recommenders

The amount of digital information is constantly increasing, and search engines are still the primary means of accessing it. Recommender systems, however, are well on their way to becoming a remedy for this unsatisfactory situation. Especially in m- and e-commerce, recommender systems have already attracted considerable attention and found wide application. Unfortunately, current recommender technology suffers from two major shortcomings: it requires a great deal of expertise and handcrafting to model an application domain, and it bases its recommendations on the behaviour and opinions of those users who are active in the system. This is a poor resource compared to the abundance of opinions and ratings available on the internet.

With the SEMPRE project, we aim to develop semantic technology for exploiting the rich and dynamic resource of factual information and human opinions available on the internet. This is achieved through an iterative, adaptive, semantics-driven process: existing profiles and domain ontologies serve as seed knowledge for accessing further information on the web, and the data extracted from the web are in turn used to extend and refine the domain ontologies and profiles applied in recommendation. To achieve this goal, we bring together research from (i) recommender technology, (ii) web-based document, text and opinion mining, including methods and techniques from information retrieval, information extraction and question answering, and (iii) semantic technologies such as those employed in ontology creation, ontology adaptation and the exploitation of ontologies for mining information from web documents. From a technological point of view, SEMPRE aims to provide a generic solution. Rather than being a monolithic system, SEMPRE is devised as a platform for the semantic integration of small, specialised services: plug-in tools that lead to different results under different contexts and conditions.
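
The iterative process described above can be illustrated with a minimal sketch. It is not SEMPRE's implementation: the `Ontology` class, the `refine` function, the notion of a "service" as a plain callable, and the in-memory `TOY_WEB` lookup table are all hypothetical names and toy data introduced here for illustration. The sketch only shows the general idea of using seed knowledge to find further facts, folding them back into the ontology, and repeating until nothing new turns up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# A fact is a (subject, relation, object) triple. Hypothetical representation.
Fact = Tuple[str, str, str]


@dataclass
class Ontology:
    """Toy stand-in for a domain ontology: just a set of triples."""
    facts: Set[Fact] = field(default_factory=set)

    def entities(self) -> Set[str]:
        # Every subject or object seen so far can seed the next lookup.
        return {s for s, _, _ in self.facts} | {o for _, _, o in self.facts}


# Assumption: a plug-in service is any callable mapping a seed entity
# to a list of candidate facts.
Service = Callable[[str], List[Fact]]


def refine(ontology: Ontology, services: List[Service], max_rounds: int = 3) -> Ontology:
    """Use known entities as seeds, add what the services return, and repeat."""
    for _ in range(max_rounds):
        found: Set[Fact] = set()
        for entity in sorted(ontology.entities()):
            for service in services:
                found.update(service(entity))
        found -= ontology.facts          # keep only facts that are new
        if not found:                    # nothing new was found, so stop early
            break
        ontology.facts |= found
    return ontology


# Toy data standing in for web sources; a real service would query the web.
TOY_WEB = {
    "Miles Davis": [("Miles Davis", "plays", "trumpet"),
                    ("Miles Davis", "recorded", "Kind of Blue")],
    "Kind of Blue": [("Kind of Blue", "genre", "jazz")],
}


def toy_lookup(entity: str) -> List[Fact]:
    return TOY_WEB.get(entity, [])


seed = Ontology({("Miles Davis", "is_a", "MusicalArtist")})
result = refine(seed, [toy_lookup])
```

Because services are passed in as a list, different plug-ins can be combined or swapped without touching the loop, which mirrors the platform idea in spirit only.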

OFAI's work resulted in the design and implementation of an architecture and processing pipeline for extracting factual knowledge and opinions, and for ontology population. The work is realized by means of two GATE pipelines, one for factual information and one for opinions. A variety of GATE plug-ins, libraries and processing components have been implemented, most of which are application-independent.
For demonstration purposes, we concentrated on source identification and retrieval in the music and movie domains. The textual resources gathered for the extraction of factual information and opinions include Wikipedia for factual information on musical artists, IMDb for factual information on actors, and reviews from the Multi-Domain Sentiment Dataset, the Polarity Dataset, and metacritic.com for opinion mining. As regards ontology generation and adaptation, we investigated methods and strategies for populating manually constructed ontologies with facts, and for adding sentiment information obtained through sentiment classification and opinion mining of the review data.
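
As a rough illustration of attaching sentiment information to a manually constructed ontology, the following sketch uses a tiny word-list polarity scorer. This is a deliberately simple stand-in, not the classifier used in the project: the word lists, the `polarity` and `attach_sentiment` functions, the dictionary-based ontology and the example reviews are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons; a real system would use a trained classifier.
POSITIVE = {"great", "brilliant", "moving", "excellent"}
NEGATIVE = {"dull", "boring", "weak", "terrible"}


def polarity(review: str) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral) by counting lexicon hits."""
    tokens = [t.strip(".,!?;:").lower() for t in review.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def attach_sentiment(ontology: Dict[str, dict],
                     reviews: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Add per-item sentiment counts while leaving existing properties untouched."""
    labels = {1: "positive", -1: "negative", 0: "neutral"}
    for item, text in reviews:
        individual = ontology.setdefault(item, {})
        counts = individual.setdefault(
            "sentiment", {"positive": 0, "negative": 0, "neutral": 0})
        counts[labels[polarity(text)]] += 1
    return ontology


# Invented example data: one manually entered individual and three reviews.
ontology = {"Example Movie": {"type": "Movie"}}
reviews = [
    ("Example Movie", "A brilliant, moving film."),
    ("Example Movie", "Dull and boring!"),
    ("Example Movie", "It was released last year."),
]
populated = attach_sentiment(ontology, reviews)
```

Storing the counts under a separate `sentiment` key keeps the manually entered facts and the mined opinions apart, so either can be updated without overwriting the other.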


Duration: 2007-2009
Sponsors: FFG - Österreichische Forschungsförderungsgesellschaft mbH,
3united mobile solutions ag, Vienna
Researchers: Bernhard Jung, Johann Petrak, Hannes Pirker, Brigitte Krenn, Marcin Skowron
Partners: Smart Agent Technologies, Vienna,
VeriSign Communications GmbH, Vienna