OFAI-TR-2001-30 (80kB gzipped PostScript file, 2453kB PDF file)

Hyperlink Ensembles: A Case Study in Hypertext Classification

Johannes Fürnkranz

In this paper, we introduce hyperlink ensembles, a novel type of ensemble classifier for classifying hypertext documents. Instead of deriving the features for training a classifier from the text on a page itself, we propose using portions of text from all pages that point to the target page. A hyperlink ensemble is formed by obtaining one prediction for each hyperlink that points to a page. These individual per-hyperlink predictions are subsequently combined into a final prediction for the class of the target page. We explore four different ways of combining the individual predictions and four different techniques for identifying relevant text portions. The utility of our approach is demonstrated on a set of Web pages that relate to Computer Science departments.
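The combination step described above can be illustrated with a minimal sketch. The abstract does not name the four combination schemes, so the two shown here (simple majority voting and confidence-weighted voting) are assumptions chosen for illustration, not necessarily the ones evaluated in the paper. The function names, the class labels, and the confidence values are all hypothetical.

```python
from collections import Counter, defaultdict


def combine_by_vote(link_labels):
    """Majority vote: each incoming hyperlink contributes one class label.

    Assumed scheme for illustration; the paper's own schemes may differ.
    """
    counts = Counter(link_labels)
    # most_common(1) returns the single (label, count) pair with the highest count
    return counts.most_common(1)[0][0]


def combine_by_weighted_vote(link_predictions):
    """Weighted vote: each hyperlink contributes (label, confidence).

    Confidences for the same label are summed and the label with the
    largest total wins. Also an assumed scheme, not taken from the paper.
    """
    totals = defaultdict(float)
    for label, confidence in link_predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


# Hypothetical per-hyperlink predictions for one target page
labels = ["student", "student", "faculty"]
scored = [("student", 0.4), ("student", 0.3), ("faculty", 0.9)]

majority = combine_by_vote(labels)
weighted = combine_by_weighted_vote(scored)
```

In this toy input the two schemes disagree: two of the three links predict "student", but the single "faculty" link carries more confidence than the other two combined. This is the kind of difference that makes the choice of combination scheme worth comparing empirically.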

Keywords: web mining, hypertext classification, ensemble techniques, inductive rule learning

Citation: Fürnkranz J.: Hyperlink Ensembles: A Case Study in Hypertext Classification. Information Fusion 3(4):299-312, December 2002, Special Issue on Fusion of Multiple Classifiers.