Abstract

Alexander Mehler, Tim vor der Brück
DDC-Document Classification with Support Vector Machines


In this work, we present a hierarchical classifier based on support vector machines that assigns one or more DDC categories to a document. The DDC (Dewey Decimal Classification) is the most popular document classification scheme used by digital libraries. The DDC is hierarchical; currently, our classifier covers three of its levels. The classifier is integrated into the eHumanities Desktop, which is developed by the Text Technology Lab at Goethe University Frankfurt (http://hudesktop.hucompute.org/). Our implementation visualizes the lemmata that the classifier selects or discards, which gives expert users control over the feature extraction it performs. Beyond feature extraction, the classifier also supports feature expansion based on Wikipedia.
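The abstract does not say how the SVMs are arranged in the hierarchy. One common scheme for hierarchical multi-label classification, assumed here and not necessarily the authors' design, trains one binary linear SVM per category on the documents filed under its parent, then classifies top-down, descending into every child whose score is positive, so a document can end up in several branches. The sketch below uses a hypothetical three-node excerpt of the DDC, toy documents represented as bags of lemmata, and a minimal Pegasos-style SGD solver (without a bias term) standing in for a full SVM library.

```python
import random
from collections import defaultdict

# Hypothetical toy excerpt of the DDC hierarchy: parent -> children.
HIERARCHY = {"root": ["500", "700"], "500": ["510", "530"], "700": ["780"]}

# Hypothetical training documents: bag of lemmata -> set of leaf DDC codes.
DOCS = [
    ({"algebra": 1.0, "proof": 1.0}, {"510"}),
    ({"theorem": 1.0, "proof": 1.0}, {"510"}),
    ({"quantum": 1.0, "energy": 1.0}, {"530"}),
    ({"particle": 1.0, "energy": 1.0}, {"530"}),
    ({"symphony": 1.0, "melody": 1.0}, {"780"}),
    ({"opera": 1.0, "melody": 1.0}, {"780"}),
]


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos: SGD on the L2-regularized hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w = defaultdict(float)
    data = list(zip(samples, labels))
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin computed before the step
            for f in list(w):  # regularization step: shrink w
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss step: move w towards y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)


def with_ancestors(labels, parent):
    """Close a label set under the parent relation (excluding the root)."""
    out = set()
    for lab in labels:
        while lab != "root":
            out.add(lab)
            lab = parent[lab]
    return out


def train_hierarchy(hierarchy, docs):
    """Train one binary SVM per category against its siblings."""
    parent = {c: p for p, cs in hierarchy.items() for c in cs}
    full = [(x, with_ancestors(labs, parent)) for x, labs in docs]
    models = {}
    for node, children in hierarchy.items():
        # Local training set: documents filed under `node` (all at the root).
        local = [(x, labs) for x, labs in full if node == "root" or node in labs]
        for child in children:
            ys = [1 if child in labs else -1 for _, labs in local]
            models[child] = train_linear_svm([x for x, _ in local], ys)
    return models


def classify(models, hierarchy, x, node="root"):
    """Top-down multi-label assignment: follow every child scoring > 0."""
    assigned = []
    for child in hierarchy.get(node, []):
        if dot(models[child], x) > 0:
            assigned.append(child)
            assigned.extend(classify(models, hierarchy, x, child))
    return assigned


if __name__ == "__main__":
    models = train_hierarchy(HIERARCHY, DOCS)
    print(classify(models, HIERARCHY, {"quantum": 1.0, "particle": 1.0}))
    # ['500', '530']
```

Training each child only on its parent's documents keeps the per-node problems small, at the cost that an error near the root cannot be corrected further down the tree.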



Bielefeld University Library - last update: 28/03/2012