...content stored in OA repositories and journals reflects the diversity of research disciplines. For example, information about the specific subject of a paper (e.g. computer science, humanities) can be used to narrow down a search, to monitor trends and to estimate content growth in specific disciplines. Only about 1.4% (Pieper and Summann, 2006b) of items in OA repositories have been classified, and manual classification is costly. We recently carried out a series of experiments on text classification of full-text articles into the 18 top-level classes of the DOAJ classification using a multiclass SVM. The experiments were carried out on a large, balanced dataset of 1,000 articles randomly selected from DOAJ. The system produced encouraging results, achieving about 93% accuracy.
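
To make the general idea of multiclass SVM text classification concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the system used in the experiments: the text above does not state the features, kernel, multiclass strategy or software used, so the bag-of-words features, the one-vs-rest scheme, the bias-free linear SVM trained by hinge-loss stochastic gradient descent, the hyperparameters, and the toy documents and class labels are all assumptions made for this example.

```python
import math
from collections import Counter


def featurize(text):
    """Return an L2-normalised term-frequency vector as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def score(weights, x):
    """Dot product between a sparse weight vector and a sparse document."""
    return sum(weights.get(term, 0.0) * value for term, value in x.items())


def train_one_vs_rest(docs, labels, epochs=50, lr=0.1, lam=0.01):
    """Train one linear SVM (no bias term) per class with hinge-loss SGD.

    The one-vs-rest strategy and all hyperparameters are assumptions.
    """
    classes = sorted(set(labels))
    model = {c: {} for c in classes}
    vectors = [featurize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            for c in classes:
                w = model[c]
                target = 1.0 if y == c else -1.0
                violated = target * score(w, x) < 1.0
                # L2 regularisation: shrink every weight slightly.
                for term in w:
                    w[term] *= 1.0 - lr * lam
                # Hinge-loss subgradient step, only when the margin is violated.
                if violated:
                    for term, value in x.items():
                        w[term] = w.get(term, 0.0) + lr * target * value
    return model


def predict(model, text):
    """Return the class whose binary SVM gives the highest score."""
    x = featurize(text)
    return max(model, key=lambda c: score(model[c], x))


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical toy corpus: two documents for each of three made-up classes.
TOY_DOCS = [
    "algorithm compiler software network",
    "database software algorithm kernel",
    "poetry novel philosophy history",
    "history literature poetry language",
    "patient clinical therapy disease",
    "disease vaccine patient surgery",
]
TOY_LABELS = [
    "computer science", "computer science",
    "humanities", "humanities",
    "medicine", "medicine",
]

model = train_one_vs_rest(TOY_DOCS, TOY_LABELS)
```

With this toy data, `predict(model, "compiler kernel network")` returns `"computer science"`. A realistic setup would hold out a separate test split when measuring accuracy and would most likely rely on an established SVM library rather than a hand-written trainer.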

