The PLOS Taxonomy consists of a thesaurus – the set of more than 10,000 terms and synonyms that are used as subject matter tags for PLOS articles – and the complex hierarchy by which those terms are organized. The whole system had to be custom-built for PLOS, thanks to the one unruly journal that now contains roughly 80% of all PLOS articles.

“PLOS ONE has no limitation to the scope of articles that it takes,” Drysdale told Bio-IT World. “So the thesaurus that we use needs to be able to cover everything: ecological articles, surgical articles, economics articles, social science, psychology, geology, geography, computational science – across the whole spectrum.” No other scientific journal covers both the breadth and depth of PLOS ONE’s subject matter, a state of affairs that leads its taxonomy curators into all kinds of traps and contradictions.

That wasn’t always appreciated. When Drysdale was hired to manage the PLOS taxonomy in 2012, the thesaurus was going through a complete overhaul. The editors had realized that its then-current 3,000 terms were inadequate to the task of helping readers find relevant information amid the roughly 2,000 new articles published every month. Articles were then tagged by their own authors, who knew their work best but were barely acquainted with the PLOS thesaurus. Despite the limited size of the thesaurus, around a third of its terms had never been used – while others were so broad that they tagged up to 5% of the entire published corpus, making them all but useless for searches.

In 2012, the Public Library of Science contracted with the database construction firm Access Innovations (AI) to review the taxonomy and recommend changes. AI’s engineers methodically tallied the usage of each term, flagged articles that were under-categorized, and added thousands more terms from the company’s own scientific dictionary that seemed to close gaps in coverage. PLOS and AI representatives met every week to discuss topics like which terms to add and jettison, how many nesting levels should exist in the hierarchy, and which subjects should make up the taxonomy’s highest tier.
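The kind of usage audit described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure Access Innovations actually used: the function name `audit_thesaurus`, the data shapes, and the `min_tags` cutoff are all hypothetical, and the default `broad_share` of 5% simply echoes the figure quoted above for overly broad terms.

```python
from collections import Counter


def audit_thesaurus(thesaurus, article_tags, broad_share=0.05, min_tags=2):
    """Tally term usage and flag problem terms and articles.

    thesaurus    -- iterable of valid term strings (hypothetical input shape)
    article_tags -- dict mapping an article id to the terms tagged on it
    broad_share  -- a term used on at least this share of articles is "too broad"
    min_tags     -- assumed cutoff: fewer valid tags means "under-categorized"
    """
    thesaurus = set(thesaurus)
    total = len(article_tags)

    # Count each valid term at most once per article.
    usage = Counter()
    for tags in article_tags.values():
        usage.update(set(tags) & thesaurus)

    unused = sorted(t for t in thesaurus if usage[t] == 0)
    too_broad = sorted(
        t for t, n in usage.items() if total and n / total >= broad_share
    )
    under_categorized = sorted(
        a for a, tags in article_tags.items()
        if len(set(tags) & thesaurus) < min_tags
    )
    return {
        "usage": dict(usage),
        "unused": unused,
        "too_broad": too_broad,
        "under_categorized": under_categorized,
    }


# Toy data, invented for illustration only.
toy_thesaurus = {"Biology", "Ecology", "Genetics", "Geology"}
toy_articles = {
    "a1": ["Biology", "Genetics"],
    "a2": ["Biology", "Ecology"],
    "a3": ["Biology"],
    "a4": ["Ecology"],
}
# A high threshold is used because the toy corpus has only four articles.
report = audit_thesaurus(toy_thesaurus, toy_articles, broad_share=0.75)
print(report["unused"], report["too_broad"], report["under_categorized"])
```

On the toy data, "Geology" is never used, "Biology" appears on three of the four articles, and articles a3 and a4 carry only one tag each. A real audit would also have to account for synonyms and for the hierarchy, which this sketch ignores.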

