Jonathan Maletic and Andrian Marcus estimate that about 5 percent or more of the information present in manually created databases is erroneous [5]. Many possible causes for errors exist. First, some errors are due to interpretation discrepancies: different people who enter data in a single database may have different interpretations of what type of information to enter in particular cells. This tends to hold true for many cultural heritage databases, where the database structure is typically created by the curators or researchers themselves, rather than by professional data managers. Consequently, such databases are often subject to limited quality control: that is, there are no strict (or enforced) guidelines of what information should go in different database cells or how the information should be represented or formatted. Even when the intended database structure is adhered to, database records may be corrupted by typos and copy-and-paste errors, or through optical character recognition errors if the digitization process of the source text was automatic. Also, when a database has evolved over time, the naming conventions may have changed, as often happens in zoological taxonomies, rendering some information outdated.

« Errors in cultural heritage databases »

A quote saved on Feb. 26, 2013.


