..information about the semantic relatedness of content can be used for a number of purposes, such as recommendation, navigation, duplicates or plagiarism detection. CORE estimates semantic relatedness between two textual fragments using the cosine similarity measure calculated on term frequency-‐inverse document frequency (tfidf ) vectors. Details of the similarity metric are provided in (Knoth, et al., 2010). Due to the size of the CORE dataset7 and consequently the high number of combinations, semantic similarity cannot be calculated for all document pairs in reasonable time. To make the solution scalable, CORE uses a number of heuristics, to decide which document pairs are unlikely to be similar and can therefore be discarded. This allows CORE to cut down the amount of combinations and to scale up the calculation to millions of documents. CORE supports the discovery of semantic relatedness between two texts held in the CORE aggregator. In addition, the system supports the recommendation of full-‐text documents related to a metadata record and the recommendation of a semantically related item held in the aggregator for an arbitrary resource on the web.



« Discovery of semantically related content »


A quote saved on Nov. 19, 2013.

#documents


Top related keywords - double-click to view: