It turns out that straddling both techniques can give us the best of both worlds. We built an algorithm, inspired by a technique called Collaborative Topic Modeling (CTM), that (1) models content, (2) adjusts this model using signals from readers, (3) models reader preferences, and (4) makes recommendations based on the similarity between preferences and content.

Overview

Our algorithm starts by modeling each article as a mixture of the topics it addresses. We can think of a topic as an unobserved theme, like Politics or Environment, that affects the words observed in the article. For example, if an article is about the environment, we’d expect words like “tree” or “conservation.”

We model each reader based on their topic preferences. We can then recommend articles based on how closely their topics match a reader’s preferred topics.
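The matching step can be illustrated with a minimal sketch. It assumes each article and each reader is already represented as a vector of topic weights, and it uses cosine similarity as the closeness measure. The topic order, article names, weights, and the `recommend` helper are all hypothetical, and the choice of cosine similarity is an assumption rather than something the text specifies.

```python
import math

# Hypothetical topic order for every vector: [Politics, Environment, Sports]

def cosine_similarity(a, b):
    """Cosine of the angle between two topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def recommend(reader_topics, articles, k=2):
    """Return the k article names whose topics best match the reader."""
    scored = [
        (cosine_similarity(reader_topics, topics), name)
        for name, topics in articles.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]

# Made-up topic mixtures, for illustration only
articles = {
    "election-recap": [0.8, 0.1, 0.1],
    "forest-conservation": [0.1, 0.8, 0.1],
    "climate-bill": [0.5, 0.5, 0.0],
}
reader = [0.2, 0.7, 0.1]  # a reader who mostly prefers Environment

top = recommend(reader, articles, k=2)
print(top)
```

With these made-up numbers, the environment-heavy article ranks first and the mixed politics/environment article second, because their topic vectors point in nearly the same direction as the reader's.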


There are further questions for us to answer. Can this topic space capture ambiguous word usage? And how do we best observe the preferences of our readers? Clicks are, after all, not robust: I’m sure that at some point, you have clicked on something you didn’t enjoy and missed something you would have found interesting.

We tested many options carefully; the algorithm we built brings us closer to addressing some of these questions, and gives us a powerful new way to understand The New York Times.

This is a three-part challenge:

Part 1: How to model an article based on its text.
Part 2: How to update the model based on audience reading patterns.
Part 3: How to describe readers based on their reading history.
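As a rough illustration of Part 3, one simple way to describe a reader is to average the topic vectors of the articles in their reading history. This is only a sketch under that assumption: the text does not say how preferences are actually estimated, and since clicks are noisy, a plain average is likely too naive in practice. The `reader_profile` helper and the article names are hypothetical.

```python
def reader_profile(history, article_topics):
    """Average the topic vectors of the articles a reader has read.

    Articles missing from article_topics are skipped. Returns an empty
    list when none of the articles in the history is known.
    """
    vectors = [article_topics[name] for name in history if name in article_topics]
    if not vectors:
        return []
    n = len(vectors)
    # zip(*vectors) groups the weights topic by topic
    return [sum(column) / n for column in zip(*vectors)]

# Made-up two-topic vectors: [Politics, Environment]
article_topics = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.5, 0.5],
}

profile = reader_profile(["a", "c", "unknown"], article_topics)
print(profile)
```

Here the reader has read one purely political article and one evenly mixed article, so the resulting profile leans toward Politics.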

« Collaborative topic modeling applied to The New York Times »

A quote saved on Aug. 12, 2015.

