Building a Non-Consumptive Global News Observatory for Data Science Research
College: Liberal Arts and Sciences
Award year: 2014-2015
News may be the “rough draft of history,” but the historical record of this rough draft remains largely unavailable to data science researchers using text mining methods. Reasonable concerns about protecting copyright holders and maintaining proprietary control over textual corpora place significant restrictions on researchers who want to analyze textual news content at extreme scales. The Cline Center for Democracy at the University of Illinois has compiled a global news archive containing over 110 million full-text news articles published from 1941 to the present, but legal restrictions against distributing copyrighted text to third parties limit the Cline Center’s ability to provide scholarly access to its full-text holdings. However, a relatively new approach to research on copyrighted materials offers a way to provide full-text access to data science researchers without violating copyright or requiring database distributors to forfeit proprietary control over their investments. Known as “non-consumptive research,” this paradigm allows researchers to run algorithms on copyrighted full-text holdings without being able to see or copy those holdings. This project will prototype a non-consumptive research platform for text-mining the Center’s full-text news holdings.
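To make the non-consumptive idea concrete, here is a minimal sketch, assuming a simple design in which the platform holds the full texts, runs a researcher-supplied feature extractor over every article, and returns only aggregated feature counts. All names (`NonConsumptiveCorpus`, `run`, `word_counts`, `min_count`, `max_feature_len`) are hypothetical and do not describe the Cline Center’s actual platform. Python cannot truly hide data within a single process; in a real deployment the researcher’s code would execute on the host system and only its derived output would be released.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical extractor type: maps one article's text to feature counts.
Extractor = Callable[[str], Counter]


class NonConsumptiveCorpus:
    """Hypothetical wrapper: full texts stay inside; only derived counts leave."""

    def __init__(self, documents: Iterable[str], min_count: int = 1,
                 max_feature_len: int = 50):
        self._documents = list(documents)        # never returned to the caller
        self._min_count = min_count              # assumed suppression threshold
        self._max_feature_len = max_feature_len  # assumed cap on feature size

    def run(self, extractor: Extractor) -> dict[str, int]:
        """Apply a researcher's extractor to every article; return totals only."""
        totals: Counter = Counter()
        for text in self._documents:
            features = extractor(text)
            if not isinstance(features, Counter):
                raise TypeError("extractor must return a collections.Counter")
            for key in features:
                # Reject non-string or overly long features, which could
                # smuggle whole passages of copyrighted text out as keys.
                if not isinstance(key, str) or len(key) > self._max_feature_len:
                    raise ValueError("feature keys must be short strings")
            totals.update(features)
        # Suppress rare features, which are more likely to reveal specific passages.
        return {k: v for k, v in totals.items() if v >= self._min_count}


def word_counts(text: str) -> Counter:
    """Example extractor: lowercase word frequencies."""
    return Counter(text.lower().split())


corpus = NonConsumptiveCorpus(
    ["Troops crossed the border", "The border remained closed", "Talks resumed"],
    min_count=2,
)
print(corpus.run(word_counts))  # → {'the': 2, 'border': 2}
```

The researcher never receives an article, only word frequencies that clear the threshold. Output filters like these are one plausible way a platform could keep derived data from reconstructing the underlying text.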