An Implementation of Topic Modeling that Addresses Humanists' Interest in Historical Change

Ted Underwood

College: Liberal Arts and Sciences
Award year: 2012-2013

SEASR, developed at NCSA, is an internationally prominent environment for digital humanities research. We propose a long-term project that would make SEASR the leading service for topic modeling of humanistic collections of historically significant size (more than 1 billion words). In the first phase of this project, we will a) scale up a new topic modeling algorithm that addresses the historical dimension of humanistic research more effectively than the current standard and b) develop new services to help researchers visualize topic models. In the second phase, for which we will seek external funding, we will implement these facilities as a web service (supported by SEASR's Meandre infrastructure) that allows users to define and model ad hoc subsets of large collections. In this way, we expect to promote the adoption of cutting-edge informatics techniques that could help humanists address theoretical problems central to their work.