Whole Tale enables new discovery by bringing ‘life’ to research articles

06.08.16

Directions for a new piece of "some assembly required" furniture are only useful if the user has the parts listed in the instruction manual. Having both makes assembling those coffee tables and bookcases relatively easy compared to designing and constructing your own from scratch.

Scientists at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign are hoping to do the same thing with computer code. "Whole Tale," a new, five-year, $5 million National Science Foundation-funded Data Infrastructure Building Blocks (DIBBs) project, aims to give researchers the same instructions and ingredients to help ensure reproducibility and pave the way for new discoveries.

Whole Tale will enable researchers to examine, transform, and then seamlessly republish research data, creating "living articles" that will enable new discovery by allowing researchers to construct representations and syntheses of data.

"It's almost expected nowadays that when you publish the paper you link the paper to data," explains co-PI Matthew Turk, a research scientist at NCSA. "Linking papers to code as well as data is becoming more common. Whole Tale will take that a step further and let other researchers replicate the experience of doing the research but in their own way."

"Whole Tale" alludes to both the "whole publication story" and the "long tail of science." The project will create methods and tools for scientists to link executable code, data, and other information directly to online scholarly publications, whether the resources used are small-scale computation or state-of-the-art high-performance computing.

"Whole Tale will support the full lifecycle of computational science, from writing code to conducting the experiment to publishing the results, by creating new methods that bring together existing tools and make them easier to use," says Principal Investigator Bertram Ludäscher, a professor in the School of Information Sciences and an NCSA researcher.

How will Whole Tale work? Through a web browser, a scientist will be able to seamlessly access research data and carry out analyses in the Whole Tale environment. Digital research objects, such as code, scripts, or data produced during the research, can be shared between collaborators. These will be bundled with the paper to produce a "living article," accessible by reviewers and the scientific community for in-depth pre- and post-publication peer review. Augmenting the traditional research publication with the full computational environment will enable discovery of underlying data and code, facilitating reproducibility and reuse. Whole Tale will provide an environment of multiple, independently developed frontends (e.g., Jupyter, RStudio, or Shiny) where data can be explored in myriad ways to yield better opportunities for understanding, use, and reuse of the data.

Researchers envision a three-part process:

  1. Prepublication and the Research Environment: The Jupyter project provides a powerful and popular frontend for research, storing real-time information about the research pipeline during the research process. A federation of data repositories will be accessible uniformly through DataONE, and tools such as Globus data publication services will allow a group of researchers to share in the creation of datasets and metadata. Providing a personal, federated storage system for intermediate products using ownCloud and iRODS enables collaboration to take place nearest the data itself. Integrated tools such as BrownDog provide support in creating the appropriate metadata.
  2. Publication Process, Peer Review, and Embedded Articles: Whole Tale will allow scientists to organize data, software, and workflows into curatable collections with assigned digital object identifiers (DOIs) as the paper is written, so that these collections are ready for publication with the paper. The PIs are working with key publishers, including BioOne’s Elementa: Science of the Anthropocene, on the tools and services needed to realize the project.
  3. Postpublication Access, Persistence, Reproducibility, and Reuse: Whole Tale will collaborate with initiatives developing new data-literature linking services. In this way, users can independently verify the published findings, as well as have the option to execute the codes on a different system.
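
To make the idea of a curatable collection concrete, here is a minimal Python sketch of how code and data might be bundled with a paper and later checked by a reader. It is an illustration only, not Whole Tale's actual design: the `LivingArticle` and `ResearchObject` names, the manifest layout, and the placeholder DOI are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ResearchObject:
    """One item bundled with the paper (hypothetical structure)."""
    path: str
    kind: str    # e.g. "code", "data", or "workflow"
    sha256: str  # checksum recorded when the item was added


@dataclass
class LivingArticle:
    """A paper plus the research objects published with it (hypothetical)."""
    title: str
    doi: str  # placeholder value; real DOIs are assigned by a registration agency
    objects: list = field(default_factory=list)

    def add(self, path: str, kind: str, content: bytes) -> ResearchObject:
        # Record a checksum so later readers can confirm they hold the same bytes.
        obj = ResearchObject(path, kind, hashlib.sha256(content).hexdigest())
        self.objects.append(obj)
        return obj

    def verify(self, path: str, content: bytes) -> bool:
        # True only if the content matches the checksum recorded for that path.
        digest = hashlib.sha256(content).hexdigest()
        return any(o.path == path and o.sha256 == digest for o in self.objects)

    def to_manifest(self) -> str:
        # Serialize the whole collection as JSON for publication with the paper.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example usage with made-up file names and contents.
article = LivingArticle("Example study", "10.0000/placeholder")
article.add("analysis.py", "code", b"print('hello')\n")
article.add("results.csv", "data", b"x,y\n1,2\n")
print(article.to_manifest())
```

Recording a checksum per object is one simple way a reader on a different system could confirm that the code and data they are running match what was published; a real service would also need to capture the computational environment itself.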

Whole Tale is a highly collaborative project led by Bertram Ludäscher, along with co-PIs Matt Turk and Victoria Stodden, also a member of the School of Information Sciences faculty and an NCSA researcher. Additional PIs include Kyle Chard, senior researcher and fellow in the Computation Institute at the University of Chicago and Argonne National Laboratory; Niall Gaffney, director of data intensive computing at the Texas Advanced Computing Center; Matt Jones, director of informatics research and development at the National Center for Ecological Analysis and Synthesis at the University of California, Santa Barbara; and Jarek Nabrzyski, director of the Center for Research Computing at the University of Notre Dame.

Follow the development of this five-year project at: http://wholetale.org/.