New capabilities make data on Blue Waters shareable and movable

12.02.14

By Nicole Gaynor

At the 2014 Blue Waters Symposium and in other conversations, science and engineering partners identified sharing the data from their massive supercomputing projects with the broader scientific community as an important service for the future. For the Blue Waters project, that future is arriving quickly with the prototype Blue Waters Data Sharing Service (DSS).

The prototype DSS is now available for Blue Waters users to share data sets with their research community or the broader public.

There are two classes of sharing based on the needs of the partners and data: active data sharing for projects with current allocations on Blue Waters and a community sharing plan for data produced by prior projects. Users from each group can share data using Globus Online's sharing capabilities or a web service interface. Each interface has specific requirements that determine which should be used, and each class of sharing also has unique requirements.

“Projects (PIs) can submit a service request, which is really just a means for us to help the teams better prepare their data for distribution,” says Jason Alt, one of the programmers at NCSA (along with Mark Klein) who implemented the DSS.

The service will allow supercomputer users to share their research data with colleagues who do not have access to the supercomputer. Former users can also share their data after their allocation ends; in this case the data owner is required to obtain a Digital Object Identifier (DOI) for the data set.

Current and former partners will have two options for sharing data: Globus Online for large data sets (larger than about 4 GB and/or more than 100 files) and a web browser interface for smaller data sets. Globus Online also allows researchers to limit who can access their data, whereas data shared through the web interface is open to anyone. Through the web interface, science teams can also create pages that explain the data, show results, or share other useful information, just like on any other web page.
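As a rough illustration of the rule of thumb above, the choice between the two interfaces could be sketched as follows. This is only a sketch, not part of the DSS itself: the function name, the parameter names, and the exact cutoff of 4 x 10^9 bytes (the article says only "about 4 GB") are assumptions made for the example.

```python
# Hypothetical helper illustrating the article's rule of thumb.
# The thresholds are approximations taken from the text ("about 4 GB",
# "more than 100 files"); they are not official DSS limits.
APPROX_SIZE_LIMIT_BYTES = 4 * 10**9   # assumed reading of "about 4 GB"
APPROX_FILE_LIMIT = 100


def suggest_sharing_interface(total_bytes, file_count, restrict_access=False):
    """Return "globus" or "web" as a suggested sharing interface."""
    # Web access is open to anyone, so restricted data points to Globus Online.
    if restrict_access:
        return "globus"
    # Large data sets (by size and/or file count) point to Globus Online.
    if total_bytes > APPROX_SIZE_LIMIT_BYTES or file_count > APPROX_FILE_LIMIT:
        return "globus"
    # Smaller, public data sets can be served through the web interface.
    return "web"


if __name__ == "__main__":
    print(suggest_sharing_interface(total_bytes=10**9, file_count=20))
    print(suggest_sharing_interface(total_bytes=50 * 10**9, file_count=20))
```

The access-restriction check comes first because a small data set that should not be public still needs Globus Online, regardless of its size.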

Both of these methods are read-only and require that the owner of the data provide documentation and metadata to make the data more useful and self-contained. The shared data also counts toward the science team’s storage limit on the Blue Waters sub-storage systems.

Please see https://bluewaters.ncsa.illinois.edu/data-sharing for more information and to start sharing Blue Waters-generated data sets.

National Science Foundation

Blue Waters is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.