FAIR Guidelines Set the Tone for Data Accessibility and Reusability

Researchers from the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign collaborate with various institutions across the country to make data exchange and artificial intelligence tools more FAIR – findable, accessible, interoperable and reusable.

This mission is the driving force behind NCSA’s FAIR4HEP (FAIR for high energy physics) project, funded by the U.S. Department of Energy. UIUC Physics Professor Mark Neubauer, NCSA Chief Scientist Daniel S. Katz, NCSA Center for AI Innovation Director Volodymyr Kindratenko, and Electrical and Computer Engineering Professor Zhizhen Zhao lead the Illinois team on this project.

“Modern research advances when researchers can access, verify and build on the work of others. In our role as an interdisciplinary research institute, NCSA wants to promote the FAIR principles for the outputs we create so that others can advance more quickly,” says Daniel S. Katz, NCSA chief scientist, FAIR4HEP co-PI. “We’re excited to apply our information science, data science and machine learning experiences to help explore how to best apply these principles in the context of high-energy physics. This work is a step along this path.”

NCSA’s interdisciplinary team collaborated with researchers from Argonne National Laboratory, the University of Minnesota, the Massachusetts Institute of Technology, and the University of California San Diego. Together, they drew on expertise in high-energy physics and AI to develop guidelines that support discovery and innovation by improving data management and stewardship practices. They recently published a paper in Nature Scientific Data describing a FAIR dataset and the steps they took to make it FAIR, which others can apply to their own data.

“While the research published in Nature Scientific Data focused on FAIR aspects of a specific simulated dataset in Higgs physics, the goal was to provide an assessment guide applicable to a broad range of domain science for evaluating the degree to which a given data product meets the standards of the FAIR principles. The next phase of our research is to explore FAIR for AI models and develop community tools to better understand connections between AI models and the data upon which they depend,” says Mark Neubauer, UIUC physics professor, FAIR4HEP PI.

The dataset in question is a complex simulated dataset from the CMS collaboration at CERN’s Large Hadron Collider containing Higgs boson decays. In addition to publishing the dataset and the steps used to make it FAIR, the team evaluated the dataset’s FAIRness using assessment tools and incorporated feedback from FAIR community members to validate their results.

“This work outlines the steps needed to ensure that data collected by modern scientific instruments are reusable by the broad research community,” says Volodymyr Kindratenko, NCSA CAII director, FAIR4HEP co-PI. “Data producers must make conscious efforts to make data FAIR. These practices will increase the accessibility and reliability of data in the scientific research community – not only when the actual data are acquired, but also in the years to come.”

Aptly titled “A FAIR and AI-ready Higgs boson decay dataset,” this paper is the first article in a planned series that will next guide scientists in creating FAIR AI models for high-energy particle physics.

Read more about this collaboration in Argonne National Lab’s press release.