Driven by Impact


In 2010, Edison Liu, MD, director of the Genome Institute of Singapore, took the long trek to Champaign-Urbana for a sabbatical at the University of Illinois’ National Center for Supercomputing Applications (NCSA).

At the time, Liu, a renowned genomic researcher and cancer expert, and his colleague, the late Frank Prendergast, MD/Ph.D., then director of the Mayo Clinic’s Center for Individualized Medicine (CIM), struggled with a shared problem: how to translate a deluge of genomic data into relevant, actionable information to guide clinical decision-making and improve healthcare. Liu, Prendergast, and NCSA design researchers Colleen Bushell and Lisa Gatzke began a project to create a clinical report that relied on data visualization and information design to explain a patient’s cancer mutation status to oncologists.

“It was a short-term project and one of the first that was funded by the newly formed Mayo Clinic and Illinois Alliance for Technology-Based Healthcare,” recalled Bushell, now director of NCSA’s Healthcare Innovation Program Office. “And it led to more than a decade of significant work between Mayo and NCSA.”

The Healthcare Innovation Program Office, later established in 2019, coordinates NCSA collaborations in healthcare, including those with researchers at Mayo Clinic. As a result of that first project, NCSA’s Visual Analytics group began working with Mayo Clinic’s Department of Laboratory Medicine and Pathology to develop a visual method of presenting a comprehensive diagnostic panel of 17 hereditary colon cancer genes. That larger second project was a success, and Bushell presented the design and approach at Mayo Clinic’s first Individualizing Medicine Conference in 2012.

But something much larger in scope than these genomic reports had been created. The projects gave rise to a lasting, synergistic relationship between the two institutions that benefits both and offers a path for advancing healthcare through data science and computation. The activity is part of the broader Mayo Clinic and Illinois Alliance, which provides a framework for connecting University of Illinois faculty, students and engineers with collaborators at Mayo Clinic. The Alliance is still active today, led by the university’s Interdisciplinary Health Sciences Institute. For NCSA, the Alliance opens a wealth of opportunities to advance technologies and contribute to biomedical discoveries that ultimately improve healthcare.

“All the expertise NCSA has, everything we are known for, is valuable to Mayo’s mission. We are thrilled to contribute to their vision and apply our skills to their data and analysis challenges,” said Bushell. “Our early work quickly expanded to engage experts from every directorate at NCSA.”

Bushell, who, along with NCSA Director William Gropp, serves on the Alliance’s Executive Committee, said NCSA not only collaborates directly with Mayo Clinic colleagues but also supports research efforts between UIUC faculty and Mayo Clinic by providing computation and data science expertise when needed. Mat Wiepert, the senior manager for Mayo Clinic’s Information Technology genomics activities, and also a member of the Alliance’s Executive Committee, has been a champion for NCSA and continues to be instrumental in establishing successful projects between NCSA and Mayo Clinic. Wiepert and Bushell serve as co-chairs for the Bioinformatics and Computation Sub-committee of the Alliance and are exploring ways to expand student engagement.

Lisa Gatzke, User Interface and User Experience (UI/UX) design team lead at NCSA, has worked on a variety of projects with Mayo over the years and said the tools have become easier to develop as people from the two organizations have become more familiar with each other and built mutual trust.

“With Mayo, everything is very data-intensive,” said Gatzke. “We spend a lot of time figuring out how to make software intuitive and we use visual design to support a hierarchy of importance for presenting information. If you don’t have designers to think it through, the tendency is to put everything on the screen at once.”

A Mutually Beneficial Relationship

NCSA’s work with Mayo Clinic falls into five general categories: developing advanced analytical approaches, AI and bioinformatics workflows that help biologists make discoveries; developing innovative software and user interfaces that leverage interactive visualization to help users interpret their data; creating advanced cyberinfrastructure for computing, storing and moving massive amounts of biomedical data; developing data standards, cloud services and workflows to combine complex data into new data repositories; and experimenting with new hardware and optimization methods. All of these are geared toward achieving the vision of data-enabled healthcare.

“The relationship helps NCSA as well as Mayo,” Bushell said. “We need real use cases to drive the development of new systems and approaches, and they get exposed to the most advanced data science research and computing technologies. Together, we push technology for healthcare forward.”

Gatzke agreed, saying that her work with Mayo Clinic is a collaborative, iterative process.

“Our UI/UX team’s work focuses on the user experience,” she said. “We talk, we study their data, we learn about their needs, and once we have an understanding, we begin design. We work closely with NCSA domain experts in genomics and medicine to develop solutions and interact regularly with Mayo users. We create a clear logic in an interface. We group functionality and use principles of design to reinforce the hierarchy we’ve established.

“It’s a unique direction to take your design career,” she added.

Working with genomic data has many challenges. Understanding mutations and finding genetic markers for disease requires aligning genomic sequences to a reference genome and identifying variants. The process of alignment and variant annotation takes considerable computational time. To address this challenge, the Alliance proposed a “grand challenge” to bring compute-intensive research to clinical practice and speed up the pipeline. Nate Mattson, a principal software engineer at Mayo Clinic, described the benefits of this collaboration at the NCSA Industry Program conference in 2019.

“Before working with NCSA, one such scenario called whole genome trio analysis required 20 days of computation on a high-performance compute cluster,” he said at the event. “Now, after solving the grand challenge, the process takes about half a day on that same system. Improvements such as these have a significant impact on the amount of time patients wait for important results and the number of patients that can be served.”

Since the successful completion of the grand challenge, Christina Fliege, technical program manager, and her NCSA Genomics team have continued to work with Mattson on other computational and workflow challenges.

Translating Research and Workflows Into Clinical Settings

One of the barriers faced by researchers studying large, complex data is the effort and knowledge required to set up the data, tune the parameters for analysis, run and monitor the computation, and iteratively adjust and rerun the computation based on the results. This is especially true with biomedical data. NCSA has worked with Mayo Clinic to develop intuitive workflows for clinicians and biologists, allowing them to spend more time studying their results and less time managing the technical processes of their data experiments. Documented workflows also make it easier for other researchers to use the same approach and facilitate reproducibility.

“Building analysis workflows is extremely important in bioinformatics,” said Jessica Saw, MD/Ph.D., a research scientist in NCSA’s Visual Analytics Group. “Usually, the researcher needs to download many different applications for different functions and an output from one application will become an input for another. We design workflows that combine all the analysis steps, from making sure the data is clean to ensuring accuracy all the way through to the final analysis and visualization.”
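The chaining Saw describes, where the output of one application becomes the input of the next, can be illustrated with a minimal Python sketch. This is not NCSA’s or Mayo Clinic’s actual workflow code: the step names (`clean_reads`, `gc_fractions`, `summarize`), the helper `run_workflow`, and the toy reads are all hypothetical stand-ins chosen only to show the pattern.

```python
from typing import Any, Callable, Sequence


def run_workflow(data: Any, steps: Sequence[Callable[[Any], Any]]) -> Any:
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


def clean_reads(reads):
    """Hypothetical cleaning step: uppercase reads and drop empty or
    non-ACGT reads."""
    upper = [r.upper() for r in reads]
    return [r for r in upper if r and set(r) <= set("ACGT")]


def gc_fractions(reads):
    """Hypothetical analysis step: fraction of G/C bases in each read."""
    return [(r.count("G") + r.count("C")) / len(r) for r in reads]


def summarize(fractions):
    """Hypothetical final step: a small summary ready for visualization."""
    mean_gc = sum(fractions) / len(fractions) if fractions else None
    return {"n_reads": len(fractions), "mean_gc": mean_gc}


# Toy input: one read contains an ambiguous base and one is empty.
raw_reads = ["ACGT", "ggcc", "ACNT", ""]
summary = run_workflow(raw_reads, [clean_reads, gc_fractions, summarize])
print(summary)
```

Because each step takes only the previous step’s output, the whole analysis is a single documented list of steps, which is the property that makes it easier for another researcher to rerun the same approach.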

For example, an effort led by Charles Blatti, Ph.D., an NCSA senior research scientist in the Visual Analytics group, developed a framework to help UIUC and Mayo Clinic researchers run their analytical models and study results using novel visualizations. The research aims to understand tumor phylogeny, or how cancer cell mutations evolve over time. Knowledge about tumor evolution helps clinicians make informed decisions about the most appropriate therapies for a patient’s unique mutation trajectory.

Mohammed El-Kebir, Ph.D., an assistant professor of computer science at UIUC, and Nicholas Chia, Ph.D., a former biophysicist at Mayo Clinic’s CIM, collaborated on the development of the analytical and computational approach using colorectal cancer data. The scientific team included UIUC co-PI Zeynep Madak-Erdogan, Ph.D., an associate professor of food science and human nutrition whose research includes metastatic breast cancer. Gatzke led the UI/UX effort and Matt Berry, Visual Analytics team lead and senior research software engineer, led the implementation of the framework. “This type of multi-disciplinary collaboration is typical at NCSA and illustrates one of the unique approaches we bring to the table,” explains Bushell.

The framework includes two components. PhyloFlow combines proven open-source applications into one easy-to-use tool that provides end-to-end cancer phylogenetic analysis and the ability to run an analysis in the cloud. PhyloDiver takes the data from PhyloFlow and creates visualizations that users can interact with in real time. Instead of a static report, the visualization tool allows users to explore the uncovered mutational trajectories with novel visual cues relating to the prevalence of the tumor cell subpopulations in different samples and with variant summaries that highlight their most important impacts on proteins and drug responses.

Even though the results of this research aren’t ready for standard care, the team thinks ahead to how this could be used in the clinical setting in the future. “We want to be sure that if a research tool eventually becomes useful in the clinic, we won’t need to rebuild it to accommodate all the technical and user requirements in that scenario,” explains Bushell. “We want to make decisions now that will make the transition easier and more cost-effective in the future.”

Computing Power for Medical Discovery

NCSA has long been the home to some of the world’s most powerful high-performance computers and Mayo Clinic researchers have been tapping into those resources for about a decade, beginning with proton beam research using NCSA’s Blue Waters supercomputer. Mayo Clinic researchers used the intense computing environment to analyze MRI scans of cancerous tissue so that radiation could be applied more precisely with less impact on surrounding healthy tissue. Their research led to new and better clinical practices within three years.

In 2014, after executing several projects on NCSA compute systems, Mayo Clinic contracted NCSA to architect and host an advanced system for their exclusive use called mForge. After significant work between Mayo Clinic and NCSA security experts, mForge eventually obtained certification for Health Insurance Portability and Accountability Act (HIPAA) compliance, making it possible to expand its use to include sensitive patient data approved by Mayo’s Institutional Review Board (IRB). Mayo Clinic and NCSA continue a close collaboration managing and running mForge. Doug Fein, the NCSA operations and technical lead for mForge, estimated that hundreds of Mayo Clinic projects have used the cluster over the years, and the system has grown to include 170 compute nodes, more than 20 petabytes of storage, and 512 gigabytes of RAM per core.

“Most of the projects involve working with genomic data in different ways,” said Fein. “mForge is a tool and my job is to make the tool work. Their needs change over time depending on the research they are doing, but the goal is always to accelerate research and translate that research into clinical practice as quickly as possible.”

Although mForge is a tool, Fein said the success of his work with Mayo Clinic reflects the quality of relationships built between professionals at the two organizations over the years. Because NCSA technologists have been working with their Mayo Clinic counterparts for so many years, it’s relatively easy for both organizations to talk and strategize about current issues and future plans, bringing complementary knowledge and perspectives.

“This doesn’t work without a relationship,” said Fein. “We understand how they work, and they understand what we can bring to the table. We are better able to meet their research needs, and that’s our goal at NCSA: to make research easier.”

Leveraging the Rapidly Growing Data

As genetic sequencing tests have become the standard of care for the diagnosis and management of many cancers and inherited diseases, Mayo Clinic faces the challenge of finding analysis tools that can scale up to the demands of huge datasets and complex inquiries.

“Mayo is generating data they want to understand better, and they don’t have all the tools they need to do that,” said Blatti, who has worked on five projects with Mayo Clinic. “That’s where our team steps in. We learn about their data, their needs, and how we can create a software tool and interface that helps them answer questions from these new datasets.”

In 2020, Blatti led a project to create Coverage Utilities, a specialized tool for evaluating the depth of sequence coverage in regions of interest from the results of a single sample genome sequencing test. Low coverage areas (areas where the average number of reads that cover known reference bases is low) affect the degree of confidence that researchers and geneticists have in their data. Before analysis can be done, the expert needs to determine if the low-coverage areas need to be examined more closely or if they simply represent noise that can be disregarded.
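The idea of a low-coverage area can be made concrete with a small Python sketch. This is not the Coverage Utilities implementation: the `Region` type, the function names, the per-base depth list, and the threshold of 20 reads are all assumptions made for illustration, and the clinically appropriate cutoff is not stated in this article.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Region:
    """A hypothetical region of interest on a reference sequence."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def mean_depth(depths, region):
    """Average number of reads covering each reference base in the region.

    Raises statistics.StatisticsError if the region is empty.
    """
    return mean(depths[region.start:region.end])


def flag_low_coverage(depths, regions, min_mean_depth=20):
    """Return names of regions whose mean depth is below the threshold.

    The default threshold is an illustrative assumption, not a value
    taken from the tool described in the article.
    """
    return [r.name for r in regions if mean_depth(depths, r) < min_mean_depth]


# Toy per-base read depths for a 30-base reference.
depths = [30] * 10 + [5] * 10 + [40] * 10
regions = [
    Region("region_A", 0, 10),
    Region("region_B", 10, 20),
    Region("region_C", 20, 30),
]
flagged = flag_low_coverage(depths, regions)
print(flagged)
```

A flag from a check like this only marks a region for follow-up. As the paragraph above notes, an expert still has to decide whether the region needs closer examination or can be disregarded as noise.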

Blatti, Gatzke, and Matt Berry, Visual Analytics team lead and senior research software engineer, worked closely with Mayo Clinic developers to incorporate the tool into Mayo’s workflow for clinical interpretation of genetic test results. This quality assurance tool provides a complex yet intuitive visual interface where users can efficiently review coverage data and flag low coverage areas for follow-up. According to Blatti, the ongoing long-term relationship with Mayo Clinic helped NCSA create and deploy Coverage Utilities into their clinical environment in about a year.

“Leveraging the relationship from previous projects always helps new projects get started and rolling sooner,” he said. “Understanding the stakeholder needs, the structure of Mayo, and how we’re fitting into the broader picture helps us get started and ask the right questions right off the bat.”

Berry agrees. “On each project, NCSA and Mayo Clinic work together very closely, as if we were all part of the same organization. It takes that kind of collaboration to arrive at the right solution.”

NCSA is contributing to the development of Mayo Clinic’s Digital Omics Platform led by Eric Klee, Ph.D., director of bioinformatics in CIM. The platform serves as the central institutional repository for omics data and enables research and clinical workflows by providing infrastructure, AI tools and knowledge services.

NCSA is not only contributing to the software and UI/UX but also combining its genomics expertise with its long history of database architecture and data engineering to help create the cloud services, interfaces and workflows needed to move genetic information from lab to clinician. “Bringing our expertise to the Digital Omics team has been so rewarding, knowing we are making a difference for both patient care and genomic research,” says Chris Pond, lead of NCSA’s Research Data Engineering group.

Looking to the Future

One current initiative in the NCSA/Mayo Clinic relationship aims to explore whether a new CS-2 supercomputer called HOLL-I can help Mayo Clinic in its never-ending quest to learn more from genomic data. Built by Cerebras and introduced in 2021, the system uses the Wafer-Scale Engine (WSE) and is one of the fastest systems available for AI and machine learning. NCSA plans to fund efforts with Mayo Clinic to determine how much faster the new system can run AI and ML models.

“People are always trying to find ways to work with these incredibly large datasets,” said Katherine Kendig, the NCSA program manager who manages mForge and introduced Mayo Clinic collaborators to HOLL-I. “Training large AI models requires huge amounts of data and many parameters. You would need tens or hundreds of GPUs to do what this one CS-2 can do.”

NCSA will evaluate the ability of HOLL-I to train large language models (LLMs), a type of AI algorithm that uses deep learning techniques and massively large datasets to understand, summarize, generate, and predict outcomes in data. Once an LLM is trained, it can be a tool for many purposes, such as classifying DNA variants as benign or suspicious. HOLL-I can also be used effectively for analyzing medical images, said Kendig.

NCSA’s deployment of HOLL-I was motivated by the needs of other NCSA biomedical partners, and when Mayo Clinic collaborators learned about the system, they were immediately interested in trying it, Kendig added. As NCSA works to integrate software into HOLL-I, researchers and technologists also keep an eye on the future of computing.

“When Mayo is interested in different kinds of computing to achieve their vision of healthcare, our long-term collaboration makes it easy for them to push the envelope at NCSA,” said Kendig.

“One of the reasons we have worked together for so long is because we have common values and goals,” said Bushell, who anticipates the relationship between NCSA and Mayo will continue beyond her career. “Our intention is to continually improve human health – whether that is through research and discovery, through development of new tools, or by improving communication and understanding. Our goal is not profit, but instead, we focus on impact.”

“Having a partner to help us achieve our goals has been instrumental in the gains we have made. Our NCSA colleagues bring unique skills to our efforts while integrating into our team projects seamlessly,” explains Wiepert. “Finding a true collaborator like this is a very rare thing. This isn’t about a transactional experience but both teams coming together to help solve problems that make a real difference for Mayo Clinic research and practice initiatives.”

“We combine our unique expertise to solve the problems that we can address now and to look into the future together so we can continually push the domains of data science, visualization, and computation forward,” says Bushell. “We want to contribute to Mayo’s vision for the best healthcare possible.”
