
New in ’22: Three New Supercomputing Resources and More


NCSA's National Petascale Computing Facility

Continuing the National Center for Supercomputing Applications' work at the forefront of supercomputing innovation, the Center has several new systems coming online in 2022. Each represents a significant advance targeting the specific needs of researchers, academics and corporate interests, and each offers specialization and cost-effectiveness in compute time and overall resource economy. They are Delta, Nightingale and HOLL-I.

Delta

The completed Delta system at NPCF.

With the 2021 retirement of NCSA's powerhouse, Blue Waters, 2022 brings the launch of an all-new supercomputer housed at the University of Illinois Urbana-Champaign's National Petascale Computing Facility. Announced in 2021, Delta's early-user period begins soon, and we are looking forward to National Science Foundation acceptance. The first allocation period for Delta has passed, but allocations continue on schedule through XSEDE and the Illinois Allocation Request process. Delta offers a dedicated GPU (graphics processing unit) supercomputing experience. If you've been computing only on CPUs and suspect that a GPU architecture could analyze your data more efficiently, the Delta team can help you make that transition.
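
For teams weighing that move, a device-agnostic script is often the first step. The sketch below is a minimal, hypothetical PyTorch example (it is not Delta-specific, and the model and data are placeholders) showing how the same code can run on a CPU or on an NVIDIA GPU such as Delta's A100s or A40s.

    # A minimal, hypothetical sketch of CPU-to-GPU portability in PyTorch;
    # the model and data below are placeholders, not a Delta workload.
    import torch

    # Use an NVIDIA GPU when one is available, otherwise fall back to the CPU,
    # so the same script runs unchanged on either architecture.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(1024, 10).to(device)   # move the model to the chosen device
    batch = torch.randn(64, 1024, device=device)   # allocate the input on the same device

    with torch.no_grad():
        output = model(batch)
    print(f"Ran on {device}; output shape: {tuple(output.shape)}")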

Details and Description

  • 124 CPU nodes consisting of:
    • Dual AMD 64-core 2.55 GHz Milan processors
    • 256 GB DDR4-3200 RAM
    • 800 GB NVMe solid-state disk
  • 100 quad A100 GPU nodes consisting of:
    • Single AMD 64-core 2.55 GHz Milan processor
    • 256 GB DDR4-3200 RAM
    • 1.6 TB NVMe solid-state disk
    • 4 NVIDIA A100 GPUs with 40 GB HBM2 RAM and NVLink
  • 100 quad A40 GPU nodes consisting of:
    • Single AMD 64-core 2.55 GHz Milan processor
    • 256 GB DDR4-3200 RAM
    • 1.6 TB NVMe solid-state disk
    • 4 NVIDIA A40 GPUs with 48 GB GDDR6 RAM
  • Five eight-way A100 GPU nodes consisting of:
    • Dual AMD 64-core 2.55 GHz Milan processors
    • 2 TB DDR4-3200 RAM
    • 1.6 TB NVMe solid-state disk
    • 8 NVIDIA A100 GPUs with 40 GB HBM2 RAM and NVLink
  • One MI100 GPU node consisting of:
    • Dual AMD 64-core 2.55 GHz Milan processors
    • 2 TB DDR4-3200 RAM
    • 1.6 TB NVMe solid-state disk
    • 8 AMD MI100 GPUs with 32 GB HBM2 RAM
  • 8 utility nodes providing login access, data transfer capability and other services
  • 100 Gb/s HPE Slingshot network fabric
  • 7 PB of disk-based Lustre storage
  • 3 PB of flash-based storage for data-intensive workloads, to be deployed in spring 2022

Nightingale

DNA strands.

NCSA is launching an exceptionally secure and user-friendly compute resource for analyzing sensitive data. Nightingale is a powerful HPC cluster that leverages capabilities provided to our clinical partners. Its massive storage and GPU/CPU infrastructure is SOC 2 Type 2 certified, which means it complies with the HIPAA Privacy and Security Rules for using Protected Health Information and satisfies security policies for other controlled data such as FERPA, CUI and PII. NCSA's professionals in Cybersecurity, Healthcare Innovation and User Services manage the complex security requirements and guide research teams through their compliance questions and onboarding steps, taking that burden off Nightingale users so they can focus on their research.

Details and Description

  • Interactive Compute Nodes
    • 4 interactive compute/login nodes with dual 64-core AMDs and 512 GB of RAM 
    • 6 interactive nodes with 1 A100, dual 32-core AMDs and 256 GB of RAM
    • 5 interactive nodes with 1 A40, dual 32-core AMDs and 512 GB of RAM
  • Batch Compute System
    • 16 dual 64-core AMD systems with 1 TB of RAM 
    • 2 dual-A100 compute nodes with 32-core AMDs and 512 GB of RAM
  • Storage
    • 880 TB of high-speed parallel Lustre-based storage

 

HOLL-I

Cable management at NPCF.

HOLL-I (Highly Optimized Logical Learning Instrument) is designed to handle large-scale AI and machine-learning tasks. Paired with NCSA's common storage system, TAIGA, HOLL-I lets users submit and run jobs through an XRAS allocation using the TensorFlow and PyTorch frameworks. What sets HOLL-I apart as a dedicated resource for AI workloads is its Cerebras Systems CS-2 accelerator and the infrastructure built around it. That focus delivers higher-speed processing and makes HOLL-I an economical way to obtain limited compute resources without drawing on allocations for other systems like HAL or Delta. While other supercomputing centers have deployed comparable machines, HOLL-I occupies a distinctive place in the AI computing landscape, keeping NCSA at the forefront of artificial intelligence research.
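
As one illustration of the kind of workload HOLL-I targets, the sketch below is a generic, hypothetical TensorFlow training example. It contains no Cerebras- or CS-2-specific APIs, omits job submission entirely, and uses placeholder data, model and hyperparameters.

    # A generic, hypothetical TensorFlow training sketch; no CS-2-specific
    # APIs are used, and the data, model and hyperparameters are placeholders.
    import tensorflow as tf

    # Synthetic stand-in data; a real HOLL-I job would stream a large dataset
    # from TAIGA or local scratch.
    features = tf.random.normal((1024, 128))
    labels = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(features, labels, batch_size=64, epochs=2)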

Details and Description

  • A single login node and 9 worker nodes that package user jobs for communication and computation with the CS-2 unit over a high-speed private network.
    • The CS-2 accelerator has over 850,000 cores
    • 40 GB of on-chip RAM (at 20 PB/s bandwidth)
    • 1,200 Gb/s of I/O network bandwidth
      • All of this exists on the same wafer of silicon
  • 192 TB of local ZFS scratch storage
    • Access to TAIGA, NCSA's shared Lustre project space
    • TAIGA storage for project data pursuant to TAIGA guidelines
  • 40 GB of RAM on the accelerator, loading/dumping at 120 GB/s

 

Clowder 2.0

Clowder logo.

Since 2010, Clowder, an open-source data management framework for research, has worked steadily with researchers to create gateways for scientific data management. Clowder's success has led to projects like the Permafrost Discovery Gateway and partnerships with agricultural powerhouse Syngenta. At the time of Clowder's creation, NCSA's software team wrote the codebase in Scala, a programming language widely used to build distributed systems that runs on the highly performant Java Virtual Machine and can leverage the large Java ecosystem. For Clowder v2, the core of the framework has been rewritten in Python and React to reach a wider audience of contributors. This change will make it easier for the Clowder team to keep the software, including its libraries and other dependencies, up to date.

iForge to vForge

Abstract infrastructure.

Responding to the needs of an ever-changing Industry Partners group, iForge will migrate to virtual machines in 2022 through NCSA's Radiant platform. Leaner and more targeted, vForge will continue to serve the same uses as iForge while allowing NCSA to devote on-site resources to upward-scaling projects. Plans for migrating existing iForge projects' data to TAIGA and vForge are being developed, and minimal disruption is expected. Another advantage of the shift to vForge is that data created and stored on TAIGA can be used more easily across NCSA clusters with access to the storage system. Similar standards will apply for login and data transfer, and the Open OnDemand GUI will remain. The new system should be in production by mid-to-late summer.
