Delta Helps Promote Open Science in Pre-training Speech Models

An image of Delta, NCSA's GPU based supercomputer. Delta is spelled out on the supercomputer in colorful geometric shapes reminiscent of a sunset over the water.

The general public often isn’t aware of major research advancements or scientific breakthroughs until researchers cross the finish line. To most people, that’s usually when something becomes newsworthy – when results are finally official.

But there’s a whole race of research to be run before those results break through the tape. The open science philosophy is trying to make that race shorter and easier to navigate. By making their results and methods public, researchers are hoping others can build upon the foundational work of those who came before.

The academic community and scientific industry can still be competitive, however. Research teams from Carnegie Mellon University (CMU), Shanghai Jiao Tong University in China and the Honda Research Institute in Japan are setting an example for believers in open science with their work on pre-training speech models.

Principal Investigator Shinji Watanabe and his fellow researchers used the National Center for Supercomputing Applications’ (NCSA) supercomputing system Delta – the most performant GPU-computing resource in the National Science Foundation’s portfolio – to reproduce the development methodology of Whisper, an automatic speech recognition system trained on nearly 700,000 hours of multilingual and multitask supervised data collected from the web. Whisper is developed and maintained by OpenAI, and the full scope of its models’ development – from data collection to training – is not publicly available, making it difficult for researchers to further improve its performance or address training-related issues such as efficiency, robustness, fairness and bias.

Through the NSF-funded program ACCESS, Watanabe and researchers were able to use the power and speed of Delta to manage their data, create their models and train them in faster and more efficient ways.

While research on large-scale pre-training has exclusively been done by big tech companies, Delta is helping change this paradigm. Thanks to the generous resources and support provided by Delta, researchers from academia now have the capability to train state-of-the-art models at an industry scale. Notably, our open Whisper-style model (OWSM) stands out as the first large-scale speech model developed by the academic community.

Shinji Watanabe, associate professor at Carnegie Mellon University’s Language Technology Institute

“The sizing and composition of the computational and storage resources on Delta allow researchers in AI/ML to quickly train new models and make them available to the academic community,” said Greg Bauer, a senior technical program manager at NCSA.

The paper, appropriately titled “Reproducing Whisper-style training using an open-source toolkit and publicly available data,” will be published in December at the 2023 Institute of Electrical and Electronics Engineers Automatic Speech Recognition and Understanding Workshop (ASRU 2023). In keeping with the open science philosophy, the teams have already released all scripts used for data preparation, training, inference and scoring, as well as the pre-trained models and training logs – a move that promotes transparency and facilitates further advancements in the large-scale pre-training of speech models. Everything is available in ESPnet, an end-to-end speech processing toolkit.

It’s not just the end result that is important. Learning from the journey taken to get there is also key – and that is what open science encapsulates.

“Open source with accessible data is an essential component of scientific research,” Watanabe said. “It connects researchers, contributes to the community and makes AI technologies transparent.”
