NCSA machine learning pipeline provides insight into energy-efficient home improvement programs

Homeowners invest in energy-saving upgrades to make their homes more comfortable and to lower their utility bills. Government- and utility-backed programs that provide energy-efficient home improvements share the same goal of reducing costs. But measuring the costs and effects of hundreds of different retrofits across thousands of households is complex, and the big picture of which changes should be prioritized for the biggest benefit to the resident is difficult to assemble. While energy efficiency programs have developed sophisticated models to improve decision making, documented disparities between predicted and realized savings demonstrate that there is still substantial progress to be made.

At the University of Illinois at Urbana-Champaign’s National Center for Supercomputing Applications (NCSA), researchers routinely use compute power to dig for answers in piles of untamed data. NCSA Core Faculty member Peter Christensen, also an Assistant Professor of Agricultural and Consumer Economics, regularly tackles policy-driven research questions with his Big Data for Environmental Economics and Policy (BDEEP) team, investigating the ways that data science and economics can help inform, justify, and validate public policies, or in some cases, reveal underlying flaws. To support this work, Christensen’s team has developed a machine learning-driven analytics pipeline that handles data collection, pre-processing, and statistical modeling for a range of projects involving high-dimensional data and continuous acquisition.

Over the past year, collaborators Christensen, Erica Myers, and Mateus Souza of the College of Agricultural, Consumer and Environmental Sciences and Paul Francisco of the Indoor Climate Research & Training Institute have been using the BDEEP team’s pipeline and NCSA’s experimental data analytics compute cluster to analyze a large dataset of home performance measures to better understand the drivers of past savings and provide insights into how to optimize benefits in the future.

The project’s first goal was to quantify disparities between predicted and realized savings. Savings are typically predicted using energy modeling software that incorporates information about the housing structure and the upgrades being performed. However, predictions from those models are often inaccurate, and with hundreds of program offerings (from adding insulation and updating furnaces to installing energy-efficient appliances and LED lightbulbs) performed in various combinations on a variety of home types, parsing out nuanced effects is difficult. Using before-and-after utility bills and housing structure variables for thousands of homes treated over the past decade, the researchers implemented a new machine learning method that can estimate and predict savings patterns that a human mind (or traditional engineering and statistical models) would be unable to identify.
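The core comparison in this first step (realized savings implied by the bills versus the savings predicted up front) can be sketched in a few lines. The sketch below is purely illustrative: the figures are invented, and the team's actual approach applies machine learning to far richer housing data rather than simple differences.

```python
# Illustrative sketch only: all numbers are invented, and a real analysis
# would weather-normalize usage and control for housing characteristics.

def realized_savings(pre_kwh, post_kwh):
    """Annual savings implied by before-and-after utility bills."""
    return pre_kwh - post_kwh

def realization_rate(predicted_kwh, realized_kwh):
    """Fraction of predicted savings actually observed in the bills."""
    return realized_kwh / predicted_kwh

homes = [
    # predicted annual savings, pre-retrofit usage, post-retrofit usage (kWh)
    {"predicted": 1200, "pre": 14000, "post": 13100},
    {"predicted": 800,  "pre": 9500,  "post": 8900},
    {"predicted": 1500, "pre": 18000, "post": 17100},
]

rates = [
    realization_rate(h["predicted"], realized_savings(h["pre"], h["post"]))
    for h in homes
]
avg_rate = sum(rates) / len(rates)
print(f"average realization rate: {avg_rate:.2f}")
```

In this toy data every home realizes less than it was predicted to save, mirroring the pattern the researchers report below; the interesting (and hard) part the machine learning handles is attributing those shortfalls to particular retrofit combinations and home types.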

Most of NCSA’s computing systems are a good fit for a wide variety of uses, but the analytics cluster the researchers used was specifically developed and optimized for this kind of problem by NCSA Industry, Yifang Zhang of the NCSA Data Analytics team, and Christensen’s BDEEP group. The multi-processor system runs SparkR, which lets users employ respected but computationally intensive machine learning algorithms implemented in the R programming language while also capitalizing on the distributed memory capabilities of the Spark platform. Christensen says that SparkR was “essential” for running machine learning models at scale: “These models can take weeks or longer to run without the ability to distribute tasks and run them in parallel, which Spark enables. Our distributed models were running more than ten times faster than on a standard HPC cluster, a big enough difference to transform how we approached the problem.”
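The property Spark exploits here is that many model fits are independent of one another, so they can be dispatched to workers concurrently instead of run one after another. As a loose, standard-library-only analogy (this is not the team's SparkR setup, and the "model fit" below is a trivial stand-in with invented data):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for an expensive model fit: each configuration is handled
# independently, so the fits can be dispatched concurrently -- the same
# independence a Spark cluster exploits when distributing real workloads.
def fit_model(config):
    data = config["data"]
    # Pretend "fitting" is just computing a mean for this configuration.
    return {"name": config["name"], "estimate": sum(data) / len(data)}

configs = [
    {"name": "insulation", "data": [900, 850, 920]},
    {"name": "furnace",    "data": [600, 640, 610]},
    {"name": "lighting",   "data": [120, 130, 110]},
]

# map() dispatches the independent fits to the pool and preserves order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fit_model, configs))
```

On a real cluster the payoff comes from spreading genuinely expensive fits across many machines and their memory, which is what makes the week-long runs Christensen describes tractable.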

Mateus Souza, the PhD student who pioneered the implementation of machine learning on the platform, agrees: “NCSA’s analytics cluster gave me the flexibility to be more creative as I tested and refined our algorithms and model configurations.”

The researchers found that in many cases, current predictions were overestimating some benefits and underestimating others, and on average, actual savings fell short of predicted savings. Providing more accurate assessments of how energy-saving measures benefit particular homes can help policymakers, programs, and homeowners focus their efforts and get the maximum energy savings from their investments.

Christensen predicts that machine learning models will soon be heavily used across industry and government for decision making, but recognizes that they’re not quite there yet, in part because model training can be very time-consuming (and sometimes, very expensive) without the right infrastructure. Part of the challenge for researchers will be bringing the benefits of efficient machine learning to collaborators outside of compute-forward environments like NCSA. But the team is ready for that challenge and others, and in addition to this project, has already used machine learning on NCSA’s analytics cluster for another project detecting racial discrimination in the US housing market.

Christensen says his group will continue to focus on addressing real-world public challenges with technological solutions: “We use these technologies to rethink economic and policy problems that have been around in our discipline for a long time, but haven’t been tractable. We want to solve that kind of problem.”

