Genomics researcher Edison Liu spends summer at NCSA

08.09.10

Edison Liu, executive director of the Genome Institute of Singapore, has spent the summer at the National Center for Supercomputing Applications (NCSA), working on collaborative projects and exploring how genomics research and supercomputing might converge. He recently answered a few questions about his collaboration with the center.

What projects have you launched this summer with NCSA?

I'm working on several projects with Mike Welge [head of NCSA's Data-Intensive Technologies and Applications group], David Tcheng, and Jian Ma [an assistant professor of bioengineering]. One is to map the binding-site motifs of all transcription factors across the genome using comprehensive motif-scanning techniques. The idea is to develop fast algorithms that can annotate the entire genome according to the probability that each of these transcription factors binds at a given site.
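Liu doesn't describe the team's specific algorithms. As an illustration only, one common way to scan a genome for transcription factor binding sites is a position weight matrix (PWM): each position of a motif gets a log-odds score for each base relative to a background distribution, and every window of the sequence is scored by summing those values. The sketch below assumes a hypothetical 4-bp toy matrix (`TOY_PFM`), a uniform background, and forward-strand scanning only; all names are made up for this example.

```python
import math

# Hypothetical toy position frequency matrix for a 4-bp motif: one dict per
# motif position, giving the frequency of each base. Real matrices come from
# curated motif databases; these numbers are invented for illustration.
TOY_PFM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.80, "G": 0.05, "T": 0.10},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
]
# Assumed uniform background base composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def to_log_odds(pfm, background):
    """Convert per-position base frequencies into log2 odds versus background."""
    return [{b: math.log2(col[b] / background[b]) for b in "ACGT"} for col in pfm]


def scan(sequence, pwm, threshold):
    """Slide the PWM along the forward strand; return (start, score) for windows >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits


if __name__ == "__main__":
    pwm = to_log_odds(TOY_PFM, BACKGROUND)
    print(scan("TTAGCATTTAGCACC", pwm, threshold=4.0))
```

Running it reports the two exact matches to the motif `AGCA`, at positions 2 and 9. Turning raw scores like these into binding probabilities, as the project aims to do, would need an additional model (for example, one calibrated against experimental binding data); this sketch stops at the scores.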

The second is perhaps much larger: working with both the Department of Electrical and Computer Engineering and NCSA to accelerate genome assembly. With the sequence data that we have, one of the biggest computational challenges is assembling the entire genome from short fragment reads, which is very computationally intensive. The toolsets available here are quite unique, including work with FPGAs and GPUs, the fundamental hardware of the systems.
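The interview doesn't say which assembly method the team is accelerating. To show why assembling a genome from short reads is a graph problem, here is a toy de Bruijn graph assembler, one widely used approach: reads are cut into k-mers, each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and the genome is read off an Eulerian path through the graph. It assumes error-free reads, complete coverage, and a genome with no repeated k-mers; real assemblers must also cope with sequencing errors, repeats, and billions of reads, which is what makes the problem so computationally demanding. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it (one edge per distinct k-mer)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for deterministic edge order
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for node, succs in graph.items():
        out_deg[node] += len(succs)
        for s in succs:
            in_deg[s] += 1
    nodes = set(out_deg) | set(in_deg)
    # A path starts at the node with one more outgoing than incoming edge, if any.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))  # balanced graph: any node with edges works
    remaining = {n: list(s) for n, s in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path: first node, then each node's last base."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["TTACCGAT", "CCGATTGC", "ATTGCAGT", "TTGCAGTC"], 5)` reconstructs `"TTACCGATTGCAGTC"` from four overlapping 8-base reads.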

The third project is about data visualization. Mike and I are working on how to present this complex data in an understandable way. Again, that's an area in which NCSA has been a real pioneer.

Can you also describe some of the more strategic discussions that are under way?

One is what I would call more general medical informatics. I'm working with Larry Schook and Jennifer Eardley [the director and associate director of Illinois' Division of Biomedical Science] on their conceptualization of biomedicine. A lot of that has to do with how genomics and computational biology will be integrated into the fabric of medical practice.

The second is that I've been involved in strategic discussions about how NCSA might move into biology and genomics. In the last three years, the rate and complexity of genetic and genomic data generation has become what I call hyper-exponential. It's unprecedented. As this trend continues, the databases will become so large that the data will probably reside in one place, with people who really know data management, and the applications will follow the data. That is the reverse of what happens now, where data tends to be distributed across multiple places.

At the same time, centers like NCSA are seeking to broaden the applicability of their computational capabilities. I think institutions like NCSA, if there is a will here to pursue this, will be the centers for genomics in the future. I believe that the sequencing centers will gravitate toward the data centers, instead of vice versa.

I'm learning the details of what cutting-edge computer science and high-performance computing can do for biology. And I think the people at NCSA are seeing the fundamental questions being asked in genomics and biology, and how to frame the data so that it can be analyzed with the capabilities that are here.

The nomenclature and the ways of tackling a problem are so different between the genomics community and the CS/HPC community that half our time can be spent just figuring out what the other side means. It really requires people to be in the same room together. Once you have that fundamental understanding, it becomes much easier to collaborate.

I think the most important message is that the time is right. The most fundamental issue in biology is really in the genes. And the genes are in the sequence, and the sequence is genomics. So if we are able to harness the genomic data, then we really are attacking the fundamentals of biology. But never before have we been in a situation where we needed such computational capabilities. So this is a unique opportunity for both biomedicine and groups like NCSA to take a great leap forward. Once we engage the kind of computing capabilities that are available here, it will change biology as profoundly as the personal computer changed the way we conduct research.

I've always been intrigued by the fact that the challenge isn't necessarily further expanding high-end capabilities, but simply translating those capabilities for a community that has never experienced or used them before. A lot of the software on a laptop, for example, is not rocket science, but its impact was to deliver computing power to the non-expert. And that has opened up a whole new vista of creativity that doesn't reside exclusively among computer experts. If we can make biologists into computational experts in their own right, then imagine the creative firepower we can bring to biomedical problems.