Preaching the HPC Gospel
XSEDE's Campus Champions Provide Vital Link between Researchers, Supercomputing Resources
To get the help you need, sometimes you have to break something first.
Dirk Colbry admits, a bit sheepishly, that he made his debut in Michigan State University's high performance computing (HPC) center the hard way. As a graduate student investigating 3D face recognition for possible security applications, he'd been stringing together every computer he could get his hands on to get results. But the computational burden was overwhelming his Rube Goldberg cluster.
"The preliminary algorithm I developed took about ten seconds" to analyze a single face, he says. "We thought that would be acceptable, say, in an airport security screening. But, I needed a test that could compare 5,000 scans with each other."
Unwilling to wait years for roughly 25 million comparisons (5,000 times 5,000), let alone for his PhD, he took a colleague's suggestion to try the university's cluster.
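Run one at a time, those 25 million comparisons at ten seconds apiece add up to roughly 250 million seconds, nearly eight years of nonstop computing.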
"I promptly submitted the biggest job they had ever seen, and shut the thing down," Colbry says. "Andy Keen, the guy who developed the cluster slapped my hand and said, ‘Don't do that again.' But he also said, ‘I want you to keep testing the system.'"
When he got his degree, Colbry didn't immediately go into HPC as a profession. But the seeds had been planted.
Colbry's experience isn't that unusual among HPC researchers and engineers, according to Kay Hunt of Purdue University, project coordinator for XSEDE's Campus Champions Program. A tip from a colleague, an experienced guiding hand when a project hits a wall, or even simple encouragement is often the key ingredient in turning a struggling researcher into an enthusiastic consumer of HPC resources. XSEDE initiated the Campus Champions Program in 2008, she explains, to institutionalize some of the informal networking, consulting and mentoring that individual faculty had been contributing to the HPC brew.
"We identify a local person at a campus or institution who's going to do three things as a campus champion," Hunt says. "First, letting their colleagues on campus know that XSEDE exists, that it offers resources for them at no cost and how to get access to them; second, serving as the point person to help them get started; and third, giving feedback to the XSEDE staff on what could be improved, what documentation needs to be created or updated and what processes can be streamlined or made more efficient and make the whole XSEDE experience better."
Since its initiation, the program has expanded to include student champions, who help campus champions reach students on campus; regional champions, who target a multi-campus audience; and domain champions, who concentrate on spreading the XSEDE gospel to members of a specific scientific discipline. But the campus champions remain the anchor of the program, preaching the Word, making connections, referring newcomers to training courses and materials and sometimes just holding hands to help them master the highly productive technologies of HPC.
The use of religious metaphor isn't a conceit; you run into it frequently when you talk with XSEDE's campus champions.
Dana Brunson, a campus champion at Oklahoma State University, in fact calls herself "the cyberevangelist for the campus." Her job at OK State focuses on user support at the university's HPC center. Hers was a quick conversion, she explains.
"I started my position in the Fall of 2007 and, very quickly, Jeff Pummill at the University of Arkansas introduced me to XSEDE," she says. At her first HPC conference—SC08—Pummill, himself a campus champion, introduced her to Kay Hunt, who immediately recruited her to the program.
"The program helps me offer local people any kind of resource they need," she says. XSEDE gives her the ability to connect people with a nationwide array of supercomputers, networking resources, archival systems and other specialized machines that no single university could ever match. "I need never say, ‘Oh, I'm sorry; you can't do that research because we don't have those resources.'"
Pummill's foray into large-scale scientific computing began in 2005, when the University of Arkansas acquired its first Top500 cluster.
"They quickly realized that a part-time graduate student was not sufficient to maintain and implement a system of that size," he jokes. "As a result, I spent the first few years as both systems administrator and user support."
Pummill was an early entrant in the campus champions program, and he has been both elated and, at times, astounded by the program's growth.
"In the early days, you could fit everybody in a small room," he says. As the program has grown to more than 200 campus champions, he says, the informal nature of those original meetings had to evolve—though the passion in the group remains.
"You find that people are siloed," Pummill says, "developing moderately good solutions on their campus. It's good enough to get them by, but highly inefficient as everyone is duplicating the effort."
A key advantage of the campus champions' extolling XSEDE's virtues is that it breaks down those silos by encouraging networking and communication among members.
"This creates more elegant solutions via the champions' network of expertise," he says. "In turn, online media maintain this information for future reference and dissemination by others. It forestalls a lot of wheel-reinvention."
Dirk Colbry couldn't agree more.
He'd kept a hand in HPC through his research, but only considered making it his career in 2009, when his wife got a position back at Michigan State. The very HPC center he'd crashed as a graduate student was looking for a consultant to help researchers access the center; it proved a perfect fit. When University of Michigan campus champion Brock Palen interviewed Kay Hunt for his podcast, Colbry's ears perked up: He saw another way he could help his researchers, and so he joined up.
"One of the best ways to enumerate the value of the program is to watch our email list," Colbry says. "If you put in a ticket for a complex HPC issue to any good help desk you'll get a response that says, ‘We're working on it,' and after a couple of days of back and forth you'll get the answer you're looking for. I'll put the same question on the campus champions list and I'll get seven or eight answers within five minutes. There's no place else to find that type of information so easily."
Seeing is Believing
Extreme Digital visualization and data analysis resources help researchers derive insights from massive data sets
By Aaron Dubrow, Texas Advanced Computing Center
Isaac Asimov, the American science fiction and popular science writer, famously said, "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny…'"
In a world swimming in information, how does a scientist have such a revelation? How do they find a needle of insight in a growing digital haystack?
Scientific visualization is one important tool scientists use to make discoveries. The process of visualization converts data—from sensors, DNA sequencers, social networks, and massive high-performance computing simulations and models—into images that can be perceived by the eye and explored and interpreted by the human mind.
This aspect of discovery has always been valuable, but as our ability to simulate subatomic particles, perform high-resolution 3D scans of the body, or map the universe improves, turning that data into useful information is increasingly critical.
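As a toy illustration of that conversion, here is a short Python sketch, not XSEDE or TACC software, that turns a two-dimensional array of simulated values into an image a person can inspect. The synthetic "temperature field," the color map and the file name are all stand-ins for real simulation output.

    # Toy example: render a synthetic 2-D scalar field (standing in for
    # simulation output) as an image, the basic move behind scientific
    # visualization.
    import numpy as np
    import matplotlib.pyplot as plt

    # A fake "temperature field": smooth structure plus a little noise.
    x, y = np.meshgrid(np.linspace(-3, 3, 400), np.linspace(-3, 3, 400))
    field = np.exp(-(x**2 + y**2)) + 0.05 * np.random.rand(400, 400)

    # Map the numbers to colors and write out an image the eye can explore.
    plt.imshow(field, cmap="viridis", origin="lower")
    plt.colorbar(label="value (arbitrary units)")
    plt.title("Synthetic scalar field rendered as an image")
    plt.savefig("field.png", dpi=150)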
In November 2008, the National Science Foundation (NSF) requested proposals for "TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)." The grants funded the first of a new class of computing systems: two state-of-the-art resources at the Texas Advanced Computing Center (TACC) and the National Institute for Computational Sciences (NICS) that together increased the visualization and data analysis capabilities of the open science community significantly.
The NSF solicitation was motivated by an awareness that simulations on high-performance computing systems and data from new scientific instruments were producing copious amounts of information that could not be analyzed or visualized by any previous system.
"We were seeing science at a completely different scale," said Kelly Gaither, principal investigator for the XD Vis award and director of visualization at TACC. "These systems address the data deluge that we saw coming down the pipe as a result of the bigger HPC systems."
TACC's Longhorn was deployed in January 2010 and has been supporting visualization, data analysis, and general computing for a year and a half. A Dell cluster with both NVIDIA GPUs and Intel quad-core CPUs on each node, Longhorn provides unprecedented capabilities, foremost among them the ability to remotely visualize massive data sets in real time.
This means a research group in Topeka, Kansas, can compute and visualize their dataset on the Longhorn system in Austin, Texas, from the quiet of their offices. The researchers can move, spin, zoom, and, in some cases, animate the subject with the touch of a button.
Gaither thinks this new capability—a hands-on approach to virtual experiments—improves scientists' relationship to their data and has the potential to transform research.
"Oftentimes, researchers don't know what they're looking for. They use visualization to do debugging or to do exploratory analysis of their simulation data. In those cases, visualization is really the only way to see," Gaither said. "It's generally recognized in the vis community that interactivity is a crucial component of being able to do that analysis."
Longhorn is the largest hardware-accelerated interactive visualization cluster in the world and has supported these real-time interactions for users as far away as Saudi Arabia. Longhorn can also manage extremely large data sets, including highly detailed visualizations created to study the instabilities in a burning helium flame.
Nautilus, an SGI Altix UV1000 system, is likewise a large computing system designed for remote visualization and analysis. It, too, has substantial computational and GPU capacity, but its architecture differs significantly from Longhorn's. Nautilus is a symmetric multiprocessor (SMP) machine, one in which all of the available memory is shared among all of the processors. Scientists see 1,024 processors and 4 TB of memory as a single system. The system also contains eight GPUs for general-purpose processing and hardware-accelerated graphics.
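To make that distinction concrete, here is a minimal sketch of the shared-memory idea in Python; it is illustrative only, not code for Nautilus, and the array size, worker count and statistic computed are invented for the example. Every worker reads slices of the same in-memory array, with no partitioning of data across separate nodes.

    # Illustrative only: eight workers analyze slices of one array that lives
    # in a single address space, the way an SMP machine's processors all see
    # one pool of memory.
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    data = np.random.rand(10_000_000)   # stands in for a dataset loaded once into memory
    chunks = np.array_split(data, 8)    # views into the same array, not copies shipped over a network

    def chunk_mean(i):
        # Each worker reads its slice of the shared array directly.
        return chunks[i].mean()

    with ThreadPoolExecutor(max_workers=8) as pool:
        partial_means = list(pool.map(chunk_mean, range(8)))

    # The chunks are equal-sized, so the mean of the partial means is the global mean.
    print(sum(partial_means) / len(partial_means))

On a machine like Nautilus the same idea scales up: because every processor can see all of the memory, an analysis can touch any part of a multi-terabyte dataset without first carving it up and shipping pieces to separate nodes.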
"Graph and societal network analysis. Correlation and document clustering. There are all sorts of analyses that are not amenable to a cluster type of architecture," explained Sean Ahern, director of the Center for Remote Data Analysis and Visualization (RDAV) at NICS (the center that operates Nautilus), and visualization task leader at Oak Ridge National Laboratory. "We've been able to accelerate the science that researchers are already doing, taking it from weeks to hours, and we have other projects where the size of the memory means researchers can pull in entire datasets where they were never able to do so before."
Rather than the pure visualization systems that dominated in the past, these machines were built to be multipurpose, supporting interactive and batch visualization, GPGPU (general-purpose GPU) computing, traditional HPC computing, and new kinds of data analysis.
This composite nature allows the systems to provide improved visualization resources to the academic community while remaining fully utilized, maximizing the value of the public investment.
Like all resources in the XSEDE infrastructure, Longhorn and Nautilus run 24 hours a day, 7 days a week, 365 days a year, and are supported by expert staff at the host centers. The resources are available to U.S. researchers through an XSEDE allocation from the National Science Foundation.
Over the past year and a half, 1,560 scientists have used Longhorn and Nautilus, applying their unique speed and capabilities to wide-ranging science problems while also exploring what role GPU processing can play in science generally.
The results emerging from the systems are encouraging.
Some of the notable successes on Longhorn include a collaboration with the National Archives and Records Administration to develop a new visualization framework for digital archivists; visualizations of the Gulf oil spill that helped the National Oceanic and Atmospheric Administration and the Coast Guard locate and contain oil slicks; record-setting molecular dynamics simulations of surfactants, which are used in detergents, manufacturing, and nanotechnology; and visualizations of the earthquake in Japan.
"With our analysis code, I get as much as 16,000 times speedup on Longhorn, which has given much insight into the physics of the protein-water interface, and allows us to understand at a more fundamental level how nature designs proteins to catalyze reactions under non-extreme conditions," said David LeBard, a postdoctoral fellow in the Institute for Computational Molecular Science at Temple University.
Simulations by LeBard and his collaborator Dmitry V. Matyushov appeared in the Journal of Physical Chemistry B and were featured on the cover of Physical Chemistry Chemical Physics in December 2010.
Nautilus has seen similar successes. Researchers on the system have performed unprecedented species modeling in the Great Smoky Mountains National Park, a biodiversity hot spot; gained new insights into the role turbulence plays in fusion; and used historical sources to explore how human society has evolved over the last half-century.
"Nautilus has been a critical enabling resource for the GlobalNet project in several ways," said Kalev Leetaru, senior research scientist for content analysis at the Illinois Institute for Computing in Humanities, Art and Social Science (I-CHASS). "Most visibly, the ability to instantly leverage terabytes of memory in a single system image has allowed the project for the first time to move beyond small 1 to 5 percent samples to explore the dataset as a whole, leading to numerous fundamental new discoveries simply not possible without the ability to analyze the entire dataset at once."
Together, the two systems have supported 759 projects, totaling 11.4 million computing hours (the equivalent of 1,250 years on a single desktop system) in the last year and a half.
Visualization and data analysis are clearly moving into the mainstream, and with the Extreme Digital visualization grants, the NSF has given a big boost to the national science community. Gaither and Ahern believe this could be the beginning of a new paradigm.
"Seeing the visualization and interacting with the data is probably one of the great enablers that will propel science for the next generation and beyond," Gaither said. "I think in some respects, you won't even see this intermediate thing called a ‘dataset'. You will interact with the simulations itself, or, if you'd prefer, with the science."
Ahern went further.
"Data without analysis is nothing," Ahern said. "If you've run a giant simulation, you've only done half the work. The real science comes from processing that data into something that people can understand. The job of science is done in the phase of analysis, and that's purely where we live."