Open Science and Industry Collaboration

Open Science and Industry Collaboration—Addressing New Problems While Improving the Bottom Line

By Scott Gibson

Consumers are happy when products flow nicely, whether toothpaste from a tube or shampoo from a bottle. People in open-science research and private industry, for their part, like workflows that lead to better problem solving and higher profits, respectively.

A new program from the Extreme Science and Engineering Discovery Environment (XSEDE) called the Industry Challenge brings the scientific and industrial communities together in multidisciplinary collaborative teams and connects them with XSEDE's world-class advanced digital services.

"The Industry Challenge solicits open science proposals that are fundamental to modeling and simulation problems in industry," said David Hudak, director of the program. "By solving these basic problems, we hope industry will advance."

The idea behind the challenge apparently resonated with the intended communities. The Procter & Gamble Company, the world's largest maker of consumer goods, and Rensselaer Polytechnic Institute (RPI), America's oldest technological university, submitted the winning proposals in this year's competition and have assembled teams composed of both industrial and academic researchers.

The brands of Procter & Gamble (P&G) are household names—Braun, Crest, Oral-B, Head & Shoulders, Bounce, Charmin, and Tide, for example. The P&G research team is exploring how to predict the flow properties (rheology) of surfactants, compounds that lower surface tension. The rheology of surfactant formulations defines the dynamics of such things as how shampoo dispenses from the bottle, mounds on the user's hand, and then spreads between the fingers, said Peter Koenig, principal investigator for the P&G-led project. XSEDE resources will enable rheology predictions via molecular simulations and mechanical models.

"The adjustment of flow properties by altering the composition of ingredients is an important part of the design of new consumer products," Koenig said. "Being able to predict the rheology using computer simulations will focus and accelerate the development process."

Koenig explained that the simulations the P&G project participants need to run are too large for the computers at their respective sites, so they are using Stampede at the Texas Advanced Computing Center and Keeneland at the National Institute for Computational Sciences. Those supercomputers, provided through XSEDE, enable the project team to perform the novel, computationally demanding molecular simulations of surfactant properties that relate to rheology. "Project plans include some large-scale, high-fidelity simulations to establish the approaches in a suitable way for publication in the scientific literature," he said. "Staff from XSEDE will contribute expertise to the research to optimize the speed and robustness of protocols for routine application."

Meanwhile, XSEDE is assisting the RPI team with the use of Stampede as it develops and demonstrates parallel simulation workflows for companies such as Corning, Inc., ITT Goulds Pumps, Pliant Energy Systems, and Sikorsky Aircraft.

"Industry is always focused on the bottom line," Hudak said. "I believe that engaging the academic community in general, and XSEDE in particular, must have a demonstrable return on investment [ROI] or else industrial partners will lose interest. I want to find ways to demonstrate ROI for our Industry Challenge projects."  

Reflecting the goal of enhancing profitability, the RPI team is delving into projects that will improve the materials processing, flow control, fluid-structure interaction, and design methods of the companies involved in the team collaborations.

"Although the reasons companies are interested in having these simulations developed vary, from improving manufacturing process quality, to designing a more effective product, to reducing the time required to produce time-critical designs, the bottom line for each of the companies is improving their bottom line," said Mark Shephard, principal investigator for the RPI-led project.

Shephard added that the RPI team's study is directed not only at creating improved complex simulation workflows but also at increasing the levels of automation and reliability of the simulations. "As part of this research, specific attention is given to the interoperability of the tools produced so they can be used in the fast and cost-effective construction of simulation workflows that address new industrial simulation needs," he explained.

Hudak believes the various research victories made possible by the collaborations along the way will serve to underscore the overarching virtue of the Industry Challenge program. "Addressing the individual challenges represented by these problems will be significant for their respective domains; however, the larger win will be the demonstration that industry and academic teams can work together to achieve results they could not reach alone," he said.



Seeing is Believing

Extreme Digital visualization and data analysis resources help researchers derive insights from massive data sets

By Aaron Dubrow, Texas Advanced Computing Center

Isaac Asimov, the American science fiction and popular science writer, famously said, "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny.'"

In a world swimming in information, how does a scientist have such a revelation? How do they find a needle of insight in a growing digital haystack?

Scientific visualization is one important tool scientists use to make discoveries. The process of visualization converts data—from sensors, DNA sequencers, social networks, and massive high-performance computing simulations and models—into images that can be perceived by the eye and explored and interpreted by the human mind.

This aspect of discovery has always been valuable, but as our ability to simulate subatomic particles, perform high-resolution 3D scans of the body, or map the universe improves, turning that data into useful information is increasingly critical.

In November 2008, the National Science Foundation (NSF) requested proposals for "TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)." The grants funded the first of a new class of computing systems: two state-of-the-art resources at the Texas Advanced Computing Center (TACC) and the National Institute for Computational Sciences (NICS) that together increased the visualization and data analysis capabilities of the open science community significantly.

The NSF solicitation was motivated by an awareness that simulations on high-performance computing systems and data from new scientific instruments were producing copious amounts of information that could not be analyzed or visualized by any previous system. 

"We were seeing science at a completely different scale," said Kelly Gaither, principal investigator for the XD Vis award and director of visualization at TACC. "These systems address the data deluge that we saw coming down the pipe as a result of the bigger HPC systems."

TACC's Longhorn was deployed in January 2010 and has been supporting visualization, data analysis, and general computing for a year and a half. A Dell cluster with both NVIDIA GPUs and Intel quad-core CPUs on each node, Longhorn provides unprecedented capabilities, foremost among them the ability to remotely visualize massive data sets in real time.

This means a research group in Topeka, Kansas, can compute and visualize their dataset on the Longhorn system in Austin, Texas, from the quiet of their offices. The researchers can move, spin, zoom, and, in some cases, animate the subject with the touch of a button.

Gaither thinks this new capability—a hands-on approach to virtual experiments—improves scientists' relationship to their data and has the potential to transform research.

"Oftentimes, researchers don't know what they're looking for. They use visualization to do debugging or to do exploratory analysis of their simulation data. In those cases, visualization is really the only way to see," Gaither said. "It's generally recognized in the vis community that interactivity is a crucial component of being able to do that analysis."

Longhorn is the largest hardware-accelerated interactive visualization cluster in the world and has supported these real-time interactions for users as remote as Saudi Arabia. Longhorn can also manage extremely large data sets, including those used in highly detailed visualizations created to study the instabilities in a burning helium flame.

Nautilus, an SGI Altix UV1000 system, is likewise a large computing system designed for remote visualization and analysis. It, too, has substantial computational and GPU capacity, but its architecture differs significantly from Longhorn's. Nautilus is a symmetric multiprocessor (SMP) machine, one in which all of the processors share all of the available memory. Scientists see 1,024 CPU cores and 4 TB of memory as a single system. The system also contains eight GPUs for general-purpose processing and hardware-accelerated graphics.

"Graph and societal network analysis. Correlation and document clustering. There are all sorts of analyses that are not amenable to a cluster type of architecture," explained Sean Ahern, director of the Center for Remote Data Analysis and Visualization (RDAV) at NICS (the center that operates Nautilus), and visualization task leader at Oak Ridge National Laboratory. "We've been able to accelerate the science that researchers are already doing, taking it from weeks to hours, and we have other projects where the size of the memory means researchers can pull in entire datasets where they were never able to do so before."

Rather than being pure visualization systems, the kind that dominated in the past, these machines were built to be multipurpose, supporting interactive and batch visualization, GPGPU (general-purpose GPU) computing, traditional HPC, and new kinds of data analysis.

This composite nature allows the systems to provide improved visualization resources to the academic community while staying fully utilized, which maximizes the return on the public investment.

Like all resources in the XSEDE infrastructure, Longhorn and Nautilus run 24 hours a day, 7 days a week, 365 days a year, and are supported by expert staff at the host centers. The resources are available to U.S. researchers through an XSEDE allocation from the National Science Foundation.

Over the course of the past year and a half, 1,560 scientists have used Longhorn and Nautilus, applying their unique speed and capabilities to wide-ranging science problems, while also exploring what role GPU processing can play in science generally.

The results emerging from the systems are encouraging.

Some of the notable successes on Longhorn are a collaboration with the National Archives and Records Administration to develop a new visualization framework for digital archivists; visualizations of the Gulf oil spill that helped the National Oceanic and Atmospheric Administration and the Coast Guard locate and contain oil slicks; record-setting molecular dynamics simulations of surfactants, which are used in detergents, manufacturing, and nanotechnology; and visualizations of the earthquake in Japan.

"With our analysis code, I get as much as 16,000 times speedup on Longhorn, which has given much insight into the physics of the protein-water interface, and allows us to understand at a more fundamental level how nature designs proteins to catalyze reactions under non-extreme conditions," said David LeBard, a postdoctoral fellow in the Institute for Computational Molecular Science at Temple University.

Simulations by LeBard and his collaborator Dmitry V. Matyushov appeared in the Journal of Physical Chemistry B and were featured on the cover of Physical Chemistry Chemical Physics in December 2010.

Nautilus has seen similar successes. Researchers on the system have performed unprecedented species modeling in the Great Smoky Mountains National Park, a biodiversity hot spot; gained new insights into the role turbulence plays in fusion; and explored how human society has evolved over the last half-century using historical sources.

"Nautilus has been a critical enabling resource for the GlobalNet project in several ways," said Kalev Leetaru, senior research scientist for content analysis at the Illinois Institute for Computing in Humanities, Arts and Social Science (I-CHASS). "Most visibly, the ability to instantly leverage terabytes of memory in a single system image has allowed the project for the first time to move beyond small 1 to 5 percent samples to explore the dataset as a whole, leading to numerous fundamental new discoveries simply not possible without the ability to analyze the entire dataset at once."

Together, the two systems have supported 759 projects, totaling 11.4 million computing hours (the equivalent of 1,250 years on a single desktop system) in the last year and a half.

Visualization and data analysis are clearly moving into the mainstream, and with the Extreme Digital visualization grants, the NSF has given a big boost to the national science community. Gaither and Ahern believe this could be the beginning of a new paradigm.

"Seeing the visualization and interacting with the data is probably one of the great enablers that will propel science for the next generation and beyond," Gaither said. "I think in some respects, you won't even see this intermediate thing called a 'dataset'. You will interact with the simulations itself, or, if you'd prefer, with the science."

Ahern went further.

"Data without analysis is nothing," Ahern said. "If you've run a giant simulation, you've only done half the work. The real science comes from processing that data into something that people can understand. The job of science is done in the phase of analysis, and that's purely where we live."