XSEDE Science Successes
Exploring Large Data for Scientific Discovery
Scott Gibson, Communications Specialist, University of Tennessee
More elegant techniques combined with highly interdisciplinary, multi-scale collaboration are essential for dealing with massive amounts of information, plenary speaker says at the XSEDE15 conference.
A curse of dealing with mounds of data so massive that they require special tools, said computer scientist Valerio Pascucci, is that if you look for something, you will probably find it, which injects bias into the analysis.
In his plenary talk titled "Extreme Data Management Analysis and Visualization: Exploring Large Data for Science Discovery" on July 28 during the XSEDE15 conference in St. Louis, Dr. Pascucci said that getting clean, guaranteed, unbiased results in data analyses requires highly interdisciplinary, multi-scale collaboration and techniques that unify the math and computer science behind the applications used in physics, biology, and medicine.
The techniques and use cases he shared during his talk reflected about a decade and a half of getting down in the trenches to understand research efforts in disparate scientific domains, cutting through semantics, and capturing extensible mathematical foundations that could be applied in developing robust, efficient algorithms and applications.
Fewer Tools but Greater Utility
"You can build an economy of tools by deconstructing the math, looking for commonalities, and developing fewer tools that can be of use to more people," Pascucci said in a post-talk interview. And to avoid developing biased algorithms, "you try to delay as long as possible application development." The goal, he noted, is to create techniques that leave little room for mental shortcuts, or heuristics, and emphasize a formalized mathematical approach.
Creating those techniques, however, requires cross-pollination between, and integration of, data management and data analysis, tasks that have traditionally been performed by different communities, Pascucci pointed out. Collaboration that combines those efforts, he added, is a necessary ingredient for a successful supercomputing center or cyberinfrastructure—a dynamic ecosystem of people, advanced computing systems, data, and software.
Processing on the Fly
In managing large datasets, a platform for processing on the fly is important, said Pascucci, because researchers need to be able to make decisions under incomplete information. "This is something that people often underestimate," he added.
One innovation that Pascucci and his colleagues at the Center for Extreme Data Management, Analysis, and Visualization (CEDMAV) at the University of Utah have developed is a framework, called ViSUS, for processing large-scale scientific data with high-performance selective queries on multiple terabytes of raw data. The framework's data model is combined with progressive streaming techniques that allow processing on a variety of computing devices, from iPhones and simple workstations to the input/output of parallel supercomputers. The framework has, for example, enabled the real-time streaming of massive combustion simulations from U.S. Department of Energy platforms to national laboratories.
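To make the idea of progressive streaming concrete, here is a minimal Python sketch. It is not the ViSUS API or its data layout; the function names `progressive_levels` and `progressive_mean` are hypothetical and chosen only for illustration. The sketch sends a dataset coarse to fine, so a receiver can form an early estimate from a few samples and refine it as more data arrives.

```python
from typing import Iterator, List, Sequence, Tuple


def progressive_levels(n: int) -> Iterator[List[int]]:
    """Yield the indices of range(n) in coarse-to-fine levels.

    Each level contains only indices that earlier levels did not send,
    so nothing is transmitted twice.
    """
    if n <= 0:
        return
    stride = 1
    while stride < n:  # smallest power of two that is >= n
        stride *= 2
    yield [0]  # coarsest level: a single sample
    while stride > 1:
        stride //= 2
        # Odd multiples of the new stride are the samples not yet sent.
        yield list(range(stride, n, 2 * stride))


def progressive_mean(data: Sequence[float]) -> Iterator[Tuple[int, float]]:
    """Yield (samples_seen, running_mean) after each level arrives."""
    total = 0.0
    seen = 0
    for level in progressive_levels(len(data)):
        for i in level:
            total += data[i]
            seen += 1
        yield seen, total / seen


if __name__ == "__main__":
    for seen, estimate in progressive_mean(list(range(8))):
        print(f"{seen} samples received, mean so far = {estimate}")
```

A consumer can stop after any level, which is the sense in which a decision can be made under incomplete information: the early estimates are rough but arrive after reading only a fraction of the data.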
Key Infrastructure Elements and the Big Picture
Pascucci's talk identified performance, user access, analytics, and applications as the key elements of a computational infrastructure for scientific discovery. It delved into each area with demonstrations and use cases, ranging from interactive remote analysis and visualization of 6 terabytes of imaging data to scaling in situ analytics, and it discussed data abstractions and visual metaphors along with examples of simulations, experiments, and data collection associated with climate, combustion, astrophysics, microscopy, Twitter, and more.
Topology, the area of mathematics concerned with the properties of spaces, received particular emphasis as an analytics technique that complements statistics well. Topology, Pascucci explained, is particularly adept at showing local and global trends and describing shapes with great precision, and its numerical principles also extend to the analysis of other types of data. He presented the application of a discrete topological framework for the representation and analysis of large-scale scientific data.
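One textbook way topology separates local from global trends is persistence, which scores each peak of a signal by how far the data must drop before that peak merges into a taller one. The Python sketch below illustrates the idea on a one-dimensional signal. It is an assumption made for illustration, not necessarily the discrete framework presented in the talk, and the function name `peak_persistence` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def peak_persistence(values: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (peak_index, persistence) pairs, most persistent first.

    Samples are visited from highest to lowest. Each local maximum starts a
    component; when two components meet, the one with the lower peak ends,
    and its persistence is its peak height minus the height where they met.
    """
    n = len(values)
    if n == 0:
        return []
    parent = [-1] * n            # -1 marks a sample not yet visited
    peak: Dict[int, int] = {}    # component root -> index of its highest sample
    pairs: List[Tuple[int, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: -values[k]):
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                a, b = find(i), find(j)
                if a == b:
                    continue
                if values[peak[a]] < values[peak[b]]:
                    a, b = b, a  # keep the component with the higher peak
                persistence = values[peak[b]] - values[i]
                if persistence > 0:  # skip samples that never formed a peak
                    pairs.append((peak[b], persistence))
                parent[b] = a
    # Convention assumed here: the global maximum persists over the full range.
    top = peak[find(0)]
    pairs.append((top, values[top] - min(values)))
    return sorted(pairs, key=lambda p: -p[1])


if __name__ == "__main__":
    signal = [0, 4, 1, 6, 3, 5, 0]
    for index, score in peak_persistence(signal):
        print(f"peak at index {index}: persistence {score}")
```

Peaks with high persistence reflect global structure, while low-persistence peaks are local detail or noise, so a single threshold on the score gives a principled way to simplify a shape.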
In the big picture of dealing with massive amounts of information, Pascucci said, integrated management, analysis, and visualization of large data can be a catalyst for a virtuous cycle of collaborative activities involving basic research, software deployment, user support, and commercialization. A wide spectrum of interdisciplinary collaborations, he added, can have a positive effect by motivating the work, providing formal theoretical approaches, and providing feedback to specific disciplines.
Pascucci is the founding director of CEDMAV; a professor at the Scientific Computing and Imaging Institute and the School of Computing at the University of Utah; a laboratory fellow at Pacific Northwest National Laboratory; and chief technology officer at Visus LLC (visus.net), a spin-off of the University of Utah.
