Opening the spigot at XSEDE
Posted on 09 AUG, 2016
A boost from sequencing technologies and computational tools is in store for scientists studying how cells change which of their genes are active.
Researchers using the Extreme Science and Engineering Discovery Environment (XSEDE), a collaboration of supercomputing centers, have reported advances in reconstructing cells' transcriptomes: the set of genes a cell is actively 'transcribing' from DNA into RNA.
The work aims to clarify best practices for assembling transcriptomes, guidance that could ultimately aid researchers throughout the biomedical sciences.
"It's crucial to determine the important factors that affect transcriptome reconstruction," says Noushin Ghaffari of AgriLife Genomics and Bioinformatics, at Texas A&M University. "This work will particularly help generate more reliable resources for scientists studying non-model species" — species not previously well studied.
Ghaffari is principal investigator in an ongoing project whose preliminary findings and computational aspects were presented at the XSEDE16 conference in Miami in July. She is leading a team of students and supercomputing experts from Texas A&M, Indiana University, and the Pittsburgh Supercomputing Center (PSC).
The scientists sought to improve the quality and efficiency of transcriptome assembly, and they tested their work on two real RNA-Seq data sets from the Sequencing Quality Control Consortium (SEQC): one from cancer cell lines and one from brain tissue from 23 human donors.
What's in a transcriptome?
A cell's transcriptome changes from moment to moment as the cell reacts to its environment. Transcriptomes offer critical clues to how cells change their behavior in response to disease processes such as cancer, or to normal bodily signals such as hormones.
Assembling a transcriptome is a big undertaking with current technology, though. Scientists start with samples containing tens or hundreds of thousands of RNA molecules, each thousands of RNA 'base units' long. The trouble is that most current high-speed sequencing technologies can read only a couple hundred bases at a time.
So researchers must first chemically cut the RNA into small pieces, sequence it, remove RNA not directing cell activity, and then match the overlapping fragments to reassemble the original RNA molecules.
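The last of those steps, matching overlapping fragments, can be illustrated with a toy greedy-overlap assembler. This is a simplified sketch for intuition only, not the pipeline the team evaluated: production transcriptome assemblers (Trinity, for example) build de Bruijn graphs from short k-mers instead, and real reads carry errors and repeats that defeat a purely greedy strategy. The function names and the `min_len` threshold are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` characters long.
    """
    start = 0
    while True:
        # Look for the first min_len characters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a matches the start of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop fragments that lie entirely inside another fragment.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    fragments = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
    print(greedy_assemble(fragments))  # ['ATGCGTACGTTAGC']
```

On this error-free example the greedy choice recovers the original molecule. When a sequence repeats, several pairs can overlap equally well, and the greedy choice may join fragments that never came from the same molecule, which is one reason repeats are hard to resolve.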
Harder still, they must identify and correct sequencing mistakes, and deal with repetitive sequences that make it unclear where a given RNA sequence came from and how many times it repeats.
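A common way to spot sequencing mistakes is to count every k-letter substring (k-mer) across all reads: k-mers from a genuine molecule tend to recur in many overlapping reads, while a k-mer created by a random misread appears only once or twice. The sketch below flags reads containing such rare k-mers. The values of `k` and `min_count` are illustrative assumptions, not parameters reported by the study, and in transcriptome data the signal is muddied by genes expressed at low levels, whose genuine k-mers are also rare.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspicious_kmers(reads, k=5, min_count=2):
    """K-mers seen fewer than min_count times: candidate sequencing errors."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}


def flag_reads(reads, k=5, min_count=2):
    """Return indices of reads that contain at least one rare k-mer."""
    rare = suspicious_kmers(reads, k, min_count)
    return [idx for idx, read in enumerate(reads)
            if any(read[i:i + k] in rare for i in range(len(read) - k + 1))]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
    print(flag_reads(reads))  # [3]: the read with a single-base difference
```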
While software tools exist to undertake all of these tasks, Ghaffari's report was the most comprehensive yet to examine a variety of factors that affect assembly speed and accuracy when these tools are combined in a start-to-finish workflow.
Heavy lifting
The team used the SEQC data to assemble transcriptomes, incorporating many quality-control steps to ensure accurate results. The process required vast amounts of computer memory, provided by PSC's high-memory supercomputers Blacklight and Greenfield and now by the 3-terabyte 'large memory nodes' of the new Bridges system.
"As part of this work, we are running some of the largest transcriptome assemblies ever done," says coauthor Philip Blood of PSC, an expert in XSEDE's Extended Collaborative Support Service. "Our effort focused on running all these big data sets many different ways to see what factors are important in getting the best quality. Doing this required the large memory nodes on Bridges, and a lot of technical expertise to manage the complexities of the workflow."
During the study, the team concentrated on optimizing the speed of data movement from storage to memory to the processors and back.
They also incorporated new verification steps to avoid perplexing errors that arise when wrangling big data through complex pipelines. Future work will include the incorporation of 'checkpoints' — storing the computations regularly so that work is not lost if a software error happens.
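One simple kind of verification step is to compare cryptographic checksums of a file before and after it moves between storage systems, so that a truncated or corrupted copy is caught before it silently feeds the next pipeline stage. The sketch below uses SHA-256 from Python's standard `hashlib` module and reads files in chunks so that multi-gigabyte sequence files need not fit in memory. The function names are hypothetical, and the article does not say which checks the team actually added.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(src, dst):
    """Raise ValueError if the copy at dst differs from the source at src."""
    if sha256_of(src) != sha256_of(dst):
        raise ValueError(f"checksum mismatch: {Path(src)} -> {Path(dst)}")
```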
Ultimately, Blood adds, the scientists would like to put all the steps of the process into an automated workflow that will make it easy for other biomedical researchers to replicate.
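Checkpointing and workflow automation can be combined in a very small runner: record each stage's name in a checkpoint file as soon as the stage finishes, and on restart skip the stages already recorded, so a crash costs at most the stage that was running. This is a minimal sketch under assumed conventions (stages as zero-argument Python functions, a JSON checkpoint file, hypothetical stage names); it is not the team's workflow, and production pipelines often rely on a dedicated workflow manager instead.

```python
import json
from pathlib import Path


def run_workflow(stages, checkpoint_file="checkpoint.json"):
    """Run (name, function) stages in order, skipping any already checkpointed.

    Returns the set of stage names recorded as complete.
    """
    ckpt = Path(checkpoint_file)
    # Resume from a previous run if a checkpoint file exists.
    done = set(json.loads(ckpt.read_text())) if ckpt.exists() else set()
    for name, step in stages:
        if name in done:
            print(f"skipping {name}: already checkpointed")
            continue
        step()  # if this raises, earlier stages stay recorded as done
        done.add(name)
        # Write to a temporary file, then rename it over the checkpoint, so a
        # crash mid-write leaves the previous checkpoint intact (the rename is
        # atomic on POSIX file systems).
        tmp = ckpt.with_name(ckpt.name + ".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(ckpt)
    return done
```

For example, `run_workflow([("trim_reads", trim), ("assemble", assemble)])`, with `trim` and `assemble` standing in for hypothetical wrappers around real tools, reruns only `assemble` if it failed on the previous attempt.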
The work will provide tools to help other scientists improve our understanding of how living organisms respond to disease, environment and evolutionary changes, the scientists reported.

Digital detectives. Researchers from Texas A&M are using XSEDE resources to manage the data from transcriptome assembly. Studying transcriptomes offers critical clues to how cells change their behavior in response to disease processes.

Building bridges. Bridges, a new PSC supercomputer, is designed for unprecedented flexibility and ease of use. It will include database and web servers to support gateways, collaboration, and powerful data management functions. Courtesy Pittsburgh Supercomputing Center.