
June 16 2015 ECSS Symposium

June 16, 2015

A Short Story of Efficiently Using Two Open-Source Applications on Stampede

Presenter(s): Ritu Arora (TACC)

Presentation Slides

This presentation will summarize two challenges, and their solutions, encountered when running DROID (Digital Record Object Identification) and the FLASH astrophysics code on a large number of nodes on Stampede.
DROID is a software tool developed by The National Archives to perform automated batch identification of file formats. It is written in Java and works well when only one copy of it runs on a node. PI Jessica Trelogan from the Institute of Classical Archaeology at UT Austin has been using DROID as part of her workflow for managing a large archaeological data collection. Extracting metadata from about 4.3 TB of data with DROID on a local server took her more than two days. Because culling and reorganizing the data collection is an iterative process, the DROID metadata extraction must be rerun often. The goal of the ECSS project with PI Trelogan was to help her run parts of her workflow, including DROID, on Stampede so that the overall time taken to complete all the steps of the workflow is reduced. The main challenge in using DROID on Stampede was executing multiple copies of it in parallel, on different nodes, in batch mode. This presentation gives an overview of this challenge and the strategy used to solve it.
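The abstract does not describe how the DROID runs were distributed, so the following is only a minimal sketch, assuming one DROID instance per node and a known size for each file. It splits the collection into one batch per node, balancing total bytes with the greedy "largest file first, into the lightest batch" heuristic. The function name `partition_by_size` and the sample file names are hypothetical; how each batch is handed to DROID on its node is left out.

```python
import heapq
from typing import List, Tuple


def partition_by_size(files: List[Tuple[str, int]], n_nodes: int) -> List[List[str]]:
    """Hypothetical helper: assign (path, size_in_bytes) pairs to n_nodes batches.

    Files are placed largest first, each into the batch with the smallest
    running byte total, so per-node work (one DROID copy per node) is roughly
    balanced.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    batches: List[List[str]] = [[] for _ in range(n_nodes)]
    # Min-heap of (bytes assigned so far, batch index).
    heap = [(0, i) for i in range(n_nodes)]
    heapq.heapify(heap)
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, idx = heapq.heappop(heap)
        batches[idx].append(path)
        heapq.heappush(heap, (total + size, idx))
    return batches


if __name__ == "__main__":
    example = [("a.tif", 700), ("b.pdf", 500), ("c.jpg", 400), ("d.doc", 300), ("e.txt", 100)]
    for i, batch in enumerate(partition_by_size(example, 2)):
        print(f"node {i}: {batch}")
```

With the sample input, both nodes end up with 1000 bytes of work. Each batch could then be written to a per-node file list for a separate DROID run.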
In another project, a copy of the FLASH astrophysics code was optimized so that it performs striped I/O on the Lustre file system. This project was proposed after a user running FLASH on more than 7,000 cores overloaded the Lustre servers, which eventually became unresponsive. The problem was traced to the step that reads a checkpoint file. This talk also gives an overview of the problem and its solution.
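To illustrate why striping spreads I/O load, here is a minimal sketch of Lustre's round-robin (RAID-0) file layout. A file is cut into `stripe_size` chunks, and chunk k is stored in stripe object k mod `stripe_count`, each object on its own storage target (OST). This is not the FLASH change itself, which the abstract does not detail. The helper names and the eight-reader checkpoint scenario are hypothetical; in practice the layout is set with Lustre's `lfs setstripe` command.

```python
from typing import Set


def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Index (0..stripe_count-1) of the stripe object holding byte `offset`
    under a round-robin (RAID-0) layout."""
    return (offset // stripe_size) % stripe_count


def stripes_touched(offset: int, length: int, stripe_size: int, stripe_count: int) -> Set[int]:
    """Stripe indices touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return set()
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    # A read spanning stripe_count or more chunks touches every stripe.
    n_chunks = min(last_chunk - first_chunk + 1, stripe_count)
    return {(first_chunk + k) % stripe_count for k in range(n_chunks)}


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical checkpoint read: 8 readers, each reading its own 4 MiB slab,
    # with a 1 MiB stripe size.
    for count in (1, 8):
        used: Set[int] = set()
        for rank in range(8):
            used |= stripes_touched(rank * 4 * MiB, 4 * MiB, 1 * MiB, count)
        print(f"stripe_count={count}: readers spread over {len(used)} OST object(s)")
```

With a stripe count of 1, every reader hits the same single OST. With a stripe count of 8, the same reads are spread over eight OSTs.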

Optimization of Text Processing for the WordFlare Knowledge Graph

Presenter(s): Robert Sinkovits (SDSC)
Principal Investigator(s): Michael Douma (IDEA)

Presentation Slides

The goal of the WordFlare project is to create a tablet-based app to engage K-12 and lifelong learners in exploring language and knowledge. The app is based on a massive thesaurus and features dynamic visualizations of word relationships. Approximately 9% of the content is human-curated; the remaining 91% is derived using computational methods executed on XSEDE resources. In this talk, I will describe how we accelerated two key stages of the automated text processing: optimization of the Latent Dirichlet Allocation (LDA) algorithm, and development of a fast method to search for large numbers of words in a corpus simultaneously. The speedups we obtain are highly problem-dependent, ranging from 1.5x to 2.2x for the LDA algorithm and up to 1500x for the word search when using a large reference dictionary (e.g., the 400K words found in Wiktionary).
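The abstract does not say how the fast multi-word search works. One common way to search for a whole dictionary at once is sketched below, and it is not necessarily the talk's method. The approach tokenizes the corpus once and checks each n-gram against a hash set of dictionary entries, so the cost grows with corpus length times the longest entry length rather than with dictionary size times corpus length. The tokenizer regex and the function names are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed tokenizer: lowercase alphanumeric runs (apostrophes kept).
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def count_dictionary_hits(corpus: str, dictionary: Iterable[str]) -> Counter:
    """Count occurrences of every dictionary entry (single words or phrases)
    in a single pass over the corpus tokens, using hash-set lookups."""
    entries = {tuple(tokenize(e)) for e in dictionary}
    entries.discard(())
    max_len = max((len(e) for e in entries), default=0)
    tokens = tokenize(corpus)
    hits: Counter = Counter()
    for i in range(len(tokens)):
        # Try each n-gram starting at position i, up to the longest entry.
        for n in range(1, min(max_len, len(tokens) - i) + 1):
            gram = tuple(tokens[i:i + n])
            if gram in entries:
                hits[" ".join(gram)] += 1
    return hits


if __name__ == "__main__":
    words = ["knowledge", "knowledge graph", "word", "thesaurus"]
    text = "A knowledge graph links each word; a thesaurus links word to word."
    print(count_dictionary_hits(text, words))
```

Because lookups are constant-time on average, adding more dictionary entries (for example, a 400K-word list) barely changes the scan cost. Only the longest phrase length affects the inner loop.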