ECSS Symposium
The Extended Collaborative Support program provides expert assistance in a wide range of cyberinfrastructure technologies. Any user may request this assistance through the XSEDE allocation process.
The primary goal of this monthly symposium is to allow the more than 70 staff members working in ECSS to exchange information about successful techniques used to address challenging science problems. Tutorials on new technologies may also be featured. Two 30-minute, technically focused talks will be presented each month, each including a brief question-and-answer period. This series is open to all.
Symposium coordinates:
Third Tuesday of the month, barring scheduling conflicts. Upcoming events will be listed on the website, with announcements posted to the News Training category.
1pm Eastern/12pm Central/10am Pacific
These sessions will be recorded. Because this is a large webinar, only the presenters and host will broadcast audio. Attendees may submit questions to the presenters through a moderator by sending a chat message.
Videos and slides from past presentations are available below.
(Videos not listed below, from prior years, can be found here: 2012, 2013, 2014, 2015)
May 17, 2022
Reworking inefficient workflows for shared HPC resources
Presenter(s): Mitchell Dorrell (Pittsburgh Supercomputing Center)
Sometimes major scientific advances arrive as beautifully packaged open source software implementations, ready to be used on any computing system in the world. Sometimes they don't. The world of protein folding has been changed by the arrival of new AI-centric algorithms that use databases of known sequence-to-structure relationships to predict previously-unknown structures with unprecedented accuracy. One of the most accomplished such algorithms is DeepMind's AlphaFold2, which is now publicly available under an open source license. As a service provider, we sought to install AlphaFold2 on our systems to make it more accessible to our users. In the process, we discovered that the workflow that DeepMind ships with AlphaFold2 is extremely inefficient when used on typical HPC resources. In this discussion, I will explain the approaches we are taking to enable our users to run AlphaFold2 as easily, but also efficiently, as possible.
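The inefficiency described here is a common HPC pattern: a monolithic pipeline keeps an expensive GPU idle while CPU-bound work (such as database search) runs. As a purely illustrative sketch (not DeepMind's code, and not necessarily the approach PSC took; all names are hypothetical), the generic remedy is to split the pipeline into stages that communicate through files, so a scheduler can place each stage on an appropriately sized node:

```python
# Illustrative sketch of decoupling a CPU-bound stage from a GPU-bound stage
# so each can be scheduled on a matching partition. All names are hypothetical.
import json
import tempfile
from pathlib import Path

def cpu_stage(sequence: str, workdir: Path) -> Path:
    """Stand-in for the CPU-heavy database search; writes features to disk."""
    features = {"sequence": sequence, "hits": len(sequence)}  # fake features
    out = workdir / "features.json"
    out.write_text(json.dumps(features))
    return out

def gpu_stage(features_path: Path) -> dict:
    """Stand-in for the GPU inference step; reads precomputed features."""
    features = json.loads(features_path.read_text())
    return {"structure_score": features["hits"] / 10.0}  # fake score

workdir = Path(tempfile.mkdtemp())
features = cpu_stage("MKTAYIAKQR", workdir)  # would run as a CPU-partition job
result = gpu_stage(features)                 # would run as a GPU-partition job
print(result)
```

In production, each stage would be its own batch job, with the GPU job declaring a dependency on the CPU job, so the GPU allocation is held only while the GPU is actually in use.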
A Historical Big Data Analysis To Study The Social Construction Of Juvenile Delinquency - Latest Progress
Presenter(s): Yu Zhang (CSU Fresno), Sandeep Puthanveetil Satheesan (NCSA), Bhavya (University of Illinois Urbana-Champaign), Adam Davies (University of Illinois Urbana-Champaign)
Social construction is a theoretical position holding that social reality is created through human definition and interaction. As one type of social reality, juvenile delinquency is perceived as a social problem, deeply contextualized and socially constructed in American society. The social construction of juvenile delinquency began far earlier than the first US juvenile court in 1899. Scholars have applied traditional historical analysis to trace this timeline, but it is inefficient to examine hundreds of years of documents using traditional paper-and-pencil methods. Our project combines "big data" image and text analysis modules, using these tools to analyze hundreds of years of scanned newspaper images to better understand the historical social construction of juvenile delinquency in American society. This ECSS Symposium will provide an update on progress since our last symposium, which focused on the issues involved in OCR and in segmentation of historical newspaper collections. We have since made great progress in this area and have also added additional newspaper collections. We will revisit the OCR and segmentation issues, but primarily address the analyses of the resulting text data, to which we have applied a number of text analysis techniques including topic modeling, lexical analysis, and human-in-the-loop document classification.
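One of the techniques named above, lexical analysis over OCR'd text, can be sketched in miniature: count occurrences of delinquency-related terms per decade to trace how coverage shifts over time. The term list and sample data below are hypothetical illustrations, not the project's actual vocabulary or corpus:

```python
# Minimal lexical-analysis sketch: per-decade term counts over (year, text)
# pairs, the kind of signal one might extract from OCR'd newspaper pages.
from collections import Counter, defaultdict
import re

TERMS = {"delinquent", "reform", "truant", "wayward"}  # hypothetical lexicon

def term_counts_by_decade(articles):
    """articles: iterable of (year, text) pairs -> {decade: Counter of terms}."""
    counts = defaultdict(Counter)
    for year, text in articles:
        decade = (year // 10) * 10
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[decade].update(t for t in tokens if t in TERMS)
    return counts

sample = [
    (1895, "The wayward boy was sent to the reform school."),
    (1902, "Truant officers reported the delinquent children."),
]
print(dict(term_counts_by_decade(sample)))
```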
April 19, 2022
A case study on deep learning for classification with imbalanced finance data
Presenter(s): Paul Rodriguez (San Diego Supercomputer Center)
Deep learning neural networks have become very important in machine learning and artificial intelligence applications, but it is not obvious how much they improve classification performance on tabular or sequential data. In this study we compare neural network performance to several standard machine learning models for classification on an imbalanced data set with a low rate of positive cases. We explore several neural network architecture options and consider methods and trade-offs in searching through hyperparameter space, as well as sampling and loss-weighting options. We find that although neural networks show robust and interesting performance, adding more deep layers does not yield a large improvement on this data set, and shallow networks or other models are competitive.
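The loss-weighting option mentioned above can be illustrated with the common "balanced" heuristic: weight each class inversely to its frequency so the rare positive class contributes comparably to the loss. This is a stdlib-only sketch of the idea; a real experiment would use a framework's class_weight or pos_weight options:

```python
# Hedged sketch of class weighting for imbalanced classification.
import math
from collections import Counter

def balanced_class_weights(labels):
    """The common 'balanced' heuristic: w_c = N / (K * n_c)."""
    n = len(labels)
    counts = Counter(labels)
    k = len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy where each example is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -weights[y] * (math.log(p) if y == 1 else math.log(1.0 - p))
    return total / len(y_true)

labels = [0] * 90 + [1] * 10  # 10% positive rate
weights = balanced_class_weights(labels)
print(weights)  # the rare positive class gets the larger weight
```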
Reworking inefficient workflows for shared HPC resources (Rescheduled for May 17)
Presenter(s): Mitchell Dorrell (Pittsburgh Supercomputing Center)
(See the May 17, 2022 listing above for the abstract.)
March 15, 2022
Supporting HPC Research and Education With Open OnDemand
Presenter(s): Richard Lawrence (Texas A&M University)
Researchers using HPC resources face a steep learning curve when confronted with new tools, technologies, and languages. This barrier to entry slows adoption of HPC best practices; a robust system of graphical, interactive user interfaces lowers it. The Open OnDemand (OOD) framework enables HPC sites to provide web-based graphical user interfaces. We present some improvements to the OOD framework developed and deployed at TAMU and argue for the necessity of these and additional developments. The focus is on practical utility for researchers and easy maintenance for administrators.
Anvil - A National Composable Advanced Computational Resource for the Future of Science and Engineering
Presenter(s): Rajesh Kalyanam (Purdue University)
Anvil is a new XSEDE advanced capacity computational resource funded by NSF. Designed to meet the ever-increasing and diversifying needs for advanced computational capacity, Anvil integrates a large-capacity HPC system with a comprehensive ecosystem of software, access interfaces, programming environments, and composable services. Comprising a 1000-node CPU cluster featuring the latest AMD EPYC 3rd-generation (Milan) processors, along with a set of 1 TB large-memory and NVIDIA A100 GPU nodes, Anvil integrates a multi-tier storage system, a Kubernetes composable subsystem, and a pathway to the Azure commercial cloud to support a variety of workflows and storage needs. Anvil entered production in February 2022 and will serve the nation's science and engineering research community for five years. We will describe the Anvil system, its user-facing interfaces and services, and share data and feedback from the recently concluded early user access program.
December 21, 2021
TaRget Enablement to Accelerate Therapy Development for Alzheimer's Disease (TREAT-AD)
Presenter(s): Rob Quick (Indiana University)
The National Institute on Aging describes Alzheimer's Disease (AD) as "a brain disorder that slowly destroys memory and thinking skills, and, eventually, the ability to carry out the simplest tasks." It ranks as the 6th leading cause of death in the US. TaRget Enablement to Accelerate Therapy Development for Alzheimer's Disease (TREAT-AD) is a joint effort leveraging drug discovery expertise from the Indiana University School of Medicine (IUSoM), Purdue University, Emory University, and Sage Bionetworks. The goal of these NIH-funded projects is to improve, diversify, and invigorate the Alzheimer's disease drug discovery pipeline. The IUSoM is responsible for the Bioinformatics and Computational Biology Core (BCBCore), which will be the focus of this symposium. The BCBCore (bcbportal.medicine.iu.edu) is implemented as a series of developmental science gateways that will be consolidated into a single production portal for AD tools and data. We will discuss the motivation and goals of the overarching project, demo an important AD research tool under development (AD Explorer), and discuss various other aspects of engaging this important research group as an XSEDE ECSS collaborator.
October 19, 2021
Campus Champions Short Presentations
Presenter(s): Suxia Cui (Prairie View A&M University), Kurt Showmaker (University of Mississippi Medical Center), Zhiyong Zhang (Stanford University), Sinclair Im (Yale University)
Presentation slides:
Image analysis for digital surrogates
A density functional theory study
Optimal utilization of XSEDE resources
The October Symposium will feature a series of short presentations (≤ 15 minutes) by four of the XSEDE 2020-21 Campus Champion Fellows. Speakers and titles are listed below, with additional details for their projects available on the 2020-21 announcements page.
Suxia Cui, Prairie View A&M University, Image analysis for digital surrogates of historical motion picture film
Kurt Showmaker, University of Mississippi Medical Center, A Comprehensive Annotator and Web Viewer for scRNA-seq Data
Zhiyong Zhang, Stanford University, Optimal Utilization of XSEDE Computing Resources for the NWChem Computational Chemistry Software Package
Sinclair Im, Yale University, A density functional theory study: quantum materials
September 21, 2021
InterACTWEL Cyberinfrastructure: Enabling Long-term AI-driven Decision Support for Adaptive Management of Water, Energy, and Land Resources in Watershed Communities
Presenter(s): Meghna Babbar-Sebens (Oregon State University), Samuel Rivera (Oregon State University), Eroma Abeysinghe (Indiana University), Eric Coulter (Indiana University)
Cyberinfrastructure serves as the backbone of many of the complex, data-intensive computational analyses typically conducted for climate change impact assessment and decision support. However, cyberinfrastructure research in the recent past has primarily focused on supporting the needs of researchers via networking, storage, standards, middleware, and computation capabilities. Adaptation to climate change in watershed communities will require long-term interactions with stakeholders to coordinate context-sensitive decisions as conditions evolve over time. This means that applications of Artificial Intelligence (AI) in adaptation research will need to consider the social-physical and dynamic nature of climate-change resilience problems in order to be decision-relevant. This has necessitated a broader vision for the role of AI-ready cyberinfrastructure in supporting multi-year stakeholder engagement for climate-change resilience. In this presentation, we present a novel, use-inspired cyberinfrastructure, InterACTWEL, which is being created to support longitudinal collaboration between researchers and decision makers on stakeholder-driven planning of adaptation to climate-change impacts in local watershed communities. We demonstrate how the InterACTWEL cyberinfrastructure is being used to support AI-assisted, stakeholder-driven planning of water supply resilience in a testbed community within the Columbia River Basin.
May 18, 2021
COVID-19 Drug Repurposing Guidance using Fragment Molecular Orbital (FMO) Calculations
Presenter(s): Aaron Frank (University of Michigan), Dimuthu Wannipurage (Indiana University Pervasive Technology Institute), Suresh Marru (Indiana University Pervasive Technology Institute)
Presentation slides:
Dimuthu Upeksha Wannipurage
Aaron Frank
Suresh Marru
In this talk, we share our experiences with and updates on a COVID-19 HPC Consortium project (https://covid19-hpc-consortium.org/projects/5eb5c8784c0571007b307650). Motivated by the need to rapidly identify drugs that are likely to bind to targets implicated in SARS-CoV-2, the virus that causes COVID-19, we present a framework for Fragment Molecular Orbital (FMO) calculations to speed up quantum mechanical calculations that can be used to explore structure-energy relationships in large and complex biomolecular systems. These calculations are still onerous, especially when applied to large sets of molecules. We will share our XSEDE ECSS collaboration in assisting with cyberinfrastructure aspects, mechanisms, and user interfaces that manage job submission, data retrieval, and data storage for the FMO calculations. The talk will summarize how we used the Apache Airavata science gateway platform to apply FMO calculations to complexes formed between SARS-CoV-2 Mpro (the main protease in SARS-CoV-2) and 2820 approved and experimental drugs in a drug-repurposing library. The talk will highlight Airavata's job submission and monitoring enhancements to support static and parallel parameter sweeps on remote compute clusters across a batch of input data. We will discuss the integration of a data parsing workflow to capture, filter, and validate the enriched metadata from the outputs. Finally, we will discuss generalization of the extensions made in support of large-scale FMO calculations for SARS-CoV-2 Mpro-drug complexes and their potential use in other biomolecular systems.
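The batch parameter-sweep pattern described above can be illustrated in miniature. This sketch uses a local thread pool and a hypothetical scoring function; the production system instead submits and monitors remote batch jobs through Apache Airavata:

```python
# Illustrative parameter sweep: fan one function out over a batch of inputs
# and collect results. The scoring function is a hypothetical stand-in for
# one FMO calculation on one drug complex.
from concurrent.futures import ThreadPoolExecutor

def score_complex(drug_id):
    """Hypothetical stand-in for an FMO calculation; returns (id, score)."""
    return drug_id, drug_id * 0.1  # fake binding score

drug_ids = range(8)  # the real sweep covered 2820 drug complexes
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(score_complex, drug_ids))
print(results[3])
```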
April 20, 2021
Leveraging Augmented Reality to Enhance Remote Collaboration
Presenter(s): Max Collins (UC Irvine)
Augmented Reality (AR) is a medium that gives people the ability to engage with digital information in ways that deviate from more traditional HCI methods (e.g., WIMP user interfaces). Remote work experiences leveraging teleconferencing are becoming increasingly prevalent as more people work together remotely, more often. In this talk we cover our investigation into the ways that AR can support efforts to work together across distance, and how invoking AR may create a sense of joint focus and engagement beyond what traditional remote collaboration tools afford. We outline the design process of an AR add-on to teleconferencing tools (e.g., Zoom) that allows participants to interact with one another around digital assets in AR and share objects with one another through the screen. We investigate the use cases of this tool and describe the evaluation methods and preliminary user testing results for this system.
Best Practices for Research Software Engineers
Presenter(s): Rudi Eigenmann (University of Delaware)
The Xpert Network brings together teams and individuals that support domain scientists in developing, optimizing and running computational and data-intensive applications. One goal is to develop best practices. In this talk I will summarize initial results. They include both software engineering advice and recommendations for team organization and collaboration. The results also include experiences with tools that can accelerate the work of research software engineers. A particular emphasis will be on practices that differ from those applicable in a general software engineering context.
March 16, 2021
HPC for epidemic modeling with limited data: COVID-19 case studies
Presenter(s): Kelly Pierce (TACC)
The novel coronavirus (SARS-CoV-2) emerged in late 2019 and spread globally in early 2020. Initial reports suggested the associated disease, COVID-19, produced rapid epidemic growth and caused high mortality. As the virus sparked local epidemics in new communities, health systems and policy makers were forced to make decisions with limited information about the spread of the disease. The UT COVID-19 Modeling Consortium formed in response to the urgent need for increased situational awareness and developed a library of COVID-19 models to project infections and healthcare burdens. These models were used to inform policy decisions in the city of Austin, Texas and as part of the CDC COVID-19 mortality and infection model ensembles. Now one year into the pandemic, the Consortium has expanded the scope of its work to include estimates of infection introductions in schools, statistically informed guidelines for genomic surveillance to detect novel variants, and equitable vaccine distribution. As an early partner in the Consortium, the Texas Advanced Computing Center (TACC) has provided support in software development, data management, and long-term modeling infrastructure development. This talk will overview the joint work of the Consortium and TACC, with an emphasis on the impact of limited data availability in epidemiological modeling and the role of high-performance computing in supporting fast turn-around of time-sensitive results.
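The projection models described above build on compartmental epidemic models. As a hedged illustration (forward-Euler SIR with made-up parameters, not the Consortium's calibrated models), the core building block looks like this:

```python
# Minimal SIR compartment model integrated with forward Euler.
# Parameters below are illustrative, not fitted to COVID-19 data.
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Return final (susceptible, infected, recovered) counts."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    for _ in range(int(days / dt)):
        new_inf = beta * s * i / n * dt  # S -> I transitions this step
        new_rec = gamma * i * dt         # I -> R transitions this step
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return s, i, r

s, i, r = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=120)
print(round(s), round(i), round(r))
```

Real-world use adds the complications the talk emphasizes: fitting beta and gamma to sparse, delayed case data, adding compartments for hospitalization, and quantifying uncertainty across many simulated trajectories.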
February 16, 2021
MuST – A high performance computing software package for the ab initio study of materials
Presenter(s): Yang Wang (Pittsburgh Supercomputing Center)
Ab initio calculation is one of the most popular computational practices in the HPC user community. It aims to study molecules or materials using quantum mechanics as its fundamental principle, rather than relying on empirical or semi-empirical models. In the past decade, several computational tools developed for ab initio calculation have become available to the research community. In this presentation, I will introduce MuST, an open source software project supported by the NSF CSSI program. The MuST package is designed to enable ab initio investigation of disordered materials. It is based on multiple scattering theory with a Green's function approach in the framework of density functional theory, and is built upon decades of development of research codes that include 1) the KKR method, an all-electron, full-potential, ab initio electronic structure calculation method; 2) the KKR-CPA method, a highly efficient ab initio method for the study of random alloys; and 3) the Locally Self-consistent Multiple Scattering (LSMS) method, a linear-scaling ab initio code capable of treating extremely large disordered systems from first principles using the largest parallel supercomputers available. Strong disorder and localization effects can also be studied in real systems within the LSMS formalism with cluster embedding in an effective medium, e.g., DMFT, DCA, or TMDCA, enabling a scalable approach for the ab initio study of quantum materials. I will show the latest developments of the MuST project and discuss its potential applications.