2017 Archive | Track 1: Data Storage & Management


Track 1: Data Storage & Management

The unprecedented growth of data generation and research storage isn’t slowing down anytime soon. As a result, storage is becoming a major cost element in the genomic IT world, where organizations are spending millions on systems and platforms. Data engineering plays a critical role in orchestrating, configuring, managing, and monitoring solutions to the data bloat problem. Track 1 assembles thought leaders and organizations from data centers and “centers of excellence” who have pioneered advances in large-scale data management, predictive analytics, and workflow automation. Presentations will focus on people, process, and technology issues related to storage platforms, integration and migration plans, architectures, governance, and scalability.

Tuesday, May 23

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

(W7) Introduction to Hadoop for Bioinformatics

12:30 – 4:00 pm Afternoon Pre-Conference Workshops*

* Separate registration required.

2:00 – 6:00 Main Conference Registration Open

4:00 PLENARY KEYNOTE SESSION

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, May 24

7:00 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION

9:50 Coffee Break in the Exhibit Hall with Poster Viewing

DESIGNING DATA & STORAGE MANAGEMENT SOLUTIONS

10:50 Chairperson’s Remarks
Chris Dwan, Senior Technologist and Independent Consultant

11:00 IT Design Patterns to Support Genomic Science in the Age of the Cloud: Challenges and Possibilities

Chris Dwan, Senior Technologist and Independent Consultant

11:30 Design Considerations for Genomic Data Archival

Saira Kazmi, Ph.D., Scientific Data Architect, Research Information Technology, The Jackson Laboratory

This talk focuses on issues related to managing and tiering storage for genomic sequence data. It will present architectural considerations for designing a solution for scalability, governance, and discoverability, discuss current hardware and software technologies, and describe a solution based on metadata indexing. The presentation will conclude with lessons learned and next steps.

12:00 pm Life Sciences at EXAScale: Applying a Novel IO System to Critical Workflows

James Coomer, Technical Director, UK-Pre-Sales, Engineering, DDN Storage

We generally assume that the IO interface and the choice of filesystem are linked, and this is usually true. We present a flash-based IO system that embeds within a filesystem, replacing its IO interface with one that removes common constraints, particularly for the IO patterns seen in life sciences. DDN’s Infinite Memory Engine radically changes the way IO is handled, providing new opportunities for complex life science workflows.

12:15 Novel Systems and Approaches for the Next Generation of Genomic Analysis and Data Management

Christopher Davidson, Life Science Solutions Manager, HPE

The pace and scale of genomics research are now defined less by the science itself than by the compute and storage architectures used to extract insight from the genomic data generated. This session focuses on how to accelerate genomic workflows and democratize data through flexible and scalable systems.

12:30 Session Break

12:40 Luncheon Presentation I: Broad Institute & Intel GATK 4.0 Optimization Overview

Eric Banks, Director, Data Science and Data Engineering Group, Broad Institute

Geraldine Van der Auwera, Associate Director, Outreach and Communications, GATK, Broad Institute

Mark Bagley, Director, Center for Genomic Data Engineering, Intel

Paolo Narvaez, Senior Director, Engineering, Intel

The Broad Institute of MIT and Harvard, a leader in genomics research, joins Intel to describe their collaboration to enhance the GATK environment and scale researchers’ ability to analyze massive amounts of genomic data from diverse sources worldwide. Topics include performance best practices and the latest on GenomicsDB and FireCloud.

1:10 Luncheon Presentation II: The New Era of Integrated Data Storage

Mark Pastor, Director, Archive & Technical Workflow Solutions, Marketing, Quantum Corporation

Clinicians and researchers need fast, easy access to data, even as data repositories reach petabyte scale. Designing a future-proof, cost-effective storage infrastructure that delivers performance, access, and protection for all your data, and that can act as a gateway to all types of storage media including the public cloud, is now easier than ever before.

1:40 Session Break

1:50 Chairperson’s Remarks

Asya Shklyar, Senior Scientific Consultant, Infrastructure, BioTeam

1:55 Tools and Techniques for Making Data Less Scary and More Visible

Asya Shklyar, Senior Scientific Consultant, Infrastructure, BioTeam

Data management is an ongoing and rapidly escalating problem in both the research and commercial worlds. The talk summarizes the tools, including hardware and software, open source and commercial, that can be used to wrangle data across multiple heterogeneous sources and make it more quantifiable, searchable, parsable, and actionable.

2:25 Reducing the Size and Cost of NGS Data Storage and Transfer

Dan Greenfield, Ph.D., CEO, PetaGene Ltd.*

PetaGene provides a software suite that compresses NGS data by up to 5x and makes better use of your compute nodes. PetaGene's file system interface lets existing tools and pipelines run without modification, and it is provided as a free download so that anyone, anywhere, can use the compressed files.

* Bio-IT World 2016 Best of Show winner

2:55 Pushing the Limits of Discovery with Internet2

Dan Taylor, Director, Business Development, Network Services, Internet2

3:10 Time to Results Matters: The Case for Performance Scale-Out NAS

David Sallak, Vice President, Product Management & Industry Marketing, Panasas, Inc.

Acceleration. Decoding a human genome has gone from taking decades to taking mere days or hours. Your research infrastructure must stay ahead of the demands placed on it by the best scientists. Building the right storage foundation positions your organization for groundbreaking research insights. Learn how the Panasas accelerated scale-out NAS solution helped the Garvan Institute drive innovation faster and simplify its workflows.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 Evaluating Full Flash Scale-Out NAS Technologies for Some Bioinformatics Workloads

Youssef Ghorbal, Design and Technical Solutions Group Manager, Institut Pasteur

Scale-out NAS technology is an effective backend for bioinformatics workloads at Institut Pasteur. In this presentation, we will review the shortcomings of the current setup for specific use cases we have identified, and report on how newly tested flash array technologies may overcome those limitations.

4:30 Integrating Data, Tools and Infrastructure to Enable Efficient Collaboration and Management in Large Scale Biomedical Studies

Sven Nahnsen, Ph.D., Head, Quantitative Biology Center (QBiC), University of Tübingen

We established an infrastructure that builds on multi-layer omics data (genomics, transcriptomics, and metabolomics), as well as imaging data from mice and from human material obtained in clinical oncology studies. We also developed a data and project management facility that supports modeling of the experimental design, interaction with the data acquisition facilities, and bioinformatics analysis.

5:00 Extreme Durability for Your Bioinformatics Data

Kent Ritchie, Solutions Architect, HGST, a Western Digital brand

HGST, a Western Digital brand and one of the largest storage companies in the world, can help with every stage of your workflow. Come hear about extreme durability for your bioinformatics data, from high-performance computing to archiving and analytics for your long-term discoveries. We are here to help you deliver possibilities at every stage.

5:30 – 6:30 15th Anniversary Celebration in the Exhibit Hall with Poster Viewing and Best of Show Awards

Thursday, May 25

7:00 am Registration Open

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM

8:05 Benjamin Franklin Awards and Laureate Presentation

8:35 Best Practices Awards Program

8:50 Plenary Keynote

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

HADOOP

10:30 Chairperson’s Remarks
Martin R. Gollery, CEO, Tahoe Informatics

10:40 Real World Data Platform and Analytics

Minnie Chou, Director, Information Systems, Amgen

The Real World Data (RWD) Platform is a game changer in Amgen's pursuit of serving patients by delivering innovative human therapeutic products faster. It provides a common high-performance analytics ecosystem hosting large volumes of real-world patient claims data and electronic medical records, enabling epidemiologists, analysts, and scientists to deliver insights in a timely and cost-effective manner. We used an Agile approach to implement an enterprise RWD workbench on top of a Hadoop-based enterprise data lake. The workbench harmonizes real-world patient data assets, including cohorts of patients with specific diseases and/or receiving Amgen or competitor therapies, to consistently address questions across the drug commercialization lifecycle.

11:10 Healthcare: Foundational Building Blocks: The Establishment of a Healthcare Data Ecosystem in a Hadoop Environment

Amy M. Andrade, MS, PMP, Assistant Vice President of Research, Meharry Medical College

Charles Boicey, MS, RN-BC, Associate Clinical Professor, Stony Brook Medicine

This talk presents insight into, and a working framework for, how data and storage management and clinical informatics can be established in a Hadoop environment in months instead of years. A Data Science Center at a community-based academic health center focused on serving underserved and minority populations has implemented a low-cost, HIPAA-compliant cloud approach to “Big Data”. Using technologies new to healthcare, the center processed data from both within and outside the healthcare environment.

11:40 Kickstarting Breakthroughs in Life Sciences with Intelligent, Next-Generation Scale-Out Storage

Peter Godman, CTO & Co-Founder, Qumulo

Advances in genomic IT are creating unprecedented storage and data management challenges for life sciences companies. How can companies stay competitive and manage billions of small and large files? Discover how intelligent scale-out storage systems give enterprises real-time insight into their data footprints at scale, delivering breakthrough performance while balancing capacity and cost.

11:55 IBM Cloud Object Storage Solutions Enabling Better Patient Outcomes

Piers Nash, Ph.D., Global Solutions Consultant, Genomics & Healthcare, IBM Cloud Object Storage, IBM

IBM and the University of Chicago’s Center for Data Intensive Science (CDIS) are accelerating medical discoveries. Using IBM Cloud Object Storage, CDIS centrally stores and manages vast amounts of genomic and clinical data at web scale. Discover how IBM Watson for Genomics can help researchers collaborate via shared access to harmonized data sets, speeding discovery and enabling precision medicine.

12:10 pm Session Break

12:20 Luncheon Presentation I: AI + RWD – Next Steps in your Big Data Journey

Arun Ghosh, Principal Data & Analytics, KPMG LLP

Organizations' data repositories have evolved to accommodate real-world data. This moves the industry incrementally closer to the automated data management that research scientists and epidemiologists need to improve patient outcomes faster. This presentation will show how machine learning (as a service) and artificial intelligence can use data to deliver incremental value today while laying the foundation for what's next.

12:50 Luncheon Presentation II: Optimized Scaling for NGS: Transfer, Storage and Archiving

Rafael Feitelberg, CEO, Geneformics Inc.

As the volume of NGS data increases, so do the challenges of IT cost and infrastructure. In this session, we will cover solutions implemented by leading global organizations to reduce the NGS data footprint by up to 90% with scalable, enterprise-grade architectures that are lossless and transparent to bioinformatics applications.

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

FEATURED SESSION: BIOTEAM MICRO-SYMPOSIUM: 2017 BIO-IT TRENDS

1:55 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Consultant

2:00 BioTeam Micro-Symposium: 2017 Bio-IT Trends

Chris Dwan, Senior Technologist and Independent Consultant (Moderator)

Ari E. Berman, Ph.D., Vice President and General Manager of Consulting Services, BioTeam, Inc.

Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.

Aaron Gardner, Senior Scientific Consultant, BioTeam, Inc.

Adam Kraut, Director of Infrastructure and Cloud Architecture, BioTeam, Inc.

Asya Shklyar, Senior Scientific Consultant, Infrastructure, BioTeam, Inc.

Since 2010, the “Trends in the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions in the Bio-IT World program. The talk delivers a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences, recapping the prior year by discussing what has changed (or not) in infrastructure, storage, computing, and networks. It has helped scientists, leadership, and IT professionals understand the basic topics involved in supporting data-intensive science. For 2017, “Trends in the Trenches” will expand from 60 to 120 minutes and feature more content, more speakers, and interactive discussion. Short, focused podium talks on current trends in computing, storage and data transfer, networks, and cloud will be followed by a moderated Q&A discussion. Come prepared with your questions and commentary for this informative and lively session.

4:00 Conference Adjourns