Track 1 - Data Storage and Transport

2020 Archived Content

Is the burden of managing your data growing larger every day? Do you have a scalable and robust data management infrastructure in place to store, process, analyze, and transfer vast quantities of data according to your organization’s policies? Is your organization using new tools and analytical processes such as AI and deep learning that stress your supporting IT infrastructure beyond the expectations of system designers?

Managing data has become a prevalent issue in the life sciences industry. Organizations are spending millions on systems and platforms to manage, store, and transfer many types of data (e.g., experimental, operational, clinical) from many disparate sources. Data engineering plays a critical role in orchestrating, configuring, managing, and scaling solutions to the data bloat problem.

The Data Storage and Transport track presents in-depth case studies from leading life science organizations that are implementing solutions to data storage and transfer challenges. Topics include where to store data (cloud, local, or a mixture), how to balance price against access, how to estimate data storage costs and build financial models, how to understand and plan for costs in the cloud, what to do with large third-party databases (inter-pharma collaborations, genomic/expression datasets), what to do with an imaging collaboration that produces 100 TB, how to "rehydrate" a data archive (from tape) for re-analysis, how to determine whether you're storing the right stuff, how best to deliver data products to customers and collaborators, and more. How are you developing technologies to deal with the influx of digital data from digital health devices?

Tuesday, October 6

10:00 am

Welcome Remarks

Cindy Crowninshield, Executive Event Director, Cambridge Healthtech Institute

10:05 am Talk Title to be Announced

Scott Parker, Director of Product Marketing, Marketing, Sinequa

10:15 am

NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health

11:05 am

LIVE Q&A: Session Wrap-Up Panel Discussion

Panel Moderator:

Ari E Berman, PhD, CEO, BioTeam Inc

11:25 am Lunch Break - View Our Virtual Exhibit Hall

11:55 am Recommended Pre-Conference Workshops*

W1: Data Management for Biologics: Registration and Beyond

W2: A Crash Course in AI: 0-60 in Three

W3: Data Science Driving Better Informed Decisions

*Separate registration required. See workshop page for details.

1:55 pm Refresh Break - View Our Virtual Exhibit Hall

2:15 pm Recommended Pre-Conference Workshops*

W4: Digital Biomarkers and Wearables in Pharma R&D and Clinical Trials

W5: AI-Celerating R&D: Foundational Approaches to How Emerging Technologies Can Create Value

W6: Dealing with Instrument Data at Scale: Challenges and Solutions

*Separate registration required. See workshop page for details.

4:15 pm Close of Day

Wednesday, October 7

9:00 am

Beyond Discoverability: Metadata to Drive Your Data Management

Terrell Russell, PhD, Chief Technologist, The iRODS Consortium at Renaissance Computing Institute (RENCI)

As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories. With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance. Data management should be data-centric and metadata-driven.

9:20 am

Building a Foundation for a Data Commons at NIEHS

Michael C. Conway, Technical Architect, Office of Data Science, NIH NIEHS

The topic of an NIH Data Commons has been an area of great interest and activity, as has the general FAIR data movement. These broad notions are playing out with a future focus while NIEHS works to build its own Data Commons to manage today’s research data. Managing daily work while observing future trends, incorporating key capabilities, often in a tentative and piecemeal fashion, without losing sight of the big picture; this is the challenge we all face.

9:40 am

Metadata: Getting to Know Your Data

Oleg Moiseyenko, Associate Director, Scientific Computing Systems, Bristol Myers Squibb

Next-generation sequencing (NGS) is routinely used in cancer research. It produces large amounts of data, both during collection and as the data is processed through the pipeline. Tracking and applying context to such large amounts of data becomes a challenge. Using AWS for data delivery and processing has the advantage that both steps can be automated. In addition, applying context to the data using iRODS can be automated as data gets delivered and processed. This will provide a primary source of metadata that can be used by other applications downstream.

10:00 am Coffee Break - View Our Virtual Exhibit Hall

10:20 am

Pathways in the Cloud: Facilitating Storage for Analysis Pipelines

Brigitte E. Raumann, Product Manager, Globus, University of Chicago

Life sciences researchers must contend with data spread across a wide variety of storage system types, including on-premises and cloud storage. In order to efficiently and reproducibly execute data analysis pipelines, researchers need secure and sophisticated data management capabilities that provide unified data access irrespective of storage location and type. In this talk, we will discuss how Globus provides a solution to this challenge, including a success story of a popular data analysis system built on the Globus framework.

10:40 am Session Break

11:00 am At-scale Genomic Data Compression, Storage, and Access Using PetaGene on AWS: Reference Architecture

Vaughan Wittorff, PhD, Co-Founder, PetaGene

Lisa McFerrin, PhD, Bioinformatician, AWS

AWS has developed a Reference Architecture for compression and read back using PetaGene’s PetaSuite Cloud Edition and deployed it with mutual client AstraZeneca. By doing so, AstraZeneca has reduced its storage footprint, sped up data ingress and egress, and accelerated analysis, while maintaining the data in a fully accessible format.

11:15 am Data Accessibility: One company's transformation to the cloud saves 64% in storage costs

Brian Woznik, Solutions Architect, Engineering, Igneous

Learn how Igneous is helping a leading research institute create efficiencies in data management that are reducing storage costs by over 64% and migrating 4 PB of data to the cloud in just weeks, all while maintaining the much-needed visibility that allows end users to make informed decisions.

11:30 am LIVE Q&A:

Session Wrap-Up Panel Discussion

Panel Moderator:

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

Panelists:

Michael C. Conway, Technical Architect, Office of Data Science, NIH NIEHS

Lisa McFerrin, PhD, Bioinformatician, AWS

Oleg Moiseyenko, Associate Director, Scientific Computing Systems, Bristol Myers Squibb

Brigitte E. Raumann, Product Manager, Globus, University of Chicago

Terrell Russell, PhD, Chief Technologist, The iRODS Consortium at Renaissance Computing Institute (RENCI)

Vaughan Wittorff, PhD, Co-Founder, PetaGene

Brian Woznik, Solutions Architect, Engineering, Igneous

11:50 am Lunch Break - View Our Virtual Exhibit Hall

11:55 am Interactive Breakout Discussions

During the break, consider joining a breakout discussion group. These are informal, moderated discussions with brainstorming and interactive problem solving, allowing participants from diverse backgrounds to exchange ideas and experiences and develop future collaborations around a focused topic.

BREAKOUT: Early Adopter or Fashionably Late: Talking Cloud Evolution with Takeda, Element Biosciences & Celgene

Michael Riener, President, RCH Solutions

Join us for a lively discussion among prominent pharma leaders, and learn:

Why, when & how to implement a public Cloud for your computing needs

Challenges and opportunities when setting and managing stakeholder expectations

Critical keys to success to realize the best outcomes

To learn more about RCH Solutions, visit our Virtual Booth

BREAKOUT: Why Current Approaches Using AI in Drug Discovery Fail: How Can We Overcome?

Joe Donahue, Managing Director, Life Sciences, Accenture

Participants include:

Andreas Matern, Head of Digital Translational Medicine, Sanofi

John Quackenbush, Professor of Computational Biology and Bioinformatics, Harvard T.H. Chan School of Public Health

Seungtaek Lee, VP, Strategic Partnerships and AI RWE Head of CoE, ConcertAI

Preston Keller, PhD, MBA, President & CCO, PercayAI

Philip Payne, PhD, Becker Professor and Chief Data Scientist, Washington University in St. Louis

BREAKOUT: Discovering Insights Across Clinical Trials and Real World Data

Jeff Evernham, VP of Customer Solutions, Consulting, Sinequa

Most large-scale analyses of clinical trial data leverage only part of the picture, ignoring unstructured data and limiting findability across the information collected from multiple disparate data sources. This roundtable will discuss leveraging a cognitive platform to combine data from multiple sources into one unified view with a single entry point to the data.

BREAKOUT: Opportunities and Trade-offs of Benchmarking NGS Tools & Applications

Sasha Paegle, Life Science Business Development, Dell Technologies

Evaluating, optimizing, and benchmarking next-generation sequencing (NGS) methods is essential for clinical, commercial, and academic NGS pipelines. Optimizations for speed and accuracy often require making trade-offs relative to other constraints. Join this roundtable to discuss benchmarking strategies, trade-offs, and the value of benchmarking genomics tools and applications.

12:20 pm Keynote Introduction: Advancing the New R&D Paradigm for Life Sciences

Michael Schwartz, Head, Product Marketing, Marketing, Benchling

The life science industry has forged ahead with a new generation of therapeutics. A new R&D paradigm is required to develop scientific platforms, manage data complexity, and orchestrate progress across specialized teams. Digital solutions and data ecosystems are at the heart of this, but require both structure and adaptability to thrive in the modern life science R&D environment.

12:30 pm KEYNOTE PRESENTATION & PANEL DISCUSSION:

Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

Allison Proffitt, Editorial Director, Bio-IT World

While the precision medicine movement augurs for better outcomes through targeted prevention and intervention, those ambitions entail a bold new set of data challenges. Various panomic and traditional data streams must be integrated if we are to develop a comprehensive basis for individualized care. However, deriving actionable information requires complex predictive models that depend on the acquisition and integration of patient data on a massive scale. This picture is further complicated by new data streams emerging from quantified self-tracking and health social networks, both of which are driven by experimentation-feedback loops. Tackling these issues may seem insurmountable, but recent advancements in human/AI partnerships and crowdsourcing science add a new set of capabilities to our analytic toolkit. This session describes recent work in online collective systems that combine human and machine-based information processing to solve biomedical data problems that have been otherwise intractable, and an information processing ecosystem emerging from this work that could transform the landscape of precision medicine for all stakeholders. Pietro will open with a framing talk, followed by short presentations from each panelist, ending with a Q&A discussion with speakers and attendees, moderated by Allison.

Panelists:

Seth Cooper, PhD, Assistant Professor, Khoury College of Computer Sciences, Northeastern University

Lee Lancashire, PhD, CIO, Cohen Veterans Bioscience

Pietro Michelucci, PhD, Director, Human Computation Institute

Jérôme Waldispühl, PhD, Associate Professor, School of Computer Science, McGill University

1:55 pm Refresh Break - View Our Virtual Exhibit Hall

3:10 pm

Aspects of Performance in Data Movement and Management

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

When it comes to moving and managing life science and bioinformatics data, performance is influenced by a multitude of factors. Researchers want their data management solutions to “just work,” without concern for what tend to be esoteric technical details. But delivering tools that simplify a researcher’s life requires a tradeoff between performance, reliability, usability, security, compliance, and technical complexity. In this talk we will explore how service providers should evaluate and prioritize these and other factors, and illustrate how the Globus service strikes an optimal balance, making it the de facto data management solution for over 100,000 researchers around the world. We will present case study examples, and attendees will be encouraged to share their own experiences.

3:30 pm Project Triumph, A Multi-cloud Architecture for Life Science Technical Computing

Matt Wallace, Chief Technology Officer, Faction

Sasha Paegle, Senior Business Development Manager, Life Science, Dell Technologies

Dell and Faction share insights and results from a multi-cloud architecture that tests the NVIDIA Clara Parabricks application across multiple CSPs with the latest 1000 Genomes high coverage data on a cloud-adjacent Dell PowerScale appliance.

4:00 pm LIVE Q&A:

Session Wrap-Up Panel Discussion

Panel Moderator:

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

Panelists:

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

Matt Wallace, Chief Technology Officer, Faction

Sasha Paegle, Sr. Business Development Manager, Life Science, Dell Technologies

4:20 pm Bio-IT Connects - View Our Virtual Exhibit Hall

5:00 pm Close of Day

Thursday, October 8

9:00 am KEYNOTE PRESENTATION & PANEL DISCUSSION:

Trends from the Trenches

Kevin Davies, PhD, Executive Editor, The CRISPR Journal; Founding Editor, Bio-IT World

“Trends from the Trenches” will celebrate its 10th anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. An interactive, moderated Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session. To stay connected with Trends from the Trenches updates after today and all year, sign up for BioTeam's newsletter here: https://bit.ly/33uO0OY

Panelists:

Vivien R. Bonazzi, PhD, Managing Director & Chief Biomedical Data Scientist, Deloitte Consulting LLP

Tim Cutts, PhD, Head of Scientific Computing, Wellcome Sanger Institute

Chris Dagdigian, Senior Director, BioTeam Inc.

Kjiersten Fagnan, PhD, CIO, Data Science & Informatics, Lawrence Berkeley National Laboratory

Matthew Trunnell, Data Commoner-at-Large; Executive Director, Pandemic Response Commons; Former Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center

10:40 am Genomics Analysis at Cloud Speed

Paul Speciale, Chief Product Officer, Product Management, Scality

Biotechnology companies are facing new challenges in the amount of data that needs processing for genomics analysis. What used to be terabytes of data is now petabytes and beyond. This data needs to be collected, analyzed, processed, and then ultimately retained for compliance and research purposes, resulting in massive data storage and management challenges that legacy technology cannot solve. Our session will explain how to leverage new all-flash storage and hybrid-cloud solutions to make genomics analysis run far faster than before.

10:55 am Session Break

11:30 am Lunch Break - View Our Virtual Exhibit Hall

11:35 am Interactive Breakout Discussions

Consider joining a breakout discussion group. These are informal, moderated discussions with brainstorming and interactive problem solving, allowing participants from diverse backgrounds to exchange ideas and experiences and develop future collaborations around a focused topic.

BREAKOUT: Driving Scientific Discovery with Data / Digitization

Timothy Gardner, CEO, Riffyn, Inc.

How do you use data / digitization today to drive scientific discovery / product development?

What are your greatest scientific pain points / gaps that are not being met by digitization?

What kinds of outcomes do you believe digital tools could help you achieve?

ROUNDTABLE: Managing the Growing Demand for HPC Resources in Life Sciences Research

Scott Jeschonek, Principal Program Manager, Microsoft Azure

Welcome to this discussion group on the growth of demand for HPC in scientific research. We are looking forward to a lively forum. We'll start by looking at three related topics:

- What events trigger demand in your organization? How has the current pandemic impacted resources?

- What could make scale and collaboration more accessible to more researchers?

- Share a recent experience of shifting workloads to manage HPC capacity.

BREAKOUT: Research & Genomics Analysis at Multi-Cloud Speed

Greg DiFraia, General Manager, Americas, Executive Team, Scality

Shailesh Manjrekar, Head of AI and Strategic Alliances, Executive Team, WekaIO

In this session we’ll discuss how to provide researchers with performance and scale in genomics & research analytics, to drive results at a price point that’s economically viable on public & private cloud.

BREAKOUT: NGS Pipeline Optimizations

Tristan J Lubinski, PhD, Sr Scientist, Next Generation Sequencing Informatics, AstraZeneca Pharmaceuticals; Co-organizer, Boston Computational Biology and Bioinformatics (BCBB)

Professor Howard’s Introduction to Flash Storage for Bioinformatics

Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data

The storage solutions we’ve been using force bioinformaticists to make trade-offs between the capacity and low cost of disk and the performance of flash. This results in complex tiering configurations that only deliver performance for a small slice of the data. In this session, we will review how advancements in technology enable VAST Data to revolutionize the cost of all-flash and allow bioinformaticists to run faster analyses across larger datasets for deeper insights.

12:00 pm

Welcome Remarks

Cindy Crowninshield, Executive Event Director, Cambridge Healthtech Institute