Data Storage and Transport

Is the burden of managing your data growing larger every day? Do you have a scalable and robust data management infrastructure in place to store, process, analyze, and transfer vast quantities of data according to your organization’s policies? Is your organization using new tools and analytical processes, such as AI and deep learning, that stress your supporting IT infrastructure beyond the expectations of system designers? Managing data has become a prevalent issue in the life sciences industry. Organizations are spending millions on systems and platforms to manage, store, and transfer many types of data (e.g., experimental, operational, clinical) from many disparate sources. Data engineering plays a critical role in orchestrating, configuring, managing, and scaling solutions to the data bloat problem. The Data Storage and Transport track presents in-depth case studies from leading life science organizations that are implementing solutions to data storage and transfer challenges. Topics include where to store data (cloud, local, or a mixture), the optimal configuration of price vs. access, estimating data storage costs and building financial models, understanding and planning for cloud costs, handling large third-party databases (inter-pharma collaborations, genomic/expression datasets), handling an imaging collaboration that produces 100 TB, "rehydrating" a data archive (from tape) for re-analysis, determining whether you're storing the right data, finding the best way to deliver data products to customers and collaborators, and more. How are you developing technologies to deal with the influx of digital data from digital health devices?

Final Agenda

Monday, April 20

9:00 am - 5:00 pm Hackathon*

*Pre-registration required.

Tuesday, April 21

7:30 am Workshop Registration Open and Morning Coffee

8:30 am - 3:30 pm Hackathon*

*Pre-registration required.


8:30 - 11:30 am Recommended Morning Pre-Conference Workshops*

W2. A Crash Course in AI: 0-60 in Three

Gustavo Arango, PhD, Senior Data Scientist - Oncology Bioinformatics, AstraZeneca

Bino John, PhD, Associate Director, Data Science - Clinical Pharmacology & Safety Sciences, AstraZeneca R&D

John Van Hemert, PhD, Research Scientist, Bioinformatics, Corteva Agri Science, A Dow-Dupont Division

12:30 - 3:30 pm Recommended Afternoon Pre-Conference Workshops*

W13. Structuring Data for Drug Development and Regulatory Submissions: The Role of Standards and Ontology

Lawrence Callahan, PhD, Chemist, Office of Health Informatics, Global Substance Registration System/Office of Health Informatics, Office of Chief Scientist, FDA R&D

Hande Kucuk McGinty, PhD, Research Scientist, Collaborative Drug Discovery R&D

Gregory Pappas, Associate Director, National Surveillance, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration R&D

Michael Waters, Team Lead, System Harmonization and Interoperability Enhancement for Laboratory Data (SHIELD), U.S. Food and Drug Administration

*Separate registration required.

2:00 - 6:30 Main Conference Registration Open

PLENARY KEYNOTE SESSION

4:00 Welcome Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

4:05 Keynote Introduction

4:15 PLENARY KEYNOTE PRESENTATION: NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health

5:00 - 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 22

7:30 am Registration Open and Morning Coffee

PLENARY KEYNOTE SESSION

8:00 Welcome Remarks

Allison Proffitt, Editorial Director, Bio-IT World

8:05 Keynote Introduction

8:15 Toward Preventive Genomics: Lessons from MedSeq and BabySeq

Robert Green, MD, MPH, Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School

8:45 PANEL DISCUSSION: Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

Seth Cooper, PhD, Assistant Professor, Khoury College of Computer Sciences, Northeastern University

Lee Lancashire, PhD, Chief Information Officer, Cohen Veterans Bioscience

Pietro Michelucci, PhD, Director, Human Computation Institute

Jérôme Waldispühl, PhD, Associate Professor, School of Computer Science, McGill University

While the precision medicine movement augurs for better outcomes through targeted prevention and intervention, those ambitions entail a bold new set of data challenges. Various panomic and traditional data streams must be integrated if we are to develop a comprehensive basis for individualized care. However, deriving actionable information requires complex predictive models that depend on the acquisition and integration of patient data on a massive scale. This picture is further complicated by new data streams emerging from quantified self-tracking and health social networks, both of which are driven by experimentation-feedback loops. These issues may seem insurmountable, but recent advancements in human/AI partnerships and crowdsourcing science add a new set of capabilities to our analytic toolkit. This talk describes recent work in online collective systems that combine human and machine-based information processing to solve biomedical data problems that have been otherwise intractable, and an information processing ecosystem emerging from this work that could transform the landscape of precision medicine for all stakeholders.

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

DATA STORAGE PLATFORMS & RESOURCES

10:50 Organizer’s Welcome Remarks

Cambridge Healthtech Institute

10:55 Chairperson’s Remarks

11:00 Beyond Discoverability: Metadata to Drive Your Data Management

Terrell Russell, PhD, Chief Technologist, The iRODS Consortium at Renaissance Computing Institute (RENCI)

As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories. With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance. Data management should be data centric, and metadata driven.

11:20 Building a Foundation for a Data Commons at NIEHS

Michael Conway, Data Systems Architect/Engineer, National Institute of Environmental Health Sciences (NIEHS)

The topic of an NIH Data Commons has been an area of great interest and activity, as has the general FAIR data movement. These broad notions are playing out with a future focus while NIEHS works to build its own Data Commons to manage today’s research data. Managing daily work while observing future trends and incorporating key capabilities, often in a tentative and piecemeal fashion, without losing sight of the big picture: this is the challenge we all face.

11:40 Metadata: Getting to Know Your Data

Carlos Rios, PhD, Senior Research Investigator, Computational Genomics – Translational Medicine, Bristol-Myers Squibb

Next-generation sequencing (NGS) is routinely used in cancer research. It produces large amounts of data, both during collection and as the data is processed through the pipeline. Tracking these large amounts of data and applying context to them becomes a challenge. Using AWS as a means of data delivery and processing has the advantage of automating both steps. In addition, applying context to the data using iRODS can be automated as data is delivered and processed. This provides a primary source of metadata that can be used by other applications downstream.

12:00 pm Getting to the Next Question – Why Organizations Struggle with R&D Compute and Network Demands

Dan Taylor, Director, Internet2

In R&D and health, data sharing and high-speed access to the right compute platform, internal or cloud, remain a challenge. Science networks, explained here, improve productivity by simplifying the sharing of big datasets (regardless of size or sensitivity) while optimizing enterprise compute/storage resources, wherever they are.

12:15 Presentation to be Announced

12:30 Session Break

12:40 Luncheon Presentation I: Talk Title to be Announced

Shimon Ben-David, Field CTO, WekaIO

1:10 Luncheon Presentation II to be Announced

1:40 Session Break

SERVERLESS METHODS TO FACILITATE STORAGE AND SUPPORT RESEARCH

1:50 Chairperson’s Remarks

1:55 Pathways in the Cloud: Facilitating Storage for Analysis Pipelines

Brigitte Raumann, Product Manager, Globus, University of Chicago

Life sciences researchers must contend with data spread across a wide variety of storage system types, including on-premises and cloud storage. In order to efficiently and reproducibly execute data analysis pipelines, researchers need secure and sophisticated data management capabilities that provide unified data access irrespective of storage location and type. In this talk, we will discuss how Globus provides a solution to this challenge, including a success story of a popular data analysis system built on the Globus framework.

2:25 Boosting Research with Serverless Cloud Computing

Daniel Butnaru, PhD, Research Architect, Roche

The ideal playing field for data experimentation is the cloud. While fast and elastic access to compute resources is the best-known advantage of the cloud, the actual differentiator for research is the set of fully managed (serverless) capabilities it offers. Robust, on-demand, easily understood data processing pipelines can be created for a number of research questions, without the overhead of infrastructure management. Learn how serverless computing can be used to support research questions in mass spectrometry and cell line development use cases.

2:55 Presentation to be Announced

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

SECURITY AND COMPLIANCE OF HIGH-PERFORMANCE FILE SYSTEMS

4:00 Chairperson’s Remarks

4:05 Can High-Performance File Systems Be Secure and Inexpensive?

Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center

POSIX file systems continue to be the workhorses of research computing. As the role of security and compliance has significantly increased, data storage has been augmented with security features that have performance and cost implications. We would like to demonstrate a secure multi-petabyte production deployment at Fred Hutch using open source components such as BeeGFS.

4:35 Aspects of Performance in Data Movement and Management

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

When it comes to moving and managing life science and bioinformatics data, performance is influenced by a multitude of factors. Researchers want their data management solutions to “just work,” without concern for what tend to be esoteric technical details. But delivering tools that simplify a researcher’s life requires a tradeoff between performance, reliability, usability, security, compliance, and technical complexity. In this talk we will explore how service providers should evaluate and prioritize these and other factors, and illustrate how the Globus service strikes an optimal balance, making it the de facto data management solution for over 100,000 researchers around the world. We will present case study examples, and attendees will be encouraged to share their own experiences.

5:05 Presentation to be Announced

5:35 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:45 End of Day

Thursday, April 23

7:30 am Registration Open and Morning Coffee

PLENARY KEYNOTE SESSION & AWARDS PROGRAM

8:00 Organizer’s Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

8:05 Awards Program Introduction

8:10 Benjamin Franklin Award and Laureate Presentation

J.W. Bizzaro, Managing Director, Bioinformatics.org

8:35 Bio-IT World Innovative Practices Awards

Allison Proffitt, Editorial Director, Bio-IT World

9:00 AI in Pharma: Where We Are Today and How We Will Succeed in the Future

Natalija Jovanovic, PhD, Chief Digital Officer, Sanofi Pasteur

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced at 10:00

TECHNOLOGY FOR ENABLING ADVANCED DATA ANALYTICS

10:30 Organizer’s Remarks

Cambridge Healthtech Institute

10:35 Chairperson’s Remarks

10:40 A Deeper Technical Dive to Enabling Advanced Analytics and Data Science across an Organization

Jason Tetrault, Global Head Data Engineering and Emerging Technologies, Takeda

Karl Gutwin, PhD, Senior Scientific Consultant, BioTeam

Phil Eschallier, CTO, RCH Solutions

Takeda has an internal product called Insight, a set of tools for our Advanced Analytics Community in R&D. This is where we use Spark, Hail, RStudio and Connect, TensorFlow, and others. It is also where our community shares its data. We will talk about the importance of building and partnering with a community, the iterative approach to adding new capabilities, and the services that make it work. We want you to take away lessons learned from both a technology and a community perspective.

11:40 Sponsored Presentation (Opportunity Available)

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Last Chance Poster Viewing

KEYNOTE PRESENTATION & PANEL DISCUSSION:
TRENDS FROM THE TRENCHES 2020

1:55 Chairperson’s Remarks

Kevin Davies, PhD, Executive Editor, The CRISPR Journal; Founding Editor, Bio-IT World

2:00 KEYNOTE PRESENTATION & PANEL DISCUSSION: Trends from the Trenches

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

Vivien Bonazzi, PhD, Chief Biomedical Data Scientist, Managing Director, Deloitte

Tim Cutts, PhD, Head, Scientific Computing, Wellcome Trust Sanger Institute

Kjiersten Fagnan, PhD, Chief Informatics Officer, Data Science and Informatics Leader, DOE Joint Genome Institute, Lawrence Berkeley National Laboratory

Matthew Trunnell, Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center

“Trends from the Trenches” will celebrate its 10th anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. An interactive, moderated Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.

4:00 Close of Conference


Platinum Sponsors

Accenture

Benchling

Elsevier

L7 Informatics

Linguamatics

Nutanix

PerkinElmer

Weka