Data and Metadata Management

With the increased demand for computing power from life science researchers and scientists tackling big data problems, storage and infrastructure must scale to handle billions of data points and files efficiently. The challenge lies in administering data so that information can be integrated, accessed, shared, linked, analyzed, and maintained to best effect across the organization. The Data and Metadata Management track will explore how to manage data and metadata workflows without rerunning everything, while still handling data updates and new software versions. We will also explore how to associate processed data and features with the raw data for analysis.

Final Agenda

Monday, April 20

9:00 am - 5:00 pm Hackathon*

*Pre-registration required.

Tuesday, April 21

7:30 am Workshop Registration Open and Morning Coffee

8:30 am - 3:30 pm Hackathon*

*Pre-registration required.

8:30 - 11:30 am Recommended Morning Pre-Conference Workshops*

W1. Data Management for Biologics: Registration and Beyond

Diana Bowley, Business Relationship Manager, Biologics, AbbVie

Benjamin Li, Head of IT RDM Biological Sample Production & Management, Boehringer Ingelheim

Yuan Lin, Senior Manager, Pfizer Digital, Pfizer

Sebastian Schlicker, Director, Biologics Business, Genedata, Basel, Switzerland

Monica Wang, PhD, Principal Bioinformatics Architect, Project and Program Manager, Global Research IT, Takeda

12:30 - 3:30 pm Recommended Afternoon Pre-Conference Workshops*

W10. Data Science Driving Better Informed Decisions

Meghan Raman, Head, R&D Data Lake and Analytics, Bristol-Myers Squibb

*Separate registration required.

2:00 - 6:30 Main Conference Registration Open

4:00 Welcome Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

4:05 Keynote Introduction

4:15 PLENARY KEYNOTE PRESENTATION: NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health

Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health

5:00 - 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Sponsorship Opportunity Available)

Wednesday, April 22

7:30 am Registration Open and Morning Coffee

8:00 Welcome Remarks

Allison Proffitt, Editorial Director, Bio-IT World

8:05 Keynote Introduction

8:15 Toward Preventive Genomics: Lessons from MedSeq and BabySeq

Robert Green, MD, MPH, Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School

8:45 PANEL DISCUSSION: Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

Pietro Michelucci, PhD, Director, Human Computation Institute

Additional Panelists to be Announced

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

10:50 Organizer’s Welcome Remarks

Cambridge Healthtech Institute

10:55 Chairperson’s Remarks

11:00 Building a Toolkit for FAIR Implementation by Life Science Industry

Ian Harrow, PhD, Consultant Project Manager and Manager, FAIR Implementation and Ontologies Mapping Project, Pistoia Alliance

We report on building a new toolkit to help the life science industry implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles for data management and stewardship. It provides practical support by bringing together relevant methods for tools, training, and change management, illustrated by use cases drawn mostly from the life science industry. These elements are assembled into a single user-friendly, freely accessible website.

11:20 Normalizing Adverse Events Terminologies for Text Processing

Qais Hatim, PhD, Computer Scientist, U.S. Food and Drug Administration

The FDA receives a high proportion of its data as unstructured text. Natural Language Processing (NLP) is used to normalize this information for further analysis or machine learning. One challenge is the variety of ways in which the same concept is referred to. Although a number of open source terminologies exist, not all are designed for text processing. We will describe how to adapt terminologies and extend matching, e.g., to tolerate spelling or OCR errors. This work will also highlight how to import new ontologies designed to conform to FDA standards in fields such as drug labeling, New Drug Applications (NDAs), etc.

11:40 A New Compound Platform for Enhanced Access to Chemical Space for Screening

Michael Lange, ML/AI Lead, R&D Informatics, Small Molecule Discovery Informatics, Roche

In recent years, the commercially available chemical space (with pharmaceutical relevance) has grown rapidly. Several providers now offer catalogs of several hundred million screening compounds. We built a new compound platform to enable browsing, searching, selecting, and ordering compound sets from these libraries. The platform provides these capabilities by standardizing and preprocessing all molecules, calculating relevant properties, and combining fast structure-based search with property and metadata filters. This presentation will describe the overall architecture and highlight some of the challenges encountered during implementation.

12:00 pm Sponsored Presentation (Opportunity Available)

12:30 Session Break

12:40 Presentation to be Announced

1:10 Luncheon Presentation (Opportunity Available) or Enjoy Lunch on Your Own

1:40 Session Break

1:50 Chairperson’s Remarks

Brian Bissett, MBA, MSEE, FAC P/PM, IT Specialist, Hardware Engineering, US Government

1:55 Defending against the Persistence of Inevitability

Brian Bissett, MBA, MSEE, FAC P/PM, IT Specialist, Hardware Engineering, US Government

Most data breaches represent a systemic breakdown along multiple lines of both technical and human factors. While many factors can contribute to an unauthorized release, the effort needed to protect against each of them is not equal. This discussion will take a holistic view of many security breaches, the breakdowns in fundamental security concepts that led to them, and the factors of paramount importance in protecting an enterprise.

2:15 Data Security and Governance for Biopharma

Jyotin Gambhir, MBA, CISM, Founder, SecureFLO

Governance provides a playbook for a biopharma company to manage security and privacy compliance. Good governance leads to better-managed goals and a more focused IT environment. Cyber hygiene is critical today for any company developing a drug or researching cures, which must protect both its intellectual property and its subjects’ personal information. Regulations from the FDA and FTC, as well as the EU GDPR, can be complicated.

2:35 Dynamic Encryption and Watermarking of Genomic Sequencing Data to Facilitate Privacy-Preserving Ownership-Based Data Governance

Xiaowu Gai, PhD, Director, Bioinformatics; Associate Professor, Clinical Pathology, Pathology & Laboratory Medicine, Children’s Hospital of Los Angeles

To facilitate privacy-preserving ownership-based data governance, we developed two novel algorithms that can be used to implement flexible, fine-grained protection of genomic data: a) dynamic privacy-preserving encryption of user-specified genomic regions; and b) ownership- and utility-preserving watermarking of the sequencing data. Together, these let individuals control when, for how long, and for what purpose any portion of their genomic data is shared, all in an auditable manner.

2:55 Presentation to be Announced

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 Chairperson’s Remarks

Sanjay Joshi, Industry CTO, Healthcare, Dell EMC

4:05 PANEL DISCUSSION: Real-World Evidence (RWE): Data Provenance, Format, Ingest, Quality (Bias), Integration, Visualization, Transformation, Verification & Validation, and Implementation

Sanjay Joshi, Industry CTO, Healthcare, Dell EMC

Additional Panelists to be Announced

The future intersection of healthcare and the life sciences will be data- and process-focused, not application- or software-focused. “Bringing the analytics to Data” is the challenge from an infrastructure and methods perspective. The FDA defines Real-World Evidence (RWE) as “the clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of Real-World Data (RWD): e.g., effectiveness or safety outcomes from an RWD source in randomized clinical trials or in observational studies.” Our topical, honest, and “real-world” panel will discuss the sources of RWD (EHR, claims and billing, registries, patient-reported data, etc.) and their process implications for RWE and for the future of clinical trials themselves.

5:05 Sponsored Presentation (Opportunity Available)

5:35 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:45 End of Day

Thursday, April 23

7:30 am Registration Open and Morning Coffee

8:00 Organizer’s Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute

8:05 Awards Program Introduction

8:10 Benjamin Franklin Award and Laureate Presentation

J.W. Bizzaro, Managing Director,

8:35 Bio-IT World Innovative Practices Awards

Allison Proffitt, Editorial Director, Bio-IT World

9:00 AI in Pharma: Where We Are Today and How We Will Succeed in the Future

Natalija Jovanovic, PhD, Chief Digital Officer, Sanofi Pasteur

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced at 10:00

10:30 Organizer’s Remarks

Cambridge Healthtech Institute

10:35 Chairperson’s Remarks

10:40 Cascadia Data Discovery Initiative: Accelerating Health Innovation and Cancer Research through Collaboration, Data Sharing, and Data-Driven Research

Matthew Trunnell, Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center

11:10 The National Microbiome Data Collaborative: A FAIR Data Resource for Microbiome Research

Kjiersten Fagnan, PhD, Chief Informatics Officer, Data Science and Informatics Leader, DOE Joint Genome Institute, Lawrence Berkeley National Laboratory

Our multi-lab collaborative partnership will pilot an integrated, community-centric framework within 27 months to fully leverage existing microbiome data science resources and high-performance computing systems available within the DOE complex for data access, integration, and advanced analyses. In this talk I will cover some of the challenges in microbiome data sciences and how we aim to overcome these by creating a large, open-access repository of FAIR data.

11:40 Sponsored Presentation (Opportunity Available)

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Last Chance Poster Viewing

1:55 Chairperson’s Remarks

Kevin Davies, PhD, Executive Editor, The CRISPR Journal, Mary Ann Liebert, Inc.

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

Vivien Bonazzi, PhD, Chief Biomedical Data Scientist, Managing Director, Deloitte

Tim Cutts, PhD, Head, Scientific Computing, Wellcome Trust Sanger Institute

Kjiersten Fagnan, PhD, Chief Informatics Officer, Data Science and Informatics Leader, DOE Joint Genome Institute, Lawrence Berkeley National Laboratory

Matthew Trunnell, Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center

“Trends from the Trenches” will celebrate its 10th anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT program. The talk is intended to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. It has helped scientists, leadership, and IT professionals understand the computing, storage, data transfer, networking, and cloud topics involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. A moderated, interactive Q&A with the audience follows. Come prepared with your questions and commentary for this informative and lively session.

4:00 Close of Conference
