Track 2: Data Management

Manage Workflows and Administer Effective Data Processes

May 4 - 5, 2022 ALL TIMES EDT

With the increased demand for computing power from life science researchers and scientists tackling big data issues, data storage infrastructure must be able to scale to handle billions of data points and files efficiently. The challenge is administering data so that information can be integrated, accessed, shared, linked, analyzed, and maintained to best effect across the organization. The Data Management track will explore important themes related to FAIR data, data risk models, and data sharing and reuse.

Tuesday, May 3

7:00 am Registration Open (Plaza Level Lobby)
8:00 am Recommended Pre-Conference Workshops and Symposium*

On Tuesday, May 3, 2022, Cambridge Healthtech Institute is pleased to offer nine pre-conference workshops scheduled across three time slots (8:00-10:00 am, 10:30 am-12:30 pm, and 1:45-3:45 pm) and a Symposium from 8:25 am-3:45 pm. All are designed to be instructional and interactive and to provide in-depth information on a specific topic. They allow for one-on-one interaction and provide a great way to explain more technical aspects that would otherwise not be covered during the main conference tracks that take place Wednesday-Thursday.

*Separate registration required. See Workshop page and Symposium page for details.

3:45 pm Session Break and Transition to Plenary Keynote

PLENARY KEYNOTE LOCATION: 210 (Overflow 208)

PLENARY KEYNOTE PROGRAM

4:00 pm

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
4:05 pm Innovative Practices Award
Mike Tarselli, PhD, Chief Scientific Officer, TetraScience
4:30 pm

Ask What IT Can Do for Bio...and What Bio Can Do for IT

George M. Church, PhD, Robert Winthrop Professor, Genetics, Harvard Medical School

IT for Bio: In May 2021, one haploid human genome (3.055 billion bp) was sequenced completely, but zero diploid. We have 7.7 billion diploid humans yet to be sequenced and correlated with their environments and traits in the Personal Genome Project. Plus, at least one genome from each of over 8.7 million eukaryotic species in the Earth BioGenome Project. Plus, monitoring pathogenic and commensal bacteria, allergens, and viruses in the BioWeatherMap. Plus, ancient DNA. We are counting RNA molecules per cell in most (or all) cell types in humans, mice, and many other species throughout development and connectome (with imaging resolution up to 20 nm).

Bio for IT: Reading and writing DNA has improved exponentially in cost (at least 60 million fold) and is increasingly used for storing non-biological data. The record for editing DNA in vivo is now 24,000 edits per cell and for storing data in vivo is about 1 terabyte per mouse. Enormous chemical and biological 'libraries' can perform 'Natural Computing' for tasks far beyond current von Neumann silicon and quantum computers. The combination of these, machine learning + megalibraries (ML-ML), is already having commercial impact (e.g., Nabla, Manifold, Dyno, Patch).

5:45 pm Welcome Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
7:00 pm Close of Day

Wednesday, May 4

7:00 am Registration Open and Morning Coffee (Plaza Level Lobby)

PLENARY KEYNOTE ROOM LOCATION: 210

PLENARY KEYNOTE PROGRAM

8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Zachary Powers, Chief Information Security Officer, Benchling
8:15 am

Accessing and Securing the Data that Drives Breakthroughs

Allison Proffitt, Editorial Director, Bio-IT World
Rachana Ananthakrishnan, Executive Director, Globus, University of Chicago
Ari E. Berman, PhD, CEO, BioTeam, Inc.
Jonathan C. Silverstein, Chief Research Informatics Officer & Professor, Biomedical Informatics, University of Pittsburgh
Rebecca F. Rosen, PhD, Director, Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

Life sciences research is generating massive amounts of data that should be accessible to collaborators and colleagues to enable breakthrough discoveries. However, ensuring sensitive data are shared securely in a manner that protects patient privacy and complies with myriad regulations is a daunting task, which often slows the pace of research. Our panel of leading practitioners will share insights on the challenges and best practices of managing protected research data.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)

ROOM LOCATION: 202

FAIR JOURNEYS - LEARNINGS FROM THE FRONT LINES

10:15 am Organizer's Remarks
10:20 am

Chairperson's Remarks

Carmen I. Nitsche, General Manager, Cambridge Crystallographic Data Centre
10:25 am

Our Continuing FAIR Journey – CCDC Learnings and Insights

Carmen I. Nitsche, General Manager, Cambridge Crystallographic Data Centre

Advancing FAIR Data Principles is not a one-time act, but rather a continuing and iterative process that addresses a wide range of data management challenges. At The Cambridge Crystallographic Data Centre, a U.K. registered charity dedicated to the advancement of structural science, we bear the great responsibility of curating and maintaining a global, comprehensive collection of structural information relating to small molecules. So, FAIR can be found at the heart of what we do. Today we will share insights and learnings from our continuing FAIR journey.

10:45 am

FAIR Implementation Challenges & Learnings

Nick Lynch, PhD, Founder & CTO, Curlew Research; Member, FAIRplus Consortium

The FAIR data principles (https://www.nature.com/articles/sdata201618) have become critical in supporting data management and governance strategy for life science organizations over the last few years as the value of data as an organizational asset has increased. Although many groups support the overall FAIR approach, relatively little attention has been given to the actual technical, scientific, and cultural implications of making biological data FAIR. The FAIRplus project (https://fairplus-project.eu/) aims to develop practical resources to enable FAIRification of data with a range of activities including a capability maturity model for FAIR adoption, a fellowship programme, and data privacy activities. One of these key deliverables is the FAIR Cookbook (https://fairplus.github.io/the-fair-cookbook/content/home.html), an open, comprehensive resource with ‘recipes’ for making different types of life science data FAIR. In this talk, we will show some of the approaches and deliverables the FAIRplus project is building to support the implementation of FAIR and the learnings from the project on FAIR sustainability.

11:05 am

FAIR-ifying Data

Sabine Schefzick Jalaie, PhD, Director Advanced Analytics Platform, Science & Clinical Analytics & Analytic Innovation, Pfizer Inc.
11:25 am PANEL DISCUSSION:

Moderated Q&A with Session Speakers

Panel Moderator:
Carmen I. Nitsche, General Manager, Cambridge Crystallographic Data Centre
Panelists:
Nick Lynch, PhD, Founder & CTO, Curlew Research; Member, FAIRplus Consortium
Sabine Schefzick Jalaie, PhD, Director Advanced Analytics Platform, Science & Clinical Analytics & Analytic Innovation, Pfizer Inc.
Subadhra Parthasarathy, Specialist Leader, Deloitte
Suman Kumar, Senior Manager, Deloitte

In pharmaceutical R&D, Deloitte has been investing in solutions across every part of the value chain with AI/ML models embedded throughout. We have created a set of "Model as Service" and data pipeline systems that accelerate and streamline the model-embedded intelligence required for the R&D value chain. We will discuss our scalable approach, which has successfully delivered multiple use cases that have achieved business value and critical advanced AI capabilities.

Kelsey Luu, M.S. Bioinformatics Candidate, Genestack/Harvard University
AI approaches demonstrate promise as a means for modeling high-complexity systems like biological networks. As such, AI can be leveraged to perform biological pathway analysis, identify key genes that regulate disease, and propose viable candidate drug targets. This talk will discuss a deep learning framework for identifying robust and novel disease-associated pathways from the growing availability of multi-omics datasets.
Can (John) Akgun, Ph.D., Senior Vice President of Business Development, Flywheel

R&D departments across the Life Sciences are gravitating toward cloud-scalable, data-driven strategies to take advantage of the AI boom. However, the first step in any AI initiative is to establish basic data management practices for data centralization, curation, and computing. In this presentation, a modern data management and collaboration platform will be presented that highlights enterprise-scale AI enablement.

12:55 pm Session Break and Transition to Luncheon Presentation
Christof Gänzler, PhD, Product Marketing Manager Biology, PerkinElmer Informatics

The industry continues to use generic software tools like Excel and GraphPad Prism to process and track assay results. This decentralized approach, while favored by bench scientists because of its flexibility, can result in inconsistent, irreproducible, and poor-quality results. A more centralized approach often delivers only point solutions. Learn how PerkinElmer Informatics assay data solutions combine flexibility, ease of use, and centralized data and calculation management across diverse assay techniques.

1:50 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)

DATA RISK MODELS

2:35 pm

Chairperson's Remarks

Sanjay Joshi, Global Lead for Healthcare and Life Sciences, Tanium
2:40 pm PANEL DISCUSSION:

Data Risk Models – A Workflow Approach

Panel Moderator:
Sanjay Joshi, Global Lead for Healthcare and Life Sciences, Tanium

The naming, finding, observing, measuring, understanding, and reporting of data (in that order) forms the basis of most cybersecurity and risk frameworks and regulations. The panel will explore these concepts in the realm of FAIR data and the various Risk Models to provide an understanding of the processes that drive biomedical research and production workflows.

Panelists:
Khaled El Emam, Co-Founder and CEO, Replica Analytics
Reva Schwartz, Principal Investigator, National Institute of Standards and Technology (NIST)
Peter Mesenbrink, PhD, Exec Dir Biostatistics, Novartis Pharmaceuticals
Dominik Matousek, Data Fabric Practice Lead, Ataccama Americas, Ataccama Corporation

Data Quality has always been important for scientific workloads and machine learning. With the emergence of the Data Fabric concept, automated data provisioning, and dynamic data pipelines, traditional approaches to Data Quality no longer scale. Join the session to learn how to approach embedding DQ into a Data Fabric and how you can make your systems future-proof with Automated Data Quality.

Vinod Kasam, Principal, Cloud Computing, Zifo RnD Solutions

Scientific communities often rely on high performance computing (HPC) for accelerating and enhancing research in the pharmaceutical industry. Remarkable advancements in cloud-based cluster environments, specifically container scaling and batch processing along with serverless architectures, have enabled scientists to run their workflows entirely automatically with minimal IT intervention. In this presentation, we will discuss a few R&D workflows we developed in the AWS cloud to accelerate science in the pharma and biotech industries.

 
4:40 pm Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
6:00 pm Close of Day

Thursday, May 5

7:30 am Registration Open and Morning Coffee (Plaza Level Lobby)

PLENARY KEYNOTE ROOM LOCATION: 210

PLENARY KEYNOTE PROGRAM

8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Nate Raine, Director Data Custodians, Lifebit
8:15 am

Leveraging Large-Scale Human Data to Advance and Accelerate Drug Discovery

Shankar Subramaniam, PhD, Distinguished Professor of Bioengineering; Professor of Chemistry, Biochemistry and Nanotechnology; Adjunct Professor of Cellular & Molecular Medicine, University of California at San Diego

Advances in genomics technologies have led to the generation of massive amounts of human data. This has catalyzed new insights into cellular processes in normal and disease states and facilitated the search for safe and effective medicines. The UK Biobank, All of Us, and TOPMed initiatives are exemplars of this approach. We highlight examples from our lab where meaningful insights have been obtained, advancing our understanding of disease biology and its pharmacological application.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)

ROOM LOCATION: 202

DATA SHARING AND REUSE FOR RESEARCH

10:15 am Organizer's Remarks
10:20 am

Chairperson's Remarks

Santha Ramakrishnan, PhD, Global Data Governance Lead, Sanofi
10:25 am PANEL DISCUSSION:

Data Sharing and Reuse for Research

Panel Moderator:
Santha Ramakrishnan, PhD, Global Data Governance Lead, Sanofi

The reuse of data generated internally in pharma for the purposes of research is becoming a priority for several companies. Regulations and ethical and legal obligations are challenging the industry to come up with smart and creative ways of addressing the need. The panel will discuss all aspects of this multifaceted problem, the innovative approaches available to pharma, and where companies may collaborate in a precompetitive manner to build cutting-edge solutions.

Panelists:
Victoria A. Gamerman, PhD, Global Head of Data Governance, Boehringer Ingelheim Pharmaceuticals, Inc.
Aaron Mann, Senior Vice President, Data Science, Clinical Research Data Sharing Alliance
Peter Mesenbrink, PhD, Exec Dir Biostatistics, Novartis Pharmaceuticals
Mohamed-Ramzi Temanni, PhD, Scientific Director, Head of France AI/Genomics, Computational Sciences, Janssen R&D - Global Development
Ilian Uzunov, Sales Director, Life Sciences, Pharma, Healthcare, Ontotext
Martina Markova, Ontotext

Ontotext’s AI Powered Target Discovery helps biotech and pharma companies identify new drug targets and drug repurposing candidates quickly and reliably in a variety of therapeutic areas. Our technology unlocks relevant information in structured and unstructured reference data sources. All data sources are mapped and managed in a central knowledge graph. This allows expert users to extract information about any entity and context of interest, be it gene, disease or protein. 

Daniel Herzig-Sommer, COO, metaphacts
Maksim Kolchin, PhD, Knowledge Graph Platform Lead, Boehringer Ingelheim

Knowledge Graph-driven FAIR data platforms have proven to accelerate knowledge democratization & decision intelligence. They empower end users & machines to access and consume knowledge intuitively & in context. In this talk, we will discuss best practices for building a semantic layer on top of your data mesh to enable domain experts. We will also share insights from an enterprise implementation based on metaphactory at Boehringer Ingelheim.

12:55 pm Session Break and Transition to Luncheon Presentation
Stavros Papadopoulos, CEO and Founder, TileDB
Stephen Kingsmore, MD, President and CEO, Rady Children's Institute for Genomic Medicine

Stavros Papadopoulos introduces TileDB as a cloud-native database for storing variant-call data as nD arrays, enabling governance, analysis, and sharing at extreme scale and low cost. He’ll cover VCF data management challenges and the solution as applied in Project BabyBear, scaling rapid whole-genome sequencing for newborns. Dr. Stephen Kingsmore of Rady Children’s Hospital will cover his experiences & breakthrough performance benchmarks for the first 8,000 genomes analyzed.

 
1:50 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)

ROOM LOCATION CHANGE: 210

KEYNOTE PROGRAM: TRENDS FROM THE TRENCHES

2:35 pm

Trends from the Trenches

Chris Dagdigian, Senior Director, BioTeam, Inc.
Matthew Trunnell, Data Commoner
Adam Kraut, Director Infrastructure & Cloud Architecture, BioTeam, Inc.
Anna Sowa, PhD, Senior Scientific Consultant, BioTeam, Inc.
Michelle Bayly, PhD, Senior Scientific Consultant, BioTeam, Inc.

Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, cloud, data science, and machine learning that are involved in supporting data-intensive science. In 2022, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. An interactive, moderated Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.

4:10 pm Close of Conference




