Track 3: Data Science and Analytics Technologies

Transform Data into Fast Answers at Scale to Advance Biomedical Research

May 4-5, 2022 (All Times EDT)

The Data Science and Analytics Technologies track will explore data science and analytics tools, technologies, and languages that data scientists are using to gain extra insights and value from data. Presentations will explore becoming a data-driven organization, innovative approaches to data management and analytics, making real impact with data science, and applying data science and tools.

Tuesday, May 3

7:00 am Registration Open (Plaza Level Lobby)
8:00 am Recommended Pre-Conference Workshops and Symposium*

On Tuesday, May 3, 2022, Cambridge Healthtech Institute is pleased to offer nine pre-conference workshops scheduled across three time slots (8:00-10:00 am, 10:30 am-12:30 pm, and 1:45-3:45 pm) and a Symposium from 8:25 am-3:45 pm. All are designed to be instructional and interactive and to provide in-depth information on a specific topic. They allow for one-on-one interaction and are a great way to explore more technical aspects that would not otherwise be covered in the main conference tracks, which take place Wednesday-Thursday.

*Separate registration required. See Workshop page and Symposium page for details.

3:45 pm Session Break and Transition to Plenary Keynote



4:00 pm

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
4:05 pm Innovative Practices Award
Mike Tarselli, PhD, Chief Scientific Officer, TetraScience
4:30 pm

Ask What IT Can Do for Bio...and What Bio Can Do for IT

George M. Church, PhD, Robert Winthrop Professor, Genetics, Harvard Medical School

IT for Bio: In May 2021, one haploid human genome (3.055 billion bp) was sequenced completely, but zero diploid genomes. We have 7.7 billion diploid humans yet to be sequenced and correlated with their environments and traits in the Personal Genome Project. Plus, at least one genome from each of over 8.7 million eukaryotic species in the Earth BioGenome Project. Plus, monitoring pathogenic and commensal bacteria, allergens, and viruses in the BioWeatherMap. Plus, ancient DNA. We are counting RNA molecules per cell in most (or all) cell types in humans, mice, and many other species throughout development and connectome (with imaging resolution up to 20 nm).

Bio for IT: Reading and writing DNA has improved exponentially in cost (at least 60-million-fold) and is increasingly used for storing non-biological data. The record for editing DNA in vivo is now 24,000 edits per cell, and the record for storing data in vivo is about 1 terabyte per mouse. Enormous chemical and biological 'libraries' can perform 'Natural Computing' for tasks far beyond current von Neumann silicon and quantum computers. The combination of machine learning and megalibraries (ML-ML) is already having commercial impact (e.g., Nabla, Manifold, Dyno, Patch).

5:45 pm Welcome Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
7:00 pm Close of Day

Wednesday, May 4

7:00 am Registration Open and Morning Coffee (Plaza Level Lobby)



8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Zachary Powers, Chief Information Security Officer, Benchling
8:15 am

Accessing and Securing the Data that Drives Breakthroughs

Allison Proffitt, Editorial Director, Bio-IT World
Rachana Ananthakrishnan, Executive Director, Globus, University of Chicago
Ari E. Berman, PhD, CEO, BioTeam, Inc.
Jonathan C. Silverstein, Chief Research Informatics Officer & Professor, Biomedical Informatics, University of Pittsburgh
Rebecca F. Rosen, PhD, Director, Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

Life sciences research is generating massive amounts of data that should be accessible to collaborators and colleagues to enable breakthrough discoveries. However, ensuring sensitive data are shared securely in a manner that protects patient privacy and complies with myriad regulations is a daunting task, which often slows the pace of research. Our panel of leading practitioners will share insights on the challenges and best practices of managing protected research data.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



10:15 am Organizer's Remarks
10:20 am

Chairperson's Remarks

Christopher Kenneally, Senior Director, Content Marketing, Copyright Clearance Center
10:25 am

Embedding AI Automation in Pharmaceutical R&D Processes

Srikanth Ramakrishnan, Director, Intelligent Automation & Analytics, Janssen Pharmaceuticals R&D IT

The use of AI in pharmaceuticals is pushing the boundaries of scientific research and development. While data science models provide previously unavailable analytical insights, AI-enabled automation provides new capabilities for rapid, iterative experimentation. Pharmaceutical companies have always risked immense resources for uncertain results; AI automation offers new ways to reduce that uncertainty. This talk will focus on the foundational technologies that could enable such strategies.

10:55 am

Building a Data Science Program at an Independent Non-Profit Biomedical Research Institution

Eduardo Zaborowski, PhD, MBA, Senior Director, Data Science Program Development, The Jackson Laboratory

The Jackson Laboratory is known for its mice and genetic data resources, which are used worldwide in biomedical research. Its unique internal organization creates a collaborative environment that integrates large-scale mouse genetics and human genomics data to understand the underlying causes of human health and disease. We will present our progress so far in building a data science program that aims to accelerate the conversion of these data into useful information and knowledge.

11:25 am

Data Science in Digital Health

Meghan Raman, Senior Director, IT – Global Biometrics & Data Sciences, Bristol Myers Squibb Co.

Data and analytics have become key building blocks in accelerating clinical trials and enhancing patient care and the physician experience in clinical practice. Data science is being applied to improve the patient care experience, outcomes, the physician experience, clinical workflows, data interoperability, and health equity. Data, technology, and data science are helping reshape the global health ecosystem, making health smarter through connected, unified, intelligent, and optimized systems of care.

Stephen Howe, Sr. Product Manager, Corporate Solutions, Copyright Clearance Center (CCC)

Finding the right people to collaborate with on research is essential to staying ahead of your competition, but identifying key talent can be challenging and time-consuming. Join us to learn how a broad range of users across an organization can use knowledge graphs, through an easy-to-use interface, to sift through enormous volumes of data and quickly identify key opinion leaders (KOLs) and rising stars, giving their organizations a competitive edge.

Kevin Cronin, PhD, Vice President Corporate Development, Protein Metrics Inc

Using the Byosphere software platform from Protein Metrics, we show how data from LC-UV-MS experiments can be represented in dashboards. Some of the most complex mass spectrometry analyses are now more easily shareable, digestible, and understandable across an entire organization. The information is no longer siloed; it is freed up for mining in a data lake. These dashboards can also be opened in a web browser, giving instant, worldwide access.


12:25 pm

Transforming Drug Discovery Data Preparation Into a Single Data Ecosystem: Theory and Applications

Jeremy Desaphy, PhD, Director Scientific Data & Informatics, Genetic Medicines, Eli Lilly & Company

More than 1,600 biology-related resources are currently maintained by laboratories around the world, creating an entangled web of complex relationships and identifiers. As a result, data preparation pipelines are increasingly difficult both to build and to reproduce. We present BioRels (Biological Relationships), an open-source, automated, and standardized data preparation workstream covering the main resources necessary for drug discovery. By representing the natural relationships found in biology, BioRels enables complex queries that span several data sources seamlessly.

12:55 pm Session Break and Transition to Luncheon Presentation
1:05 pm Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own
1:50 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)


Steven Labkoff, Quantori
2:40 pm

Delivering Rapid Data Access and Insights to Accelerate Precision Medicine

Alex Li, Director, Data Science Platform, Janssen R&D, LLC

For an R&D organization, data is one of the most critical assets, driving development and enabling innovation in future precision medicine therapies for patients in need. The oncology drug development landscape continues to evolve and is becoming more complex as we develop more targeted and personalized treatments. We have an opportunity to change our digital infrastructure and drug development process to advance clinical development and improve patient care through data. Janssen Data Science and the Oncology Biomarker and Diagnostic teams are partnering to develop new data platforms, processes, and advanced analysis pipelines that generate near-real-time insights. Access to clinical biomarker data for precision oncology studies will enable rapid decision-making to support trial execution and the reverse translation of clinical findings back into the drug discovery pipeline.

3:10 pm

Enhancing Data Discoverability with Natural Language Processing at the DOE Joint Genome Institute

Kjiersten Fagnan, PhD, CIO, Data Science & Informatics, Lawrence Berkeley National Laboratory

The Department of Energy's Joint Genome Institute (JGI) produces thousands of unique, public, multi-omic data sets. We have been working to make JGI data more findable by adding search terms derived from applying natural language processing (NLP) to the proposals and publications associated with the data. The search capability will be available through JGI's Data Portal and API. In this talk, I will describe the process we used to identify publications that leveraged specific JGI datasets and how those publications were used to refine a method for identifying terms that may provide a more intuitive search.

3:40 pm

R2O: What Healthcare Must Learn from Meteorology

Vivian Neilley, Lead Interoperability Solution Engineer, Google Cloud Healthcare

Research to Operations (R2O) is a term meteorologists use to describe the difficulty of transitioning research into applied work and operational workflows. The meteorological community has developed best practices for reducing the R2O burden, allocating resources to bridge the gap between research and practice. Currently, much healthcare research and analytics work, including AI models, is not graduating into production or operations. This session will draw parallels between weather R2O and the challenges facing the healthcare industry. It will also outline healthcare R2O best practices based on the meteorology guidelines and share how stakeholders can use them to bridge the research-to-operations chasm in their organizations.

Steven Labkoff, MD, FACP, FACMI, FAMIA, Global Head, Clinical and Healthcare Informatics, Quantori

Registry science is the area of medical informatics and data science that intersects with medicine and epidemiology. In this presentation, Steven will talk about his work building the world’s largest multi-data registry in oncology, for multiple myeloma, which enables patients, clinicians, and researchers to combine multiple types of data (real-world data, genomics, and immunologics) to better understand the disease and, in turn, find more personalized treatments.

Sreenivas Reddy, Associate Vice President, Life Sciences and Services, Birlasoft

Drug safety and pharmacovigilance (PV) teams are constantly challenged by new safety norms, new regulations, and the emergence of new diseases. With COVID-19 and its variants, for example, the number of cases that must be processed is growing globally. Pharma companies are being forced to reimagine the complete PV life cycle. Process automation, AI and data science, and other digital technologies are providing innovative and compelling solutions in PV and signal detection to address these problems.

4:40 pm Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
6:00 pm Close of Day

Thursday, May 5

7:30 am Registration Open and Morning Coffee (Plaza Level Lobby)



8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Nate Raine, Director Data Custodians, Lifebit
8:15 am

Leveraging Large-Scale Human Data to Advance and Accelerate Drug Discovery

Shankar Subramaniam, PhD, Distinguished Professor of Bioengineering; Professor of Chemistry, Biochemistry and Nanotechnology; Adjunct Professor of Cellular & Molecular Medicine, University of California at San Diego

Advances in genomics technologies have led to the generation of massive amounts of human data. This has catalyzed new insights into cellular processes in the normal and disease state and facilitated the search for safe and effective medicines. The UK Biobank, All of Us, and TOPMed initiatives are exemplars of this approach. We highlight examples from our lab in which meaningful insights have advanced our understanding of disease biology and its pharmacological applications.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



10:15 am Organizer's Remarks
10:20 am

Chairperson's Remarks

Christopher Southan, PhD, Competitive Intelligence Analyst, Data Sciences, Medicines Discovery Catapult
10:25 am

A Platform Democratizing Data, Analytics, and AI to Enable Development of Precision Therapies (Innovative Practices Award Winner)

Marc Flesch, PhD, Head, Dev, Genedata GmbH
Eike Staub, PhD, Senior Director/Head of Oncology Bioinformatics, Merck KGaA, Darmstadt, Germany

Precision medicine requires large interoperable datasets, high-performance analytics, and intense cross-functional collaboration for which digital technology is essential. This presentation highlights the value of an end-to-end big data platform for translational research, developed by Genedata AG in collaboration with Merck KGaA, Darmstadt. The platform supports all stages of drug R&D, from multi-omics NGS studies to digital pathology, from exploratory analyses for early drug research to statistics for late-stage clinical studies. We show how a data-driven culture can be supported by such a technical setup, through better data discoverability and sharing, thereby increasing research efficiency and productivity. We also demonstrate how secure cooperative work between internal and external expert analysts can be achieved: a key factor for leveraging the full potential of data. The presented solution, the outcome of a collaborative project, is today available as an off-the-shelf product, ready for other parties to join the community. By enabling end-to-end automation of complex R&D workflows, high-performance analytics, and full data governance, the software allows companies to maximize the ROI of their R&D data to facilitate the development of next-generation precision therapies.

10:55 am

The Essential Data Science Tools for Drug Discovery

Parthiban Srinivasan, PhD, Professor, Data Science and Engineering, Indian Institute of Science Education and Research

Data curation, data management, data analytics, and machine learning are essential methodologies for working in computer-aided drug design. Old-fashioned ready-made software tools and familiarity with pull-down menus are no longer sufficient in the new era of pharma research. This talk will present the Python programming ecosystem and the machine learning platforms essential for drug discovery research, with case studies using ChEMBL data.

11:25 am

When Big Data Gets Messy: 40 Million Patent Compounds in PubChem

Christopher Southan, PhD, Competitive Intelligence Analyst, Data Sciences, Medicines Discovery Catapult

When IBM deposited its first 2.5 million open compounds, automatically extracted from patent documents, into PubChem in 2012, few would have predicted that this collection would grow to over 40 million, including contributions from SureChEMBL, WIPO, and Google Patents. Assessments, however, indicate that it is part novel SAR treasure trove and part chemical junkyard, presenting users with the challenge of discriminating between the two. This talk will present data analysis results from inside PubChem that will assist in this task.



Inclusion and Diversity in Life Sciences: Producing Stronger Research and Sparking Innovation for Improving Health and Advancing Precision Medicine

Panel Moderator:
Kevin M. Ileka, PhD, Senior Manager, Business Development Competitive Intelligence, Bristol Myers Squibb Co.
Ari E. Berman, PhD, CEO, BioTeam, Inc.
Adrian Coles, PhD, Associate Director, Biostatistics, Bristol Myers Squibb Co.
Lori Lennon, Founder and CEO, Thinkubator Media
Victoria S. Parker, PhD, Associate Manager, Research Program Management, Regeneron Pharmaceuticals
12:55 pm Session Break and Transition to Luncheon Presentation
1:05 pm Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own
1:50 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



2:35 pm

Trends from the Trenches

Chris Dagdigian, Senior Director, BioTeam, Inc.
Matthew Trunnell, Data Commoner
Adam Kraut, Director Infrastructure & Cloud Architecture, BioTeam, Inc.
Anna Sowa, PhD, Senior Scientific Consultant, BioTeam, Inc.
Michelle Bayly, PhD, Senior Scientific Consultant, BioTeam, Inc.

Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions of the Bio-IT World program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, cloud, data science, and machine learning that are involved in supporting data-intensive science. In 2022, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. A moderated, interactive Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.

4:10 pm Close of Conference

