Data Science and Analytics Technologies

The Data Science and Analytics Technologies track will explore popular data science and analytics tools, technologies, and languages that data scientists are using to gain extra insights and value from data. Presentations will explore becoming a data-driven organization, innovative approaches to data management and analytics, making real impact with data science, and applying data science and tools.

Monday, September 20

7:30 am Registration Open
8:00 am Recommended Pre-Conference Workshops*

Cambridge Healthtech Institute is pleased to offer morning and afternoon pre-conference workshops on Monday, September 20, 2021. The workshops are designed to be instructional and interactive and to provide in-depth information on a specific topic. They allow for one-on-one interaction and are a good way to cover more technical aspects that would not otherwise be addressed during the main conference tracks, which take place Tuesday-Wednesday.

*Separate registration required. See Workshop page for details.

9:30 am Break
9:45 am Recommended Pre-Conference Workshops*
11:15 am Enjoy Lunch on Your Own
12:45 pm Recommended Pre-Conference Workshops*
2:15 pm Break
2:30 pm Recommended Pre-Conference Workshops*
4:00 pm Session Break and Transition to Plenary Keynote


4:15 pm Innovative Practices Awards – Winners Spotlight

Pharma Executive Roundtable: Broadening the Data Ecosystem

Panel Moderator:
Lita Sands, Head, Life Sciences, Amazon Web Services

The Bio-IT World community employed creativity, problem solving, and technical ingenuity to weather 2020 and never was the work more important. Meanwhile, digitization has been broadening the horizons of new possibilities and initiatives that are driving innovation in the life sciences sector. While over the past year many pharmaceutical companies have seen an acceleration of digital transformation, there are still many that are unsure what to expect going forward. Digital transformation is now a strategic imperative, not a buzzword. Join our Pharma Executive Roundtable to discover how biopharma companies are broadening their digital strategies and capabilities to develop products and services to scale, streamline operations, and drive innovation in life sciences R&D. 

Ramesh V. Durvasula, PhD, Vice President & Information Officer, Research Labs, Eli Lilly & Co.
Michael Montello, Senior Vice President, R&D Tech, GlaxoSmithKline
Bryn Roberts, PhD, Senior Vice President & Global Head of Data Services, Roche
Holly Soares, PhD, Vice President & Head, Precision Medicine, Pfizer Inc.
Lihua Yu, Chief Data Officer, FogPharma
5:45 pm Welcome Reception in the Exhibit Hall with Poster Viewing
7:00 pm Close of Day

Tuesday, September 21

7:00 am Registration Open


8:00 am

Advancing Data and Technology with DATA Scholars: Combining Expertise to Answer Biomedical Data Questions

In 2019, the National Institutes of Health established the Data and Technology Advancement National Service Scholar Program. The program was designed to bring in experienced data and computational scientists and engineers to tackle challenging biomedical data problems with the potential for substantial public health impact. Today, we’ll hear from three of our DATA Scholars about the work they’ve been doing and discuss their experience at NIH. We’ll find out what’s been surprising, what their biggest success is so far, and where they see data science going at NIH.



Panel Moderator:
Allissa Dillman, PhD, Workforce Development and Community Engagement Director at the Office of Data Science Strategy, National Institutes of Health
Mohammad Ghassemi, PhD, Assistant Professor of Computer Science, Michigan State University
Rui Carlos Sa, PhD, Assistant Professor, UC San Diego
Judy Gichoya, PhD, Assistant Professor, Emory University
9:30 am Coffee Break in the Exhibit Hall with Poster Viewing

Joseph Pearson, PhD, Associate Director, Global Product Management Omicsoft, Digital Insights, QIAGEN

Single-cell analysis helps biologists and bioinformaticians reveal complex and rare cell populations, uncover regulatory relationships among genes, and analyze and visualize gene expression differences among cell types. In this talk we will explore new tools for analyzing, interpreting and exploring scRNA-seq data and the underlying biology. We will also show how to integrate ‘omics datasets from different platforms to gain insights into the biology and molecular drivers of specific cell populations.

Josh James, Founder, CEO & Chairman of the Board, Domo

Join us as we explore how modern BI drives business transformation. From data centers to deserts, we’ll discuss what’s next for business intelligence. Then we’ll highlight how modern BI is helping teams and share how to act on the three core elements of modern BI: data agility, data literacy and intelligent action. You’ll learn ways to unlock the value of data throughout your business, and drive the transformation your business needs.

Jacob Aptekar, Senior Director, Product Management (Integrated Evidence), Medidata

In scenarios where clinical trials may not produce sufficient data on their own, such as in rare or serious conditions, historical clinical trial data can provide scientifically rigorous evidence to fill data gaps. This session will cover the value of external data to your clinical development program, including examples from work on Cytokine Release Syndrome treated with CAR-T therapies, where past trial data can help fill the gaps to:

  • Understand disease risk within specific patient groups
  • Analyze experimental treatments versus standard of care outcomes 
  • Generate evidence to inform product development and medical engagement

Collin Mechler, Director, Practice Leads, Domo

  • Discover how to leverage your existing investments to unlock data value, informing smarter business decisions and processes, at all stages of the value chain
  • Learn how to make the move from reactive, traditional data and analytics approaches to proactive, data-driven decision-making, across your organization
  • Explore how a data-driven culture helps to speed up processes, break down departmental silos, empower commercial leaders, and increase the impact of your people, systems, and processes
12:15 pm Refreshment Break in the Exhibit Hall with Poster Viewing


Michael Stapleton, PhD, Managing Director, Life Sciences, Accenture
1:15 pm

How Digital Evolution and an Attitudinal Revolution are Re-Shaping the Future of the Life Sciences Industry

Nimita Limaye, PhD, Research Vice President, Life Sciences R&D Strategy and Technology, IDC

The world has rapidly transitioned to a model of disaggregated care and decentralized clinical trials, with a heightened focus on patient-centricity. Digital resiliency has become the priority and discretionary spend on R&D platforms has been delayed. Federated-learning models are fueling co-innovation and GPU-powered transformer models are accelerating drug discovery. Technology is enabling access and equity. The borders between healthcare and life sciences are blurring and real-world data is being leveraged to drive a precision medicine strategy.

1:50 pm

All of Us Research Program – Seeking To Advance Precision Health for All Populations

Joshua Denny, MD, MS, CEO, All of Us Research Program, National Institutes of Health

The All of Us Research Program launched on May 6, 2018, and currently has over 375,000 participants who have contributed biospecimens, health surveys, and a willingness to share their EHR. Participants are partners in the program and receive research results from data they contribute, including genetic ancestry and traits. In the future, participants will also receive health-related genomic results from whole genome sequencing. In May 2020, the program launched the beta version of the Researcher Workbench. Once researchers register and are approved to use the workbench, they can access individual-level data and a suite of tools to analyze these data. All of Us is committed to catalyzing a robust ecosystem of researchers and providing a rich dataset that drives discovery and improves health.

2:30 pm Refreshment Break in the Exhibit Hall with Poster Viewing


3:05 pm

Enterprise-Wide AI Automation Innovation and Enabled Business Strategies

Srikanth Ramakrishnan, Director, Intelligent Automation & Analytics, Janssen Pharmaceuticals R&D IT

The convergence of several technology trends has accelerated progress in intelligent applications. The volume of data continues to double every couple of years. AI engineers now have massive compute power they can tap into, and they are devising ever more novel algorithms. How does an enterprise approach the key technology building blocks to enable scalable and agile applications that leverage embedded machine learning, including deep learning? The talk will describe some of the foundational technologies, structures and processes that could enable an enterprise to put AI to work in transforming its business.

3:35 pm

Generalizing Diversity: Machine Learning Operationalization for Pharma Research

Daniel Butnaru, PhD, Research Architect, Roche Diagnostics GmbH

More and more machine learning algorithms in pharma research are transitioning from a one-off scenario, where the model is built and run a few times, to repeated usage of the same model in daily research workflows. This shift significantly raises the bar on the quality and setup necessary to train and deploy ML models. With an ever-increasing number of models, we learned that leveraging an ML platform for operationalization has scale benefits.

4:05 pm Refreshment Break in the Exhibit Hall with Poster Viewing
4:35 pm

Gene Regulatory Network Inference As Relaxed Graph Matching

Rebekka Burkholz, PhD, Postdoctoral Researcher, Harvard T.H. Chan School of Public Health

Gene regulatory network inference is instrumental to the discovery of genetic mechanisms driving diverse diseases, including cancer. We cast this problem as graph matching and leverage its connection to machine learning to improve the state of the art in predicting the binding of transcription factors to promoter regions of genes.

5:05 pm

Learning from a Million Small Molecule Crystal Structures

Jeff Lengyel, PhD, Research and Applications Scientist, The Cambridge Crystallographic Data Centre

The Cambridge Structural Database (CSD) contains >1 million experimentally determined, expertly curated, small-molecule crystal structures. These structures have been collected from >400,000 authors over the last 55 years, resulting in a diverse dataset. These features make the CSD attractive for machine learning methods which benefit from large datasets. This talk will highlight some ways our researchers have utilized this structural data to gain insights in chemistry, biochemistry, and materials science.

5:35 pm Networking Reception in the Exhibit Hall with Poster Viewing
6:35 pm Close of Day

Wednesday, September 22

7:30 am Registration Open
8:00 am Interactive Discussions (Sponsorship Opportunity) or Morning Coffee

Interactive Discussions are informal, moderated discussions, allowing participants to exchange ideas and experiences and develop future collaborations around a focused topic. Each discussion will be led by a facilitator who keeps the discussion on track and the group engaged. For in-person events, the facilitator will lead from the front of the room while attendees remain seated. For virtual attendees, the format will be in an online networking platform. To get the most out of this format, please come prepared to share examples from your work, be a part of a collective, problem-solving session, and participate in active idea sharing. Please visit the website's Interactive Discussions page for a complete listing of topics and descriptions.

9:00 am Coffee Break in the Exhibit Hall with Poster Viewing


9:55 am

Understanding Heterogeneity in Disease Progression Journeys by Using Neural Network Models

Ye Jin Jenna Eun, PhD, Principal Data Scientist, Commercial Data Science, Johnson & Johnson Pharmaceutical R&D

In this study, we focused on patients diagnosed with psoriatic arthritis, an auto-immune disease that impairs joint and bone function. In many cases, these patients were already suffering from a related auto-immune disease called psoriasis, which mainly affects the skin, before developing the more debilitating psoriatic arthritis. We will present how a machine learning pipeline was developed to identify different progression pathways from a skin disease to a joint disease, and the potential of the methodology to inform clinical trial feasibility assessment for pharmaceutical R&D efforts, enhancing clinical trial efficiency and improving patient outcomes.

10:25 am

Improving Diagnosis Rates for Rare Disease

Ahsan Huda, PhD, Senior Director, Data Science, Pfizer Inc.

Wild-type transthyretin amyloid cardiomyopathy is a progressive, life-threatening, increasingly recognized but underdiagnosed cause of heart failure. Here we show that a random forest machine learning model can identify potential wild-type transthyretin amyloid cardiomyopathy using medical claims data. We show that the machine learning model performs well in identifying patients, thereby providing a systematic framework to increase the suspicion of transthyretin cardiac amyloidosis in patients with heart failure.

10:55 am

Detection of Cognitive Concerns in Electronic Medical Records with Deep Learning

Sudeshna Das, PhD, Director Biomedical Informatics Core, Neurology, Massachusetts General Hospital

To identify patients with cognitive concerns in electronic medical records, we applied a deep learning-based natural language processing (NLP) algorithm to unstructured clinician notes and compared the model’s performance to a baseline model that used regularized logistic regression with structured data (diagnosis codes and medication data). The deep learning model improved the AUROC from 0.79 to 0.90 and increased sensitivity of dementia detection from 0.59 to 0.79.

Angela Bauch, PhD, Product Management, Biomax Informatics AG

AILANI is a semantic search enterprise solution for fast, easy and comprehensive knowledge discovery. It combines semantic modelling, ontologies, linguistics and AI algorithms to identify relevant information. Using ontology-based refiners enables fast and efficient retrieval of information about the clinical competitive landscape as well as identification and mapping of KOLs. By integrating organization-specific content, AILANI leverages knowledge buried both in decade-old data and in data from news feeds and clinical trials.

Sanjay Saraf, Head of Data and Analytics Product Management, Benchling

The life science industry needs more than an ELN: customers can use Benchling to model their scientific workflows and then analyze their data. Join this session to learn how this is currently done, the advantages and disadvantages of certain modeling approaches, and analytical methods that can be applied on top of that data. The session will include a live demonstration with notional data and usage of Benchling Insights and the Benchling developer platform.

11:55 am Interactive Discussions (Opportunity Available)
1:10 pm Refreshment Break in the Exhibit Hall with Poster Viewing



Trends from the Trenches

Panel Moderator:
Kevin Davies, PhD, Executive Editor, The CRISPR Journal; Founding Editor, Bio-IT World

Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, cloud, data science, and machine learning that are involved in supporting data-intensive science. In 2021, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. An interactive, moderated Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session. To stay connected with Trends from the Trenches updates after today and all year, sign up for BioTeam's newsletter.

Chris Dagdigian, Senior Director, BioTeam, Inc.
Fernanda S. Foertter, PhD, Director of Applications, NextSilicon
Karl Gutwin, PhD, Director, Software Engineering Services, BioTeam, Inc.
Adam Kraut, Director Infrastructure & Cloud Architecture, BioTeam, Inc.
3:30 pm Refreshment Break in the Exhibit Hall with Poster Viewing


4:05 pm

Identification of Clinical Response Patterns Through Application of Unsupervised Machine Learning on Clinical Trial Time Series Data

Bethany F. Hyde, Data Scientist, Data Sciences, Janssen Pharmaceuticals, Inc.

We applied unsupervised machine learning on time series data collected from a clinical trial to uncover distinct patterns of patient treatment response. This talk will cover the challenges of using unsupervised machine learning on clinical trial data and the technical solutions to overcome these challenges, including data imputation, cluster optimization, secondary analysis, and clinical interpretation of results.

4:35 pm

Virtual Tumor Boards to Enhance Recruitment in Pharma Clinical Trials and Enhance Patient Experience

Subha Madhavan, PhD, Head of Data Science, Oncology R&D, AstraZeneca

Precision oncology (PO) still hinges on the application of cancer therapies that are designed for the ‘average patient’ as a ‘one size fits all’ approach. Since there is no ‘average patient’, targeted treatments are successful only in some patients. For PO to be effective, technologies are needed that will allow the rapid identification of key altered pathways in each patient’s tumor that are susceptible to molecularly targeted or immunological therapies and the presentation of these in a context-sensitive fashion at the point of clinical decision-making. Virtual molecular tumor boards (VMTBs) offer a solution to this issue to help drive cancer clinical trial recruitment.

5:05 pm

The Impact of Machine Learning-Directed Supportive Care During Cancer Radiotherapy: The SHIELD-RT Study (Innovative Practices Awards Winner)

Julian Hong, MD, Assistant Professor, Department of Radiation Oncology, University of California, San Francisco

The System for High-Intensity Evaluation During Radiation Therapy (SHIELD-RT) study (NCT04277650) was a randomized controlled study that demonstrated that machine learning based on electronic health records can be implemented in the clinical setting to direct supplemental clinical evaluations during outpatient cancer radiotherapy and chemoradiation. This reduced acute care (emergency visits and hospitalization) in high-risk patients from 22.3% to 12.3%. Routine implementation of this system at the Duke Cancer Institute is underway.

5:35 pm Close of Conference
