Track 1: Modern Data Platforms and Storage Infrastructure

Solutions to Enable Discovery and Ease Data Bloat

May 4 - 5, 2022 ALL TIMES EDT

The Modern Data Platforms and Storage Infrastructure track will explore data platforms and storage infrastructure solutions that enable discovery and solve the data bloat problem. Speakers will discuss applications, platforms and tools, and scalable solutions.

Tuesday, May 3

7:00 am Registration Open (Plaza Level Lobby)
8:00 am Recommended Pre-Conference Workshops and Symposium*

On Tuesday, May 3, 2022, Cambridge Healthtech Institute is pleased to offer nine pre-conference workshops scheduled across three time slots (8:00-10:00 am, 10:30 am-12:30 pm, and 1:45-3:45 pm) and a Symposium from 8:25 am-3:45 pm. All are designed to be instructional and interactive and to provide in-depth information on a specific topic. They allow for one-on-one interaction and provide a great way to explain more technical aspects that would otherwise not be covered during the main conference tracks that take place Wednesday-Thursday.

*Separate registration required. See Workshop page and Symposium page for details.

3:45 pm Session Break and Transition to Plenary Keynote



4:00 pm

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
4:05 pm Innovative Practices Awards
Mike Tarselli, PhD, Chief Scientific Officer, TetraScience
4:30 pm

Ask What IT Can Do for Bio...and What Bio Can Do for IT

George M. Church, PhD, Robert Winthrop Professor, Genetics, Harvard Medical School

IT for Bio: In May 2021, one haploid human genome (3.055 billion bp) was sequenced completely, but zero diploid. We have 7.7 billion diploid humans yet to be sequenced and correlated with their environments and traits in the Personal Genome Project. Plus, at least one genome from each of over 8.7 million eukaryotic species in the Earth Biogenome project. Plus, monitoring pathogenic and commensal bacteria, allergens, and viruses in the BioWeatherMap. Plus, ancient DNA. We are counting RNA molecules per cell in most (or all) cell types in humans, mice, and many other species throughout development and connectome (with imaging resolution up to 20 nm).   

Bio for IT: Reading and writing DNA has improved exponentially in cost (at least 60 million fold) and is increasingly used for storing non-biological data. The record for editing DNA in vivo is now 24,000 edits per cell and for storing data in vivo is about 1 terabyte per mouse. Enormous chemical and biological 'libraries' can perform 'Natural Computing' for tasks far beyond current von Neumann silicon and quantum computers. The combination of these, machine learning plus megalibraries (ML-ML), is already having commercial impact (e.g., Nabla, Manifold, Dyno, Patch).

5:45 pm Welcome Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
7:00 pm Close of Day

Wednesday, May 4

7:00 am Registration Open and Morning Coffee (Plaza Level Lobby)



8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Zachary Powers, Chief Information Security Officer, Benchling
8:15 am

Accessing and Securing the Data that Drives Breakthroughs

Allison Proffitt, Editorial Director, Bio-IT World
Rachana Ananthakrishnan, Executive Director, Globus, University of Chicago
Ari E. Berman, PhD, CEO, BioTeam, Inc.
Jonathan C. Silverstein, Chief Research Informatics Officer & Professor, Biomedical Informatics, University of Pittsburgh
Rebecca F. Rosen, PhD, Director, Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health

Life sciences research is generating massive amounts of data that should be accessible to collaborators and colleagues to enable breakthrough discoveries. However, ensuring sensitive data are shared securely in a manner that protects patient privacy and complies with myriad regulations is a daunting task, which often slows the pace of research. Our panel of leading practitioners will share insights on the challenges and best practices of managing protected research data.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



10:15 am Organizer's Remarks
10:20 am

Chairperson's Remarks

Ari E. Berman, PhD, CEO, BioTeam, Inc.
10:25 am

Cloud vs. On-Prem Architecture for Science

Ari E. Berman, PhD, CEO, BioTeam, Inc.

Analytics in life sciences and healthcare today is extremely data-intensive. Most research projects require extensive collaboration and utilize data from many locations generated by a large number of researchers. As IT infrastructure becomes harder to manage, science organizations are trying to decide between going to the Cloud or investing in on-premises resources for scientific computing. Unfortunately, the decision isn't that straightforward. There are many important factors to consider when deciding where to invest your time and money for your science. During the pandemic, and with a significant and persistent supply chain problem for purchasing hardware, many organizations have decided to make more use of the Cloud. However, many have started to see rising costs and lower availability due to that decision. In this presentation, we'll review the use cases for scientific computing (types of analytics, data storage, hardware capabilities, collaboration), the primary personas in most organizations that use and manage the technology, and the consequences and outcomes of choosing either Cloud or on-premises architecture to meet the needs of your organization.

10:55 am

Secure and Scalable Synthesis Planning Automation

Agnes Meyder, PhD, Scientific Solution Engineer, Roche
Yi Lin, PhD, Head of Discovery Informatics, Senior Principal Scientist, Digitalization, Roche R&D Center (China) Ltd.

Molecular retrosynthesis is mainly based on assumptions, biased knowledge, and experience, and is therefore often slow and expensive. The Synthesis Planning Automation Platform uses artificial intelligence to identify the most probable synthetic routes for molecules with a high predicted probability of technical success. We will show how the tool, enhanced with in-house reaction data, enables improved decision making by scientists. In the second part, our talk will highlight how the cloud was leveraged in a secure and scalable way, using a GitOps approach, to give our scientists the greatest benefit.

11:25 am

Accelerating Cancer Research through a Highly Integrated and Harmonized Data Ecosystem

Vasileios Stathias, PhD, Lead Data Scientist, Molecular & Cellular Pharmacology, University of Miami

This talk will highlight the end-to-end research platform built to support the significant increase in sequencing and other omics data types generated at the Sylvester Comprehensive Cancer Center. Using a state-of-the-art hybrid cloud architecture, the Sylvester Data Portal (SDP) provides researchers with secure data storage, FAIR data management, intuitive data access, and reproducible data processing workflows. SDP has been built with a focus on both experimental and computational researchers, offering an intuitive user interface and well-documented APIs.

Robert Murphy, Director of Product Marketing, WEKA

Organizations are looking for competitive advantage through digitalization by 10x faster processing of 10x bigger datasets. Traditional storage cannot meet rapidly escalating requirements for performance AND scale across on-premises and multi-cloud data centers. Learn how the WEKA Data Platform accelerates next-generation sequencing, Cryo-EM microscopy, and bio-imaging data pipelines all while lowering the cost of research and keeping your data secure.

12:25 pm Interactive Discussions

Interactive Discussions are informal, moderated discussions, allowing participants to exchange ideas and experiences and develop future collaborations around a focused topic. Each discussion will be led by a facilitator who keeps the discussion on track and the group engaged. For in-person events, the facilitator will lead from the front of the room while attendees remain seated. For virtual attendees, the format will be in an online networking platform. To get the most out of this format, please come prepared to share examples from your work, be a part of a collective, problem-solving session, and participate in active idea sharing. Please visit the Interactive Discussion page on the conference website for a complete listing of topics and descriptions.

Jeff Denworth, CMO and Co-founder, VAST Data

How Big is Your Data?

  • Join an open discussion about the state of data storage management in life sciences
  • Come share your horror stories, your victories and your best practices
  • Take a look into the future to discuss everything from AlphaFold to DMA storage to quantum computing
12:55 pm Session Break and Transition to Luncheon Presentation
Kate Vanness, MS, Senior Product Specialist, Digital Solutions, Thermo Fisher Scientific

Complete laboratory orchestration is critical in moving organizations forward on their digital transformation journey. Although the goal is the same, each organization has its own complex and tailored scientific ecosystem, as well as its own unique set of processes and challenges. Leveraging innovative digital tools, technology, and software solutions will create a seamlessly connected ecosystem that will ultimately eliminate those challenges, enhance operations, and improve efficiency throughout.

1:50 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)


2:40 pm

A Modern Framework for Data Discovery and Collaboration

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

Research enterprises are generating orders of magnitude more data in relatively short timeframes. As a result, describing data for downstream discovery and making the data accessible to partners and collaborators, with appropriate access controls, is increasingly challenging. Ad hoc methods cannot keep up and place undue burden on researchers and system administrators, but data portals combined with fast networks are becoming more prominent as a means of enabling access to large datasets. We will demonstrate how the widely used Django web framework, integrated with the Globus platform, can improve data discoverability and collaboration.

3:10 pm

Approaches to Managing Digital Health Data

Ardy Arianpour, CEO & Co-Founder, Seqster

Seqster provides instant interoperability to retrieve and harmonize clinical, wearable, and genomic data from distinct sources. The SeqsterOS automates real-time Real World Data (RWD) collection while improving the participant journey. Life science enterprises benefit from improved study participant engagement and retention. Study participants and researchers benefit from longitudinal health information essential for long-term observational studies and for effectively conducting health economics & outcomes research (HEOR). SeqsterOS is FDA 21 CFR Part 11 compliant for drug submissions. The operating system includes de-identification and tokenization of data, e-consent, eCOA, and ePRO for any clinical trial or study. Learn more at

3:40 pm

Developing the Next Generation of Digital Therapeutics

Jennifer Gentile, PsyD, Senior Vice President, U.S. Clinical Innovation, ieso

This talk will address the real-world impact of healthcare analytics in mental health care and the potential to provide more targeted and effective treatments. Dr. Gentile will share the work of ieso and how its use of big data has resulted in improvements in many aspects of mental health care, including but not limited to precision healthcare, diagnosis, triage, risk identification, appropriate treatment planning, attrition, and recovery rates. She will also discuss how big data is helping ieso to develop the next generation of digital therapeutics.

Mark Weston, CEO, Netrias

Despite advancements in ontologies and workflow tooling, data integration and metadata harmonization remain a significant challenge from the earliest stages of discovery all the way to production. We present the Active Discovery Engine (ADE), a Netrias platform that uses machine learning techniques to perform AI-assisted data curation and integration. We will showcase examples that highlight term alignment, automated data joining, and custom analytics integration.

4:40 pm Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
6:00 pm Close of Day

Thursday, May 5

7:30 am Registration Open and Morning Coffee (Plaza Level Lobby)



8:00 am

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
Nate Raine, Director, Data Custodians, Lifebit
8:15 am

Leveraging Large-Scale Human Data to Advance and Accelerate Drug Discovery

Shankar Subramaniam, PhD, Distinguished Professor of Bioengineering; Professor of Chemistry, Biochemistry and Nanotechnology; Adjunct Professor of Cellular & Molecular Medicine, University of California at San Diego

Advances in genomics technologies have led to the generation of massive amounts of human data. This has catalyzed new insights into cellular processes in normal and disease states and facilitated the search for safe and effective medicines. The UK Biobank, All of Us, and TOPMed initiatives are exemplars of this approach. We highlight examples from our lab where meaningful insights have been obtained, advancing our understanding of disease biology and its pharmacological application.

9:30 am Coffee Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



10:15 am Organizer's Remarks
Ryan Magruder, Solutions Architect, Sales, Rescale

Modernizing for the New Era of Life Science Innovations

Lita Sands, Head, Life Sciences, Amazon Web Services
Mike Tirozzi, Senior Vice President, Chief Information & Data Officer, Vertex Pharmaceuticals, Inc.
Mike Tarselli, PhD, CSO, TetraScience, Inc.
Anna Berg Åsberg, Global Vice President R&D IT, AstraZeneca
Bill Goodman, Senior Director, Product Management, Digital Science, Thermo Fisher Scientific

Over the last decade, life sciences organizations have pushed the envelope on what’s possible: from running tens of billions of tests in a single day, to using Alexa-enabled lab equipment to reduce errors, to using AI to develop more targeted clinical trials for precision medicine. While these innovations touch different parts of the value chain, they each build on a common foundation of cloud modernization. This session will explore the underlying infrastructure behind some of the most innovative breakthroughs in the life sciences industry, and break down how leading life sciences organizations approached their cloud migration journey. Hear from leaders from AstraZeneca, Vertex Pharmaceuticals, TetraScience, and more to learn how their organizations created an infrastructure for innovation, put technology to work to remove the undifferentiated heavy lifting and focus on what matters most to their organizations, and how the cloud is changing their mission and the type of talent they are attracting.

Rebecca Carazza, PhD, Executive Director, Information Systems, Nimbus Therapeutics
Abhay Kini, Director, Life Sciences, Egnyte

Access to quality data is one of the most important accelerators of growth for emerging biotech. The resulting analysis and insights are enabling a golden age for research and therapy development as modern drug development teams leverage technology to unlock scientific data in new ways. Learn how digital workflow automation and the development of NIMBEye, Nimbus’ cloud computing environment, have accelerated the deployment of CRO data from days to minutes.

Aniket Deshpande, Senior GTM Specialist, Healthcare and Life Sciences, Amazon Web Services
Joachim de Schrijver, Product Owner, Agilent
Nate Raine, Director, Data Custodians, Lifebit
Eric Dawson, Bioinformatics Scientist - AI, Enterprise Products, NVIDIA

Genome sequencing pipelines and processing large genomic datasets can become cumbersome, limiting genomics adoption for clinical application and the ability for improvement or scale. Turning to transformative cloud-based technologies can help organizations scale their genomic solutions, while optimizing performance and costs. Learn how biopharma, clinical care and health outcomes, and population genomics are assisted by a GPU-accelerated computational genomics application framework, and high-performance, flexible, scalable cloud infrastructure.

12:55 pm Session Break and Transition to Luncheon Presentation
Doug Ricketts, Technical Account Manager, Healthcare Account Management Team, Synology

Even the most secure systems can fall victim to attack or disaster, and when security fails, a complete recovery plan is essential to get your organization back on its feet. Join us to learn about building a robust disaster recovery plan for physical devices, servers, SaaS, and more.

Ryan Magruder, Solutions Architect, Sales, Rescale

Deploying compute-intensive workloads on the cloud can be challenging, especially when trying to unify workflows across various cloud providers. Each platform has its own tools for compute, storage, and security, which require specific knowledge and skills to build frameworks around. In this session, learn how Rescale provides a unified platform for high-performance computing built for a multi-cloud environment, as well as how to leverage these tools to accelerate your simulations.

2:05 pm Refreshment Break in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)



2:35 pm

Trends from the Trenches

Chris Dagdigian, Senior Director, BioTeam, Inc.
Matthew Trunnell, Data Commoner
Adam Kraut, Director, Infrastructure & Cloud Architecture, BioTeam, Inc.
Anna Sowa, PhD, Senior Scientific Consultant, BioTeam, Inc.
Michelle Bayly, PhD, Senior Scientific Consultant, BioTeam, Inc.

Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, cloud, data science, and machine learning that are involved in supporting data-intensive science. In 2022, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. A moderated, interactive Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session.

4:00 pm

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
4:10 pm Close of Conference
