Track 2: Data Computing

There is increasing demand for computing power from life science researchers and scientists tackling big data issues. To do their work, their storage and infrastructure must scale to handle billions of data points and files efficiently. The Data Computing track will explore the data computing resources and application deployment tools needed to process computational workflows and drive automation, advance analytics capabilities, make software deployment reproducible, maximize application performance, and drive broad organizational decision processes.

Final Agenda

Tuesday, April 16

7:00 am Workshop Registration Open and Morning Coffee

8:00–11:30 Recommended Morning Pre-Conference Workshops*

W4. AI for Pharma

12:30–4:00 pm Recommended Afternoon Pre-Conference Workshops*

W9. Research Project Management

* Separate registration required.

2:00–6:30 Main Conference Registration Open


5:00–7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 17

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall with Poster Viewing



10:50 Chairperson’s Remarks

Shweta Maniar, Global Client Executive, Google Cloud

11:00 Reaction Explorer

Raquel Dias Miranda Hoggett, Small Molecule Workflow Service Manager, F. Hoffmann-La Roche

The presentation will discuss how Roche has addressed the challenges of navigating reactions linked by synthetic route or Intellectual Property space. The following themes will be covered: the Reaction Explorer workflow tool for capturing, processing and mining reaction information; key analysis functionalities such as synthesis trees, reaction component property comparisons, key intermediates & precursors identification; facilitating patent writing, condition optimization and enabling synthesis proposals.

11:30 A Fully Integrated Platform for Pharmacology in vivo Study Data Management

Lian Shen, Software Engineer, Five Prime Therapeutics

In vivo studies are an indispensable component of the drug discovery process. However, current study workflow management and data capture mostly involve manual data entries and spreadsheet manipulations, which can be both cumbersome and error-prone. We developed an integrated software platform to streamline end-to-end in vivo study workflow management, from protocol design and initiation, to data collection, analysis, and reporting. By leveraging both in-house development and custom commercial software (i.e. Spotfire, SharePoint), the platform provides an accurate and comprehensive record of events for all in vivo studies at any point in time, enabling users to access the results in real-time for scientific decision making. Fully integrated measurement devices (e.g. balance, caliper, RFID reader, various scanners) allow seamless data collection with no manual user entries, thus increasing operational efficiency and ensuring data integrity. Our integrated platform provides full visibility of pharmacology studies while promoting data integrity and workflow efficiency. We will present our novel approach to capturing Pharmacology workflow data which includes a flexible database schema design, a modular and extensible utilization of workflow action templates, a fully-integrated instrument measurement recording implementation, and a reliable system for saving workflow data independent of network connectivity. We hope that presenting the design behind our different platform tools and components will provide useful insights for future design considerations of our audience.

12:00 pm UK Biobank – Delivering Insights for Reverse and Forward Translation
Jason Tetrault, Global Head Data Engineering and Emerging Technologies, Takeda
Sandor Szalma, PhD, Global Head, Computational Biology, Takeda San Diego, Inc.

12:30 Session Break

12:40 Luncheon Presentation: Intelligent Automation of Biomedical Data Processing Using Metadata Enabled Analytics

Georges Heiter, CEO, Databiology

Today’s research uses highly specialized, siloed data and analytics with poor reproducibility. Traditional monolithic architectures prevent easy modernization of R&D compute environments. Learn about a new frontier in data computing: how a platform that combines a ubiquitous metadata approach with a modern microservices architecture enables automation and intelligently assists researchers in obtaining insights from heterogeneous datasets.

1:10 Session Break


1:50 Chairperson’s Remarks

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

1:55 Model of Efficiency: How Automating Data Exchange Can Improve the Flow of Complex Information between Partners in a Distributed Resource Model

Rebecca Carazza, PhD, Director, Research Informatics, Nimbus Therapeutics

As a “virtual” biotechnology company, Nimbus Therapeutics’ operations are entirely enabled through partnering with a wide array of academic and contract research organizations across the globe. Within this model, the coordination of data transfer and integration between multiple partners has historically been a challenge, creating a bottleneck that negatively affected the speed and scalability of the organization. An automated workflow solution was designed and deployed using a combination of AWS Lambda, Egnyte Connect, Jira Software and ACAS implementations. Now, data exchange is less error-prone and requires approximately 50% less manual intervention, and the content is available to scientific decision makers faster.

2:25 Automating Workflows in Bioscience Research: Approaches and Examples

Vas Vasiliadis, Chief Customer Officer, Globus, University of Chicago

Managing data in large-scale, distributed life sciences research demands capabilities beyond those provided by web-based tools. Researchers increasingly require automation to cope with large volumes of repetitive tasks--replicating data across systems at multiple institutions, staging data captured from an instrument for analysis, and enabling access to reference data for a large user community--which cannot be handled at scale by ad hoc methods. In addition, research communities and labs often have stringent requirements for secure portals to simplify data sharing and collaboration. In this talk, we will describe the Globus research data management platform and illustrate how it can be used to automate research data flows. Through hands-on examples and references to implementations in bioscience projects, we will demonstrate how to leverage Globus to provide researchers with scalable, automated data management capabilities. We will also describe how Globus REST APIs may be combined with high-speed networks to provide a research data platform on which developers can create entirely new classes of scientific applications, portals, and gateways in the life sciences.

2:55 Virtual Screening of 1.43 Billion Molecules

Mark McGann, Principal Developer, OpenEye Scientific

Cloud computing provides massive computational power for virtual screening using either docking or ligand-based methods. We present the docking of the 1.43 billion molecules in the Enamine dataset into purine nucleoside phosphorylase. This calculation, run on Orion, OpenEye’s cloud-based platform, took 24 hours. The ligand-based virtual screen of the same database against the same target took 1 hour. We will discuss the results and benefits of these screens.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite) 


4:00 How to Be “In the Flow”: Combining Business Process and Data Flows for End-to-End Automated Laboratory Workflows

Andreas Steinbacher, PhD, R&D Informatics and Machine Learning/Artificial Intelligence Leader for Lab Automation, Roche Innovation Center Munich

Currently, the level of automation of end-to-end laboratory workflows in the life science research space is rather low. This results, on the one hand, from a strong focus on automating single workflow steps with separate instruments and integrations and, on the other hand, from the need for flexible (re-)configuration of laboratory workflows in a research environment. Business process modelling and management tools are successfully applied in other industries and have also been proposed as a solution for life science automation. However, they typically fail to track the corresponding data flow. This talk aims to identify the essential requirements of end-to-end laboratory workflow automation and to show how business process management tools can help achieve this goal.

4:30 End-to-End Sample Tracking in the Laboratory Using a Custom Internet of Things Device

William Neil, Digital Capability Manager, Solution Engineering and Delivery, Bristol-Myers Squibb

We describe a custom Internet of Things (IoT) device used for tracking barcoded containers end to end in a high-throughput analysis and purification laboratory. Our IoT device fills an important gap that previously prevented us from fully tracking barcoded sample containers through manual steps in a multistep workflow. The custom device reads container barcodes and sends a small amount of data to our back-end data systems. Once data have been received and processed, users are alerted to any system responses via aural and visual feedback. We believe that the model for our device will facilitate simple and rapid deployment of IoT to the broader laboratory community.

5:00 Genomic Analyses on Google Cloud Platform

Andrew Moschetti, Solutions Architect, Healthcare & Life Sciences, Google Cloud

Learn how to perform end-to-end analysis of genomic data using Google Cloud Platform and open source tools such as GATK Best Practices and DeepVariant. Starting with raw files from a sequencer, progress through variant calling, importing to BigQuery, variant annotation, quality control, BigQuery analysis and visualization with phenotypic data. All the datasets will be publicly available and all the work done will be provided for participants to explore on their own.


5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 18

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced


10:30 Chairperson’s Remarks

Mahmood Mohammadi Shad, PhD, Scientific Software Engineer, FAS Research Computing, Harvard University

10:40 Accelerating Neuroscience Research Utilizing Advanced Research Computing Resources

Mahmood Mohammadi Shad, PhD, Scientific Software Engineer, FAS Research Computing, Harvard University

Cutting-edge research into the mysteries of the brain requires processing large amounts of data and working with advanced software tools. Research Computing at Harvard University provides the storage/compute infrastructure as well as research software engineering (RSE) services to different labs within the Center for Brain Science (CBS). Three different projects in the Ölveczky Lab at CBS will be presented to show how advanced RSE combined with modern storage/compute resources is shaping research on the brain. The projects include high-throughput operant-conditioning (OpCon) software for behavioral studies, spike sorting of long-term brain-wave recordings to discover neural dynamics, and motion capture analysis to classify different kinds of movements and behavior. I will provide details of the architecture and software that have been designed to handle data streams of some 10 Gb/s. In all studies, the neural recordings of the subject’s brain are analyzed along with behavior details using advanced machine learning, deep learning, and data mining techniques to discover neural firing patterns associated with different behaviors. The findings help neuroscientists and neurosurgeons come up with ways to treat neurological disorders. The presentation provides insights into advanced research computing tools (both storage/compute infrastructure and research software engineering) that are helping neuroscientists perform research on the brain. It emphasizes the challenges that researchers deal with every day in terms of research software engineering, modern storage and compute solutions, and other technical aspects.

11:10 Improving Laboratory Ordering Patterns and Patient Safety Using Integrated Electronic Health Record (EHR) and Laboratory Information System (LIS): The Brigham and Women’s Hospital Experience

Milenko Tanasijevic, MD, MBA, Vice Chair for Clinical Pathology and Quality, Department of Pathology, Brigham and Women’s Hospital and Dana-Farber Cancer Institute; Director of Clinical Laboratories, Brigham and Women’s Hospital; Associate Professor of Pathology, Harvard Medical School

We report on the features and advantages of an integrated EHR/LIS laboratory ordering system in a large, academic-based tertiary medical center. The system consists of an EHR-based computerized physician order entry system, a positive patient identification system for both nursing and phlebotomy staff, order communication to the LIS, and automated, robotic routine chemistry and hematology systems with result auto-filing capabilities. The various components were introduced over a period of five years, engaging multidisciplinary laboratory, informatics, nursing and clinical teams. The physician order entry system is EPIC-based and features a number of rules and reminders for redundant tests to optimize laboratory ordering. In addition to the Sunquest-based collection manager for phlebotomy staff, we also developed a specimen collection module for our nurses, reflecting their different workflow. The Sunquest LIS interacts with the robotic automated systems through vendor-supported middleware. We present a detailed timeline of the system’s development along with benefits for patient care and lessons learned during the process.

11:40 Building an Image Analytics Capabilities Platform for Pharmaceutical Science (PS) - Pathology

Michel Petrovic, Senior Scientist/Business Analyst, pRED Informatics, Roche Pharmaceutical Research and Early Development (pRED), Roche Innovation Center Basel

The PS pathology group provides toxicology and investigative pathology support for all pRED DTAs and functions, with a growing number of image analysis projects. A complete data management and image analytics solution is required to provide the following: 1. Tools allowing for slide visualization and annotation in a collaborative manner (Digital Pathology Slide Management), 2. Integration of analytics software solutions for both primary image analysis and data mining, and 3. Ability to launch image analysis solutions from the slide viewer. This talk will discuss the flexibility to independently develop fit-for-purpose image analysis solutions to support the portfolio. I will also talk about proper integration with the TiMAP-LIMS workflow (Tissue Management and Analytics Platform) component.

12:10 pm Enjoy Lunch on Your Own (Lunch Available for Purchase in the Exhibit Hall)

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing


1:55 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

2:00 PANEL DISCUSSION: High Performance Consultancies


Chris Dwan, Senior Technologist and Independent Life Sciences Consultant


Tanya Cashorali, CEO, Founder, TCB Analytics

Aaron Gardner, Director of Technology, BioTeam, Inc.

Eleanor Howe, PhD, Founder and CEO, Diamond Age Data Science

An organization must learn and understand the value of why, when and how to use a consultancy. Highly trained and skilled professional experts gather to discuss their role in leading and managing projects for organizations to help them achieve goals. They will discuss a variety of themes including the best kinds of projects to hire a consultancy for, the timeline of when an organization should hire a consultant vs. full time staff, and big challenges on the horizon. The session will feature short podium presentations, followed by a moderated Q&A panel with attendees. The topic of hiring a consulting company came up in the data science plenary keynote at Bio-IT 2018. We want to spend time at Bio-IT 2019 exploring this topic in finer detail.

3:20 KEYNOTE PRESENTATION: Trends from the Trenches 2019

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

“Trends from the Trenches” returns to Bio-IT in its original “state of the state address” format! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science.

We are looking for your feedback on the key questions, problems, issues you are facing. We will incorporate the feedback into Chris’ talk. Complete this short survey anonymously:

4:00 Conference Adjourns
