Track 13: Data Transfer

Time and money spent transferring data are hampering the ability of data-intensive scientific researchers to share data and generate new knowledge. Whether data is migrating to the cloud or across campus, flexibility and reliability, as well as speed, security, and scalability, must be considered. Through case studies, the Data Transfer track presents both hardware and software enterprise data management solutions that facilitate high-speed data transfer to enable productivity and foster collaboration.

Final Agenda

Tuesday, April 16

7:00 am Workshop Registration Open and Morning Coffee

8:00-11:30 Recommended Morning Pre-Conference Workshops*

W7. Blockchain 101

12:30-4:00 pm Recommended Afternoon Pre-Conference Workshops*

* Separate registration required

2:00-6:30 Main Conference Registration Open


5:00-7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 17

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall with Poster Viewing

Cambridge Complex

10:50 Chairperson's Remarks

Hongliang Tang, Senior Director and Chief Architect, Huawei American Storage Research Lab, Futurewei Technologies, Inc.

11:00 DB4Sci, Open Source Database as a Service (DBaaS) for On-Prem and Cloud

John Dey, HPC Systems Engineer, Scientific Computing, Fred Hutchinson Cancer Research Center

Cloud-based databases as a service (DBaaS) have greatly simplified database management. We can create database instances using best-practice configurations, including backup and DR plans, with a single push of a button. However, databases are sensitive to latency, and cloud-based databases cannot be used effectively from on-prem. Supporting Postgres, MongoDB, MariaDB/MySQL, and Neo4J graph databases, DB4Sci is the ideal DBaaS solution for on-premise and multi-cloud deployments, and it supports high-performance backup to cloud storage. The audience will learn how to deploy a very robust and fast database service with a simple architecture. At its core, DB4Sci is a rather simple Python-Flask app that uses Docker commands to manage database instances in containers. The simple architecture is intentionally not designed around enterprise features such as high availability (HA) and business continuity. Instead, we focus on our ability to recover from disasters (DR). Data is backed up to cloud storage at regular intervals and can be restored by an administrator or by the end user, for example on a server in a different cloud. We can demonstrate that users can be back in business within a few minutes after a major failure.
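
The idea of a thin Python layer that issues Docker commands to manage database containers can be illustrated with a minimal sketch. This is not DB4Sci's actual code: the function name `docker_run_command`, the `ENGINES` table, and the example container name and paths are assumptions made for illustration, and the Flask web layer, credentials, and cloud backup are left out. The image names, ports, and data directories are the commonly used defaults of the public images and should be checked against the image version in use.

```python
import shlex

# Illustrative engine table (an assumption, not DB4Sci's configuration):
# image name, port inside the container, data directory inside the container.
ENGINES = {
    "postgres": ("postgres", 5432, "/var/lib/postgresql/data"),
    "mongodb": ("mongo", 27017, "/data/db"),
    "mariadb": ("mariadb", 3306, "/var/lib/mysql"),
}


def docker_run_command(engine, name, host_port, data_dir):
    """Build the `docker run` argument list for one database container."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    image, container_port, container_data = ENGINES[engine]
    return [
        "docker", "run", "-d",                  # run detached
        "--name", name,                         # one container per instance
        "-p", f"{host_port}:{container_port}",  # expose the database port
        "-v", f"{data_dir}:{container_data}",   # keep data on the host
        image,
    ]


if __name__ == "__main__":
    # Hypothetical instance name, port, and host path.
    cmd = docker_run_command("postgres", "lab_db", 5433, "/srv/db/lab_db")
    print(" ".join(shlex.quote(part) for part in cmd))
```

A web handler could pass the returned list to `subprocess.run`. Engine-specific settings such as passwords would still need to be supplied, for example as `-e` options.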

11:30 Data Centralization for Any Lab, Any Equipment, Any Software

Charles Fracchia, Founder and CEO, BioBright
Jarrod Medeiros, Director of Informatics and IT, Casma Therapeutics

It’s all too easy to end up with cloud infrastructures that mirror the shortcomings of local data management. In this talk, we will present how carefully designed software can make data available seamlessly, removing the need for scientists to dig through disparate systems to find what they need and analyze it. We will present a new model that allows data centralization and cloud-based data analysis while minimizing the burden on the scientist. We will share concrete use cases for how to effectively migrate to a data-centric workflow that takes advantage of cloud storage and analytics. Attendees will leave with five steps to help plan and evaluate their approach to cloud data management: 1) examining the current flow of data, 2) finding out the scientific needs for data, 3) calculating storage needs for seamless scale-up, 4) connecting the dots between storage and analysis, and 5) designing for future integrations/growth.

12:00 pm Architecting for Success with Machine Learning Data Platforms for Image Analysis and Precision Medicine

William Beaudin, Director of Solutions Engineering, DDN Storage

Aspects of precision medicine, including automated image analysis or mining patient data to better target therapies, leverage AI and deep learning. While early training data fits in-node, successful approaches attract more data. Forward-thinking organizations adopt scalable architectures; the unprepared fall behind. We review key considerations for machine-learning platforms that ensure effortless scaling, deeper insights, and a shorter path to value.

12:15 Internet2: Leveraging Distributed Resources to Speed Discovery

Dan Taylor, Director, Business Development, Internet2


Few life sciences organizations take advantage of the vast resources available to R&D organizations for continuous innovation and keeping pace with big data. This session will discuss the infrastructure underlying collaborations that use private, academic, and public resources, including commercial cloud and supercomputing center storage and processing, to maximize options and speed discovery.

12:30 Session Break

12:40 Luncheon Co-Presentation I: Accelerating Life Sciences Workflows Using Software Defined Storage

David Hiatt, Director, Product Marketing and Business Development, WekaIO

Aaron Gardner, Director, Technology, BioTeam

In this presentation we will compare the results of Cryo-EM and genomic pipelines run on a traditional storage architecture to those run on a modern scale-out storage system. See how the modern scale-out system can meet the mixed workload challenges of life sciences and outperform the storage system for the largest supercomputer in the world.

1:10 Luncheon Presentation II: Accelerate Precision Medicine with High Performance Data and AI

Frank Lee, PhD, Global Industry Leader for Healthcare and Life Sciences - IBM Systems

Get your data and apps ready for precision medicine and research in the multicloud era, to derive faster insights with high performance data and AI architecture. Join Frank Lee, PhD, Global Industry Leader for Healthcare and Life Sciences, as he presents real-life use cases and best practices for high performance genomics and imaging with deep learning that will help you deliver new records for speed and scale, cost efficiencies, collaboration and ease of use.

1:40 Session Break

Cambridge Complex

1:50 Chairperson’s Remarks

Brigitte Raumann, Product Manager, Globus, University of Chicago

1:55 Achieving Compliant Collaboration: Securely Managing Protected Data to Accelerate Discovery

Brigitte Raumann, Product Manager, Globus, University of Chicago

Researchers working with protected data -- such as HIPAA-regulated data and controlled unclassified information -- face many challenges in managing this data and sharing it with colleagues. Meeting compliance requirements is complicated, and investigators must often either slow their process to address this burden or resort to using distilled, de-identified data instead. With higher assurance levels provided by Globus, the leading research data management service, users can optimize their protected data environments by integrating secure, scalable data management capabilities into existing workflows and applications. Attendees will learn about features and enhancements to the Globus service that make it possible to manage protected data in a compliant manner, and will gain an understanding of how their organization can benefit from these features.

2:25 Research, Privacy and Risk

Kris Torgerson, Chief Information and Privacy Officer, Oak Ridge National Laboratory

2:55 Solving Genomic Data Privacy in the Age of AI

Esteban Rubens, Global Principal, Enterprise Imaging Healthcare, Pure Storage

Health data protection is of paramount importance, with all stakeholders in the healthcare industry looking to adopt AI to improve patient care. We will provide examples of an API-driven Data Hub solution that enables life-science & healthcare organizations to leverage the advancements of AI to help improve diagnoses, find better treatments, and discover new drugs while protecting confidential patient information.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite)

Cityview 2

4:00 Building a Low-Cost Sample Tracking System with G Suite & Jira Cloud

Bruce Kozuma, MSCS, PMP, CPIM, CSM, CPO, Principal Systems Analyst, Broad Information Technology Services, The Broad Institute of MIT and Harvard

Current off-the-shelf technology allows for development of a low-cost, serverless sample tracking solution using commonly used components (G Suite and Jira Cloud). Combining these components with Agile principles (e.g., minimum viable product, short cycles, and iterative delivery) has resulted in a solution that is helping reduce the cost of research at the Broad Institute of MIT and Harvard.

4:30 CASE STUDY: Successful Cloud Migration Strategies & Techniques

Lance Smith, Associate Director, IT, Celgene

Steve Sivak, HPC Engineer, IT, Celgene

The public cloud is going mainstream and most pharma/biotech have started moving select workloads. New workloads are easy to create in the cloud; the challenge has been what to do with the legacy software in our industry. We discuss various strategies to migrate HPC, database and other biotech applications, and some technologies to assist your organization during this phase. We’ll also cover our lessons learned and overall recommendations.

5:00 Rapidly Aggregate and Share Large Data Sets Using IBM Aspera

Joseph Hansen, Aspera Technical Sales & Delivery Expert, IBM 

The life sciences industry is generating massive data stores as a result of modern techniques. This rich data is stored in a variety of ways, without standardization. Capturing the aggregate value of the data is critical to new discoveries, yet such analysis requires data consolidation. Learn how IBM Aspera accelerates big data analysis in life sciences, using any deployment environment.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 18

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

Cityview 2

10:30 Chairperson’s Remarks

John Mattison, MD, CMIO, Kaiser Permanente

10:40 Fog Computing Concept Model

Larry Feldman, PhD, Senior Security Engineer, G2, Inc.

Cisco estimated that by 2020 there will be over 50 billion interconnected, heterogeneous, smart devices that will require an adequate infrastructure to support them. Fog computing is the next new technology that offers a distributed and federated compute model, which provides low-latency computational resources, elastic capabilities, data analytics, and management.

11:10 FEATURED PRESENTATION: Cloud Migration: Experiences from a Large Complex Integrated Delivery Network: Options, Trade-Offs, and Early Experiences from a Value-Based Delivery System

John Mattison, MD, CMIO, Kaiser Permanente

11:40 R&D Data, Cloud, and the Pursuit of Happiness

Brady Haggstrom, Product Manager, IDBS

12:10 pm Session Break

12:20 Luncheon Presentation I: Accelerating Product Pipeline with Nutanix

Dana Racine, Senior Systems Engineer, Sales Technology, Nutanix

Through machine learning, simplicity, and rapid scaling, Nutanix provides a solution to accelerate time to market for products in the pharma, medical device, and biotechnical industries. As a “cloud in a box” solution, Nutanix is more than just software or hardware: a complete platform for increasing operational efficiencies and reducing complexity is available to any organization in need of a competitive advantage. Session content will include a Nutanix overview, technical specifications, and a case study.

12:50 Luncheon Presentation II: Talk Title to be Announced

Chris Bellmare, Vice President of Northeast and Canadian Operations, Arista Networks

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing


1:55 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

2:00 PANEL DISCUSSION: High Performance Consultancies


Chris Dwan, Senior Technologist and Independent Life Sciences Consultant


Tanya Cashorali, CEO, Founder, TCB Analytics

Aaron Gardner, Director of Technology, BioTeam, Inc.

Eleanor Howe, PhD, Founder and CEO, Diamond Age Data Science

An organization must learn and understand the value of why, when and how to use a consultancy. Highly trained and skilled professional experts gather to discuss their role in leading and managing projects for organizations to help them achieve goals. They will discuss a variety of themes including the best kinds of projects to hire a consultancy for, the timeline of when an organization should hire a consultant vs. full time staff, and big challenges on the horizon. The session will feature short podium presentations, followed by a moderated Q&A panel with attendees. The topic of hiring a consulting company came up in the data science plenary keynote at Bio-IT 2018. We want to spend time at Bio-IT 2019 exploring this topic in finer detail.

3:20 KEYNOTE PRESENTATION: Trends from the Trenches 2019

Chris Dagdigian, Co-Founder and Senior Director, Infrastructure, BioTeam, Inc.

“Trends from the Trenches,” in its original “state of the state address” format, returns to Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science.

We are looking for your feedback on the key questions, problems, and issues you are facing. We will incorporate the feedback into Chris’ talk. Complete this short survey anonymously:

4:00 Conference Adjourns
