Track 2: Data Computing

2017 Archived Content

Life science researchers and genomics scientists tackling big data problems are demanding more computing power. Track 2 explores techniques and new methods of data transfer and workflows. Themes covered include but aren’t limited to workforce and equipment mobility, HPC across the enterprise vs. HPC as a service, reconfigurable hardware for HPC and Hadoop.

Tuesday, May 23

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

(W1) Data Management for Biologics: Registration and Beyond

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

(W11) Scientific Project Management

* Separate registration required.

2:00 – 6:00 Main Conference Registration Open

4:00 PLENARY KEYNOTE SESSION

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, May 24

7:00 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION

9:50 Coffee Break in the Exhibit Hall with Poster Viewing

NOVEL COMPUTING TECHNIQUES FOR BIG DATA PROCESSING

10:50 Chairperson’s Remarks

Thomas Manganaro, District Manager, Pure Storage

11:00 Making Sense of Big Data

Chen Shen, Researcher, Computer Science, The George Washington University

Big data is a collection of large datasets that cannot be processed using traditional computing techniques. We suggest a different Big Data System that allows generating novel scientific hypotheses. The system is based on a novel method of clustering using Golay code transformations that are applied to diverse Boolean queries.

11:30 Using Ticket Tracking Software and Agile Techniques to Accelerate Science and Operations at The Broad

Bruce Kozuma, Principal System Analyst, Broad Information Technology Services, The Broad Institute of MIT and Harvard

Sadiya Akasha, Project Manager, Broad Information Technology Services, The Broad Institute of MIT and Harvard

The Broad has been using JIRA (often used for IT ticket tracking) and Agile techniques to speed science and streamline operations, e.g., handle the scale of genomic sample processing, visualize sample processing workflows, and track operational tasks such as data center moves. During this talk, we'll discuss the specific techniques and features used in those examples, such as JIRA Board of Boards.

12:00 pm Distributed Analytics at World Wide Scale and the Outbreak of Infectious Diseases

Patricia Florissi, Ph.D., Global CTO, Dell EMC

In collaboration with Ben Gurion University, Dell EMC has prototyped a next-generation outbreak surveillance system based on metagenomic sequencing. Dell EMC uses bi-clustering to uncover common patterns of virulence factors among subsets of microbiomes, unleashing the potential not only to identify the early onset of outbreaks but also to uncover new combinations of virulence factors that may characterize new diseases.

12:15 How a Hosted Supercomputer Can Put a Little Spark in Your Variant Analysis

Ted Slater, Global Head, Healthcare & Life Sciences, Cray, Inc.

NGS analysis workflows are big, and growing bigger every day as sequencing machines proliferate and analytics pipelines mature. Conventional compute architectures can be “good enough,” but what are the hidden costs of waiting for results? Using a hosted supercomputer together with tools like Apache Spark™ will help you leverage data more effectively, recover those costs, and gain insight faster.

12:30 Session Break

12:40 Luncheon Presentation: Security & Regulatory Compliance in Google's Cloud

Ben Lavallee, Customer Engineering Specialist, Team Manager, Google Cloud

Come learn how Google Cloud's innovations in Security, Networking and Machine Learning are helping Bio-IT Organizations move to the cloud and access new capabilities.

1:40 Session Break

DATA COMPUTING, AUTOMATION, INTEGRATION & WORKFLOWS: INNOVATIVE MODELS, CAPABILITIES, AND APPLICATIONS

1:50 Chairperson’s Remarks

Brian Bissett, Senior Member, IEEE, Institute of Electrical and Electronics Engineers

1:55 Automation Proves Hard Work Pays Off Later More Than Laziness Pays Off Now

Brian Bissett, Senior Member, IEEE, Institute of Electrical and Electronics Engineers

Life is a zero-sum game. Two places cannot utilize the same resources at the same time. The most valuable resource of any organization is its intellectual capital. When the best and brightest are occupied doing mundane tasks, an organization is unlikely to reach its full potential. The focal point of this presentation is how to leverage the right tools, techniques, and methods to automate the mundane tasks in an organization, so that its thinkers can be engaged in something that is worth thinking about.

2:25 Data Platform for a Community Ecosystem of Contextual Biological Information

Austin Huang, Ph.D., Associate Director, Biomedical Data Science Lead, Enterprise Science & Technology Operations, Pfizer

A key impediment to reproducible computational workflows in research is the treatment of dependencies on persistent data. We have implemented a data platform that can achieve the benefits of a more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production to reduce the engineering and administrative capacity typically required for data repositories.

2:55 Scalable Data Computing for Healthcare & Life Sciences Industry

Prashant Avashia, Senior Architect, Storage & Software Defined Solutions, IBM Systems, IBM

Fully leveraging all data, structured and unstructured, can enable more patient-centric care to help organizations achieve better outcomes. IBM software defined solutions provide a proven roadmap to transform fragmented data silos into a common, universally accessible platform for genomics, imaging and analytics. Clinicians can then apply IBM Watson technology to obtain cognitive insights from this platform for evidence-based decisions.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

DATA COMPUTING, AUTOMATION, INTEGRATION & WORKFLOWS: INNOVATIVE MODELS, CAPABILITIES, AND APPLICATIONS (CONT.)

4:00 Integrating Biomedical Devices into Information Systems

Jim McGinnis, Assistant Professor, Engineering Technology, University of Memphis

This presentation shares case studies that help formulate a framework for integrating biomedical devices into information systems, with an emphasis on information quality. The framework outlines the importance of data quality in projects to define a clear pathway for integrating disparate systems for better patient support.

4:30 AIDEAS 3.0: The New Generation Cheminformatics Platform

Rishi Gupta, Senior Research Scientist, Platform Informatics and Knowledge Management, AbbVie, Inc.

Vincent Le Guilloux, Platform Informatics and Knowledge Management, AbbVie, Inc.

AIDEAS is an integrated Cheminformatics solution that has brought together several scientific applications and methods under a single umbrella. This presentation will discuss the technology on which AIDEAS 3.0 is built and the scientific application built within AIDEAS 3.0. Examples will be presented to showcase the ability of AIDEAS to provide advanced scientific workflows within one of the best visual analytics frameworks that allows users to share information and define multiple analysis templates in a facile way.

5:00 Engineering for Insight, Building the Optimal Mix of RWE Capabilities

Michael Madden, Client Solutions Director, Dell EMC

At Dell EMC, we help healthcare and life science organizations find insights by connecting streams of data, making the data more viable across the entire development chain, from discovery to commercialization. We will highlight hybrid cloud capabilities that can lead to new insight, foresight and predictive findings on diseases, products, and patient populations.

5:30 – 6:30 15th Anniversary Celebration in the Exhibit Hall with Poster Viewing and Best of Show Awards

Thursday, May 25

7:00 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM

8:05 Benjamin Franklin Awards and Laureate Presentation

8:35 Best Practices Awards Program

8:50 Plenary Keynote

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

DATA COMMONS: DIGITAL ECOSYSTEMS FOR USING AND SHARING BIOMEDICAL DATA AT SCALE

10:30 Chairperson's Remarks

10:40 PANEL DISCUSSION: Data Commons

Matthew Trunnell, Vice President and CIO, Fred Hutchinson Cancer Center

Vivien Bonazzi, Ph.D., Senior Advisor for Data Science Technologies, National Institutes of Health (NIH)

Michael Fitzsimons, Ph.D., User Services Manager, NCI Genomic Data Commons, University of Chicago, Center for Data Intensive Science

Brian D. O’Connor, Ph.D., Technical Director, Genomics Institute - Computational Genomics Platform, University of California, Santa Cruz

The Data Commons is an open science platform that allows producers and consumers of scientific data to connect, interact, exchange, create value and generate new discoveries, creating the basis for a digital ecosystem that can support scientific discovery in the era of biomedical big data. The computing platform is flexible and scalable, allowing scientists and researchers to transparently find and use services and tools they need, access large public data sets, and connect with other resources associated with scholarly research. During this 60-minute panel discussion, short focused podium talks will be presented on how this system has been adapting to the different evolving needs of research communities and technology innovations. A Q&A moderated discussion follows.

11:40 Storage Trends for Healthcare IT

Vik Nagjee, Vice President & CTO, Healthcare and Life Sciences, Pure Storage

Healthcare organizations are requiring more of IT than ever before. IT is now a strategic element in the organization, tasked with facilitating innovation under new business models and driving improvements to the patient experience. In this session, Vik Nagjee will discuss current trends in healthcare IT, how modern IT systems impact outcomes, and storage trends for healthcare and biological research.

12:10 Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

FEATURED SESSION: BIOTEAM MICRO-SYMPOSIUM: 2017 BIO-IT TRENDS

1:55 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Consultant

2:00 BioTeam Micro-Symposium: 2017 Bio-IT Trends

Chris Dwan, Senior Technologist and Independent Consultant (Moderator)

Ari E. Berman, Ph.D., Vice President and General Manager of Consulting Services, BioTeam, Inc.

Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.

Aaron Gardner, Senior Scientific Consultant, BioTeam, Inc.

Adam Kraut, Director of Infrastructure and Cloud Architecture, BioTeam, Inc.

Asya Shklyar, Senior Scientific Consultant, Infrastructure, BioTeam, Inc.

Since 2010, the “Trends in the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk was to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation tried to recap the prior year by discussing what has changed (or not) around infrastructure, storage, computing, and networks. This presentation has helped scientists, leadership, and IT professionals understand the basic topics involved in supporting data-intensive science. For 2017, the “Trends in the Trenches” presentation will evolve and expand from 60 minutes to 120 minutes and feature more content, speakers, and interactive discussion. Short, focused podium talks on current trends related to computing, storage/data transfer, networks, and cloud will be presented. A moderated Q&A discussion follows. Come prepared with your questions and commentary for this informative and lively session.

4:00 Conference Adjourns
