Track 2 - April 5–7, 2016
Advances in Computing Application for Big Data
Tackling the big data issues that researchers and scientists in genomics and the life sciences are focused on requires more networking and computing power. Track 2 explores new techniques and methods for data storage, transfer and workflows. Themes covered include but aren’t limited to application portability, reproducibility, local vs. cloud computing, extreme computing, moving computing vs. moving data, meta computing, and high-performance computing. There will be a joint session between this track and one of the other tracks to discuss common issues including converged infrastructure and networking.
Tuesday, April 5
7:00 am Workshop Registration and
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Creating a Best of Breed Informatics Environment for Your
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
Data Science Driving Better Informed Decisions
* Separate registration required
2:00 – 6:00 Main Conference Registration
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster
Wednesday, April 6
7:00 am Registration Open and
9:00 Benjamin Franklin Awards and Laureate Presentation
9:30 Best Practices Awards Program
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Opening Remarks
Claire Giordano, Senior Director, Emerging Storage Markets, Quantum
11:00 Advancing Translational R&D - Clinical Image
David Witt, Imaging Biomarkers Informatics Lead, Bristol-Myers
Medical imaging plays an ever-increasing role in drug development. As innovative imaging technology continues to evolve, online access to medical images facilitates the rapid delivery of quantitative information to foster decision making at all stages of translational drug development. Incorporating a clinical trial Medical Image Management System (MIMS) into the drug development platform requires the re-examination of existing workflows to maximize the qualitative and quantitative benefits realized with MIMS. Consideration must be given to all aspects of the imaging strategy in order to create a definitive paradigm shift. This talk will present improved workflows and the underlying technology challenges and opportunities in advancing translational R&D using high-quality clinical image management. A summary of lessons learned in multiple areas such as image transfer, metadata and collaboration management using clinical and pre-clinical information and systems will be presented.
Delivery of NGS Analysis Pipelines in Software Containers
Satu Nahkuri, Ph.D., Data Scientist, Pharma Research and Early
Development Informatics, Roche Innovation Center Basel
Software containers such as Docker and CoreOS Rocket have in the past few years gained popularity among cloud providers for setting up PaaS / IaaS (Platform as a Service, Infrastructure as a Service). We have adopted containers for a different purpose, i.e., delivering our preferred next-generation sequencing (NGS) analysis tools to our collaborators' computing environments. Our solution allows us to secure consistent computing workflows and reproducible results with minimal reconfiguration burden. We anticipate that in the future, container and unikernel technologies will facilitate a novel computing paradigm, where NGS analysis and visualization pipelines are mobile, while NGS data remains stationary.
12:00 pm Pushing the Limits of Discovery with Internet2 - Cloud to Supercomputing in Life Sciences
Dan Taylor, Director, Business Development, Network Services, Internet2
Advances in life sciences rely on both world-class collaboration and an ecosystem of secure cloud services and supercomputing seamlessly connected by a high-performance network. Learn how organizations are leveraging commercial clouds such as AWS, private big data scientific research clouds, supercomputing resources such as NCSA and the San Diego Supercomputer Center, and dynamic combinations of these resources to advance life science research with Internet2.
12:15 Managing NGS Data: Smaller is Better!
Rafael Feitelberg, CEO, Geneformics
The tremendous growth of NGS data is a blessing and a curse, leading to increasing pain in management and requiring escalating investments in infrastructure. We will review how organizations are reducing data volumes by up to 10X - on-premises and in the cloud - without any change to their workflow.
12:30 Session Break
12:40 Luncheon Presentation I: Genome Analysis Pipelines, Big Data Style
Allen Day, Principal Data Scientist, MapR Technologies
Bioinformatics workflow requirements are well matched to the capabilities of big data tools. However, Spark, for example, is not commonly used because many bioinformatics tool authors assume a legacy computing environment will be used. Barriers are quickly coming down. We'll examine a few conventional bioinformatics analyses and show how they can be modernized to save time and money and to make new types of analysis possible.
1:10 Luncheon Presentation II: Cover Your Bases: 7 Ways Genomics Workflows Can Benefit From Multi-Tier Storage
Claire Giordano, Senior Director, Emerging Storage Markets, Quantum
Dramatic declines in the cost and run times for genome sequencing are enabling bioinformaticians to do more, faster. But these advances come with a challenge—how to manage all of this valuable data? Quantum’s Claire Giordano explores how multi-tier storage (including object storage) can help genomics researchers accelerate time to discovery, improve access for distributed teams, and cost-effectively keep sequenced genome data for decades.
1:40 Session Break
1:50 Chairperson’s Remarks
Chris Dwan, Acting Director, IT, Broad Institute
1:55 To the Cloud(s): Broad Institute’s Journey Outside of Our
Chris Dwan, Acting Director, IT, Broad Institute
2:25 Handling Cloud Project
Gurpreet Kanwar, Senior Project Manager, Information Management, NAV Canada
2:55 How Bluebee & Others Solve the File Exchange Problem for Bioinformatics
Michelle Munson, President, CEO & Co-Founder, Aspera, an
Hans Cobben, CEO, Bluebee
As new research techniques create terabytes of NGS data, the need to quickly, easily, and securely ingest and exchange large genome data files with the cloud’s scale-up capacity becomes critical. Learn how Bluebee and other bioinformatics companies overcome this challenge by integrating or using Aspera FASP technologies and solutions to securely move large files at high speed to and from multiple cloud and on-premises storage systems, regardless of where the data is located.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 The Matchmaker
Exchange: A Platform for Rare Disease Gene Discovery
Anthony Philippakis, M.D., Ph.D., Chief Data Officer, Broad
4:30 Using Cloud Platforms for Consumer-Driven Integration of
Research and Operations
Jonas Almeida, Ph.D., Professor & CTO, Department of
Biomedical Informatics, Stony Brook University (SUNY)
5:00 Managing the Mayhem: Overcoming the Challenges of Long-Term Data Retention
David Hiatt, Marketing, Vertical Marketing, Health and Life Sciences, HGST
Data volumes continue to grow and retention periods lengthen—whether by need or by mandate—so researchers and IT leaders face increasingly difficult decisions about how to meet long-term retention requirements yet keep the storage budget in check. Learn practical methods for managing the mayhem and making sure that more research dollars go to research rather than to infrastructure.
5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall
with Poster Viewing
Thursday, April 7
7:00 am Registration and Morning Coffee
10:00 Coffee Break in the Exhibit Hall and Poster Competition
10:30 Chairperson’s Opening Remarks
Sanjay Joshi, CTO, EMC
10:40 FEATURED PRESENTATION: HPC
Trends in the Trenches 2016
Chris Dagdigian, Founding Partner & Director, Technology,
In one of the most popular presentations of the Expo, Chris
delivers a candid assessment of the best, the worthwhile, and the most
overhyped information technologies (IT) for life sciences. He’ll cover what has
changed (or not) in the past year around infrastructure, storage, computing,
and networks. This presentation will help you understand the IT needed to build and
support data-intensive science.
11:40 Realize a Fiftyfold Increase in Sequencing by Combining Performance Scale-Out Storage with the Latest Next-Gen Sequencers
David Sallak, Vice President, Products & Solutions, Panasas
In this talk, you will learn how to easily harness and manage data by deploying scale-out storage that accelerates workflows and brings plug-and-play simplicity to data management. After combining the Illumina HiSeq X Ten sequencer with Panasas ActiveStor high-performance storage, Panasas customer Garvan Institute of Medical Research was able to increase its sequencing capacity to 50 genomes per day on average, a fiftyfold improvement, without adding staff. With Panasas, Garvan was able to streamline its workflow by keeping the sequencing data in the central repository throughout the analysis, resulting in faster delivery of results to researchers around the world.
11:55 How Next Generation Scale-Out Storage Is
Enabling the Next Frontier of Life Sciences Breakthroughs
Joel Groen, Product Manager, Qumulo
With major technology advances in genomic IT, data is being
created at a faster rate than ever before – creating massive storage and data
management challenges for Life Sciences and bioinformatics organizations that
are tasked with managing hundreds of millions to trillions of files. Enter
next-generation scale-out storage – which provides real-time answers about data
footprints at incredible scale, abstracts away the underlying infrastructure,
and achieves breakthrough performance using intelligent software and commodity
hardware – all while balancing performance, capacity and cost.
12:10 pm Session Break
12:20 Luncheon Presentation I: Object Storage:
Enabling Genomic Sequencing at Petabyte Scale
Joe Arnold, President and
Chief Product Officer, Leadership, SwiftStack
The audience will learn the following from our presentation:
1) How incorporating multi-petabyte storage-as-a-service into research
environments can be cost-efficient, scalable and manageable; 2) How to
implement an open source object storage system that keeps up with data volume
while improving data management and organization by using arbitrary tags and
metadata; and 3) How chargebacks can determine storage user behavior.
12:50 Luncheon Presentation II:
Petabytes of Data to Find What You Actually Want
Kiran Bhageshpur, CEO, Igneous Systems
Just a decade ago, large
data sets were still measured in terabytes, and petabyte data sets were rare. In
today’s world, even a modest laboratory can generate petabytes of data that
needs to be ingested, processed, curated and stored for decades. Yet our ways
of interacting with these large data sets remain mired in tools and techniques
built for megabyte data sets. Surely there has to be a better way?
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster
1:55 Chairperson’s Remarks
Eric A. Stahlberg, Ph.D., [Contractor], High-Performance
Computing Strategy, Data Science and Information Technology Program, Leidos
Biomedical Research, Inc., Frederick National Laboratory for Cancer Research
Discussion: Actionable Big Data Analytics
Moderator: Eric A. Stahlberg, Ph.D., Leidos Biomedical
Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)
Kiran Bhageshpur, CEO, Igneous Systems, Inc.
David King, CEO, Exaptive
Timothy Danford, Ph.D., Field Engineer, Tamr, Inc.
Leading life sciences experts will discuss trends and best-practice
case studies of turning big data into smart data that can provide real-time
assistance in organizational decision making, disease prevention, prognosis,
diagnostics, and therapeutics. Learn how and where these organizations have
assembled and analyzed information from different data ‘silos’ and deployed
solutions to make decisions. We’ll discuss technology tools used to move data
and information from retrospective reporting to real-time predictive analytics.
Walk away with practical steps, solutions, and capabilities that you can
implement within your own organization.
3:30 Future Convergence
Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc.,
Frederick National Laboratory for Cancer Research (FNLCR)
4:00 Conference Adjourns