Bio IT World Expo 2016  

Track 5 - April 21 – 23, 2015

Next-Gen Sequencing Informatics 

Advances in Large-Scale Data Analysis and Interpretation

Tremendous advances have broadened NGS applications from research to the clinic. Even so, enormous challenges remain for NGS, including data storage, processing, scaling, quality control management, and interpretation. Track 5 presents case studies on these challenges. Themes to be covered include database systems to manage and analyze NGS data, analytic tools and workflow solutions, cloud computing and collaborative technologies, and NGS variants, gene mapping, and expression.

Final Agenda

Tuesday, April 21

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

Genome Assembly and Annotation

Intelligent Methods Optimization of Algorithms for NGS

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

Customizing Your Digital Research Environment with Genome Browsers

Large Scale NGS Analysis Using Globus Genomics

* Separate registration required

2:00 – 6:30 Main Conference Registration


5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing


Wednesday, April 22

7:00 am Registration Open and Morning Coffee


9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

9:45 Coffee Break in the Exhibit Hall with Poster Viewing


10:50 Chairperson’s Opening Remarks

Narges Bani Asadi, Founder and CEO, Bina Technologies, Inc., A Member of the Roche Group

11:00 Global Next Generation Sequencing Informatics Markets: Inflated Expectations in an Emerging Market

Greg Caressi, Senior Vice President, Healthcare and Life Sciences, Frost & Sullivan

This presentation evaluates the global next-generation sequencing (NGS) informatics markets from 2012 to 2018. Learn key market drivers and restraints, a detailed analysis of the changing competitive landscape, revenue forecasts, and important trends and predictions that affect market growth. Key highlights for many of the leading NGS informatics services providers, commercial primary and secondary data analysis tools vendors, commercial biological interpretation and clinical reporting tools vendors, and NGS LIMS vendors will be presented.


11:30 Large-Scale NGS Analysis Using Globus Genomics: Challenges and User Success Stories

Ravi Madduri, Fellow, Computation Institute, University of Chicago and Argonne National Lab

Dinanath Sulakhe, Solutions Architect, Computation Institute, University of Chicago and Argonne National Lab

In this talk, we will present some of the challenges in scaling up NGS analysis on public cloud infrastructure and present user success stories where we have overcome them.

12:00 pm Turn-Key Variant Analysis for the Biologist: Using the Maverix Analytic Platform

Dan Kearns, Director, Software Development, Maverix Biomics, Inc.

Studies leveraging WGS, Exome, and Targeted sequencing data are commonly limited by the tools, infrastructure, and trained bioinformaticians necessary to process, interpret and manage the data. The Maverix Analytic Platform addresses these challenges through a unique environment designed for biologists. This cloud-based platform leverages best-in-class tools and methods, and provides an integrated environment to enable visualization and interpretation of results.

12:15 Developing and Provisioning Robust Automated Analytical Pipelines for Whole Genome-Based Public Health Microbiological Typing

Anthony Underwood, Ph.D., Lead, Bioinformatics, Infectious Disease Informatics, Microbiology Services Division, Public Health England

Whole genome sequencing has great potential for microbial characterization in public health. Open source bioinformatics tools can generate the necessary information; however, adapting these tools for routine public health use is challenging. They must be automated, auditable, timely, and robust, and they must record errors and log outputs. Dr. Underwood will discuss the infrastructure, software architecture, and algorithms used for this at Public Health England.

12:30 Session Break

12:40 Luncheon Presentation I: Sample Aggregation and Analytics in the Post-$1,000 Genome Era

John Shon, Vice President, Bioinformatics & Data Sciences, Illumina, Inc.

With the launch of the Illumina HiSeq X Ten system, the long-promised $1,000 genome became a reality. But as is often the case in science and engineering, the realization of one goal reveals new challenges to surmount. The economics of sequencing now make the sequencing of entire populations feasible, but aggregating, tracking, and analyzing whole human genome data cannot be done serially when it is produced in parallel. This presentation will discuss parallel sample processing approaches that enable multi-sample genome interpretation and analysis of large cohorts by employing cloud-scale computing.

1:10 Luncheon Presentation II (Sponsorship Opportunity Available)

1:40 Session Break

1:50 Chairperson’s Remarks

Carlos P. Sosa, Ph.D., HPC Chemistry and Life Sciences Technical Lead, Biomedical Informatics and Computational Biology, Cray Inc, University of Minnesota Rochester

1:55 The Cloud Reigns: Enabling Scalable Analysis and Storage for High-Throughput Next-Gen Sequencing

John Penn, Associate Manager, NGS Data Analysis, Regeneron Genome Center

2:25 Data Intensive Academic Grid (DIAG): A Free Computational Cloud Infrastructure Designed for Bioinformatics Analysis

Anup Mahurkar, Executive Director, Software Engineering and IT, Institute for Genome Sciences, University of Maryland School of Medicine

2:55 Co-Presentation: The Challenges of Scaling Platforms for Translational Science: New Approaches and Case Studies

Houtan Aghili, Ph.D., Senior Technical Staff Member, Industry Solutions - Healthcare and Life Sciences; IBM Software Group

Janis Landry-Lane, Genomics Solutions, Software Defined Infrastructure, IBM World-Wide

As researchers build platforms for translational science, High Performance Data Centric Computing will be a key investment that must be considered in order to provide an integrated and scalable solution which fulfills the needs of multiple departments. In this session, we will cover: processing the NGS pipeline in order to bring omics data into a scalable information management platform, the role of natural language processing for integrating unstructured information, the integration of on-premise and cloud solutions, and effective data and content management at scale. IBM will present both a vision and potential solutions that have enabled our customers to build an effective architecture.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing


4:00 Deep Sequencing Based Analysis of Ig repertoire in Humanized Mice

Stefan Klostermann, Ph.D., Expert Scientist, Bioinformatics / Data Science, Roche Innovation Center Penzberg

In our quest for human biotherapeutic antibodies, we developed a novel methodology: instead of replacing the mouse genomic immune loci with the human orthologs, we reconstituted the humoral immune response in immunodeficient mice transplanted with human hematopoietic stem cells. An in-depth characterization of the reconstituted immune system, based on analysis of deep-sequenced Ig repertoires, validated the humanized mouse as immunologically equivalent to human donors.

4:30 BLASTing with Chromatin Architecture: A Novel Method of Genomic Functional Element Identification and Annotation

Michael Buck, Ph.D., Associate Professor, State University of New York at Buffalo; Dept. of Biochemistry; Dept. of Biomedical Informatics; Co-Director, UB Genomics and Bioinformatics Facility; Director, WNYSTEM Stem Cell Sequencing/Epigenomics Facility; NY State Center of Excellence in Bioinformatics and Life Sciences

Identification of genomic functional elements, i.e. promoters, insulators and enhancers, is essential to understanding the complex regulatory processes involved in cellular differentiation, response to the environment, and disease development and progression. However, finding these locations within the genome can be a laborious and expensive undertaking requiring site specific assays. Even more difficult is identifying entirely new classes of genomic features. In order to facilitate identification and characterization of new classes of genomic features, we developed and implemented a chromatin Architecture Basic Local Alignment Search Tool (ArchBLAST). The ArchBLAST algorithm utilizes conserved chromatin architecture or DNA-binding protein signatures at known sites of interest and globally searches the genome for similar sites. ArchBLAST differs from other approaches in that it uses the amplitude and spatial arrangement of all types of sequencing data to score similarity. ArchBLAST is extremely flexible and can search with all chromatin-based assays such as ChIP, FAIRE, and DNase-Seq as well as non-chromatin assays such as RNA and CAGE-Seq. Importantly, ArchBLAST allows for identification of subtypes of known genomic features and can accurately predict previously uncharacterized locations. ArchBLAST uses an innovative weighted profile generated from only the most informative genome-wide datasets and then scores the entire genome. We have validated the accuracy of our approach with multiple genomic features in both yeast and humans. We show ArchBLAST is capable of predicting both gene expression and genomic feature directionality as well as identifying cell-type specific enhancers using chromatin architecture and/or DNA-binding protein signatures.
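The general idea of scanning a genome with a weighted signal profile can be sketched in a few lines. This is an illustration only, not the ArchBLAST implementation, whose details are not given in the abstract. The function names (`build_profile`, `scan`), the use of Euclidean distance as the similarity measure, and the toy signal track are all assumptions made for the example.

```python
from math import sqrt


def build_profile(track, sites, half_width):
    """Average the signal around each known site into one template profile."""
    width = 2 * half_width + 1
    profile = [0.0] * width
    for site in sites:
        window = track[site - half_width: site + half_width + 1]
        for i, value in enumerate(window):
            profile[i] += value / len(sites)
    return profile


def window_distance(profile, window):
    """Euclidean distance, which reflects both amplitude and spatial shape."""
    return sqrt(sum((p - w) ** 2 for p, w in zip(profile, window)))


def scan(tracks, profiles, weights, half_width):
    """Score every position; a higher (less negative) score means more similar.

    tracks:   {assay_name: list of per-base signal values}
    profiles: {assay_name: template profile from build_profile}
    weights:  {assay_name: relative weight of that assay (assumed given)}
    """
    length = len(next(iter(tracks.values())))
    scores = []
    for center in range(half_width, length - half_width):
        total = 0.0
        for name, track in tracks.items():
            window = track[center - half_width: center + half_width + 1]
            total += weights[name] * window_distance(profiles[name], window)
        scores.append((center, -total))
    return scores


# Toy data: one hypothetical ChIP-like track with peaks at positions 5 and 15.
tracks = {"chip": [0, 0, 0, 1, 3, 9, 3, 1, 0, 0, 0, 0, 0, 1, 3, 9, 3, 1, 0, 0, 0]}
profiles = {"chip": build_profile(tracks["chip"], sites=[5], half_width=2)}
hits = scan(tracks, profiles, weights={"chip": 1.0}, half_width=2)
top = sorted(hits, key=lambda h: h[1], reverse=True)[:2]
print(top)  # the known site and the second, previously unlabeled peak
```

In this toy case the template is learned from one known site and the scan recovers a second site with the same shape. A real method would also need normalization across assays and a principled way to choose the weights.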

5:00 High Performance Computing Technology and Methodology Applied to Next-Generation Sequencing Workflows

Carlos P. Sosa, Ph.D., HPC Chemistry and Life Sciences Technical Lead, Biomedical Informatics and Computational Biology, Cray Inc, University of Minnesota Rochester

High Performance Computing (HPC) Technology and Methodology (profiling and optimizing) are enabling scientists in many disciplines to achieve progressively more demanding and valuable results. In this talk we will illustrate how the same technology and methodology can be used to dramatically accelerate next-generation sequencing (NGS) workflows.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:30 Close of Day

Thursday, April 23

7:00 am Registration Open and Morning Coffee


10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced


10:30 Chairperson’s Remarks

Alexander Wait Zaranek, Ph.D., Director of Informatics, Harvard Personal Genome Project; Chief Scientist, Curoverse, Inc.

10:40 Informatics Infrastructure for Secure Access, Visualization and Analysis of NGS data

Ted Kalbfleisch, Ph.D., Assistant Professor, Biochemistry and Molecular Biology, University of Louisville

The Variant Call Format (VCF) file provides a list of the variants detected and genotypes measured in a next-generation sequencing dataset, along with summary statistics that allow a user to assess how much confidence to place in each call. We provide a novel mechanism by which the source NGS records from which the VCF file was derived may be accessed for additional scrutiny or re-evaluation, either visually or algorithmically. This new formalism lets users drag and drop links to NGS datasets between autonomous applications, and even to the command line, for straightforward access to and inspection of the subsets of NGS records that are relevant to questions posed by researchers or clinicians. The audience will learn that it is possible to securely access and share NGS data for both visualization and analysis in distributed environments. We will describe an architecture that may be extended to other -omics technologies and that may fundamentally change the way researchers access, analyze, and publish high-throughput data.
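To make the relationship between a VCF call and its source reads concrete, here is a minimal sketch that parses one tab-delimited VCF data line and derives a `chrom:start-end` region string of the kind commonly used to request the reads overlapping a variant. It does not represent the mechanism described in the talk. The function names and the 50-base flank are assumptions for illustration, and the example record is a made-up line in standard VCF column order.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (not a '#' header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                # one or more alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        # INFO entries are KEY=VALUE pairs or bare flags, separated by ';'
        "info": dict(
            kv.split("=", 1) if "=" in kv else (kv, True)
            for kv in info.split(";")
        ),
    }
    if len(fields) > 9:                       # FORMAT column plus sample columns
        keys = fields[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record


def region_around(record, flank=50):
    """Build a 'chrom:start-end' string covering the variant plus a flank."""
    start = max(1, record["pos"] - flank)
    end = record["pos"] + len(record["ref"]) - 1 + flank
    return f"{record['chrom']}:{start}-{end}"


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT:GQ\t0|0:48\t1|0:48\n"
rec = parse_vcf_line(example)
print(rec["qual"], rec["samples"][1]["GT"], region_around(rec))
```

The quality score and per-sample genotype are the summary statistics a user would inspect first. The region string is what a downstream viewer or command-line tool would need in order to pull the underlying reads for a closer look.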

11:10 NGS Data Management at Lilly: Progress and Challenges

Yuhao Lin, Consultant-Informatics Capabilities, Eli Lilly

11:40 Simplifying NGS Data Management with Metadata Centric Intelligent Storage

Robert Murphy, Big Data Program Manager, General Atomics

The rapid advance of NGS speed and cost reduction has opened the floodgates to staggering amounts of data. Managing overwhelming genomics data growth is critical to continued discovery. Adding workflow-specific NGS metadata is the key. With it, NGS constituents can find and access valuable data, share it world-wide for collaborative research, and make it available to support reproducibility mandates, while ensuring provenance, curation and future data availability. This presentation will describe an easily deployable metadata-oriented data management system that can be used to simplify all aspects of NGS data management.

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

Alexander Wait Zaranek, Ph.D., Director of Informatics, Harvard Personal Genome Project; Chief Scientist, Curoverse, Inc.

2:00 WebMeV: A Cloud-Based Platform for Genomic Analysis

Yaoyu Wang, Ph.D., Associate Director, Center for Cancer Computational Biology, Dana Farber Cancer Institute

2:30 Talk Title to be Announced

Craig Pohl, Co-Director, Bioinformatics, The Genome Institute, Washington University

3:00 Reproducible NGS Research: Practical Approaches and Case Studies

Joseph Szustakowski, Ph.D., Group Director, Translational Bioinformatics, Bristol-Myers Squibb

3:30 An Open Source Precision Medicine Platform for Cloud Operating Systems

Alexander Wait Zaranek, Ph.D., Director of Informatics, Harvard Personal Genome Project; Chief Scientist, Curoverse, Inc.

The unique “big-data” requirements for precision medicine are best served by a common open-source platform developed collaboratively by and for the biomedical community. Such a platform can address the need to share the influx of human sequence data among various stakeholders (researchers, physicians, and the individuals themselves), to provide stringent privacy and security guarantees that comply with government regulations, to maintain deep provenance for data reproducibility and analysis validation, and to compress and search these data efficiently and flexibly. We launched the Arvados project to meet community needs and are announcing its latest component, Lightning, an open-source, distributed query and translation engine. Our computational experiments suggest we can encode 100 individual complete genomes in a few gigabytes or less while still including all variation and all regions that are confidently called. A system offering efficient, real-time access to 1,000,000 genomes could require fewer than 10 racks of off-the-shelf hardware. Availability: Software License: GNU AGPLv3; Data: Public Domain via CC0.
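As a back-of-the-envelope check on the storage figures quoted above, the per-100-genome footprint can be scaled linearly to a million genomes. The value of 3 GB per 100 genomes is an assumed stand-in for "a few gigabytes", and linear scaling is itself an assumption. The result covers encoded data only, not the compute or redundancy needed for real-time access.

```python
def projected_storage_gb(n_genomes, gb_per_100_genomes):
    """Scale a 'GB per 100 genomes' figure linearly to n_genomes."""
    return n_genomes * gb_per_100_genomes / 100


# Assumption: "a few gigabytes" is taken as 3 GB per 100 genomes.
total_gb = projected_storage_gb(1_000_000, 3)
print(f"{total_gb / 1000:.0f} TB")  # 30 TB under these assumptions
```

Under these assumptions the encoded data alone would occupy tens of terabytes, which suggests that most of the hardware in the quoted estimate would serve query performance and replication rather than raw capacity.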

4:00 Conference Adjourns 
