Track 5 - April 21 – 23, 2015
Next-Gen Sequencing Informatics
Advances in Large-Scale Data Analysis and Interpretation
Tremendous advances have broadened NGS applications from research to the clinic. Even so, enormous challenges remain, including data storage, processing, scaling, quality control management, and interpretation. Track 5 presents case studies on these challenges. Themes to be covered include database systems to manage and analyze NGS data, analytic tools and workflow solutions, cloud computing and collaborative technologies, and NGS variants, gene mapping, and expression.
Tuesday, April 21
7:00 am Workshop Registration and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Genome Assembly and Annotation
Intelligent Methods Optimization of Algorithms for NGS
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
Customizing Your Digital Research Environment with Genome Browsers
Large Scale NGS Analysis Using Globus Genomics
* Separate registration required
2:00 – 6:30 Main Conference Registration
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, April 22
7:00 am Registration Open and Morning Coffee
9:00 Benjamin Franklin Awards and Laureate Presentation
9:30 Best Practices Awards Program
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Opening Remarks
Chairperson to be Announced, Bina Technologies, Inc.
11:00 Global Next Generation Sequencing Informatics Markets: Inflated Expectations in an Emerging Market
Greg Caressi, Senior Vice President, Healthcare and Life Sciences, Frost & Sullivan
This presentation evaluates the global next-generation sequencing (NGS) informatics markets from 2012 to 2018. Learn key market drivers and restraints, a detailed analysis of the changing competitive landscape, revenue forecasts, and important trends and predictions that affect market growth. Key highlights for many of the leading NGS informatics services providers, commercial primary and secondary data analysis tools vendors, commercial biological interpretation and clinical reporting tools vendors, and NGS LIMS vendors will be presented.
11:30 Large-Scale NGS Analysis Using Globus Genomics: Challenges and User Success Stories
Ravi Madduri, Fellow, Computation Institute, University of Chicago and Argonne National Lab
Dinanath Sulakhe, Solutions Architect, Computation Institute, University of Chicago and Argonne National Lab
In this talk, we will present some of the challenges in scaling up NGS analysis on public cloud infrastructure and present user success stories where we have overcome them.
12:00 pm Turn-Key Variant Analysis for the Biologist: Using the Maverix Analytic Platform
Dan Kearns, Director, Software Development, Maverix Biomics, Inc.
Studies leveraging WGS, exome, and targeted sequencing data are commonly limited by the tools, infrastructure, and trained bioinformaticians necessary to process, interpret, and manage the data. The Maverix Analytic Platform addresses these challenges through a unique environment designed for biologists. This cloud-based platform leverages best-in-class tools and methods, and provides an integrated environment to enable visualization and interpretation of results.
12:15 Developing and Provisioning Robust Automated Analytical Pipelines for Whole Genome-Based Public Health Microbiological Typing
Anthony Underwood, Ph.D., Lead, Bioinformatics, Infectious Disease Informatics, Microbiology Services Division, Public Health England
Whole genome sequencing has great potential for microbial characterization in public health. Open source bioinformatics tools can generate the necessary information; however, converting these tools for routine public health use is challenging. They must be automated, auditable, timely, and robust, and they must record errors and log outputs. Dr. Underwood will discuss the infrastructure, software architecture, and algorithms used for this at Public Health England.
12:30 Session Break
12:40 Luncheon Presentation I: Sample Aggregation and Analytics in the Post-$1,000 Genome Era
Paul Flook, Ph.D., Senior Director, Enterprise Informatics, Illumina Inc.
With the launch of the Illumina HiSeq X Ten system, the long-promised $1,000 genome became a reality. But as is often the case in science and engineering, the realization of one goal reveals new challenges to surmount. The economics of sequencing now make the sequencing of entire populations feasible, but aggregating, tracking, and analyzing whole human genome data cannot be done serially when it is produced in parallel. This presentation will discuss parallel sample processing approaches that enable multi-sample genome interpretation and analysis of large cohorts by employing cloud-scale computing.
1:10 Luncheon Presentation II
Speaker to be Announced
1:40 Session Break
1:50 Chairperson’s Remarks
1:55 The Cloud Reigns: Enabling Scalable Analysis and Storage for High-Throughput Next-Gen Sequencing
John Penn, Associate Manager, NGS Data Analysis, Regeneron Genome Center
2:25 Data Intensive Academic Grid (DIAG): A Free Computational Cloud Infrastructure Designed for Bioinformatics Analysis
Anup Mahurkar, Executive Director, Software Engineering and IT, Institute for Genome Sciences, University of Maryland School of Medicine
2:55 Co-Presentation: The Challenges of Scaling Platforms for Translational Science: New Approaches and Case Studies
Houtan Aghili, Ph.D., Senior Technical Staff Member, Industry Solutions - Healthcare and Life Sciences; IBM Software Group
Janis Landry-Lane, Genomics Solutions, Software Defined Infrastructure, IBM World-Wide
As researchers build platforms for translational science, High Performance Data Centric Computing will be a key investment that must be considered in order to provide an integrated and scalable solution which fulfills the needs of multiple departments. In this session, we will cover: processing the NGS pipeline in order to bring omics data into a scalable information management platform, the role of natural language processing for integrating unstructured information, the integration of on-premise and cloud solutions, and effective data and content management at scale. IBM will present both a vision and potential solutions that have enabled our customers to build an effective architecture.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 Deep Sequencing Based Analysis of Ig repertoire in Humanized Mice
Stefan Klostermann, Ph.D., Expert Scientist, Bioinformatics / Data Science, Roche Innovation Center Penzberg
In our quest for human biotherapeutic antibodies, we developed a novel methodology: instead of replacing the mouse genomic immune loci with their human orthologs, we reconstituted the humoral immune response in immunodeficient mice transplanted with human hematopoietic stem cells. An in-depth characterization of the reconstituted immune system, based on analysis of deep sequencing data from the Ig repertoire, validated the humanized mouse as immunologically equivalent to human donors.
4:20 Development of Novel Algorithms for Assembly of RNA-seq Reads Into Transcriptomes
Guojun Li, Ph.D., Professor, Mathematics, Shandong University
We developed a more effective and efficient assembler that assembles RNA-seq reads into the full-length transcripts encoded in a genome. It is based on a new insight: full-length transcripts are better recovered from combinations of splice junctions, which can be detected by aligning RNA-seq reads against a reference genome with a splice-aware aligner, than from overlapping reads. As is currently done in de novo assembly, we have modeled the de novo assembly problem as finding a min-cost minimum path cover over a junction graph defined on the splice junctions of a gene. The preliminary implementation shows that it performs even better than reference-based approaches in most cases, since it is not adversely affected by errors introduced from the reference genome. Motivated by the current investigation, we further found that a more general assembly problem can be modeled as finding a min-cost minimum path cover over a so-called interval graph, which can be solved exactly by solving a series of bin packing problems. This is a global optimization strategy, as opposed to our current de novo assembler Bridger, a heuristic approach that has improved on existing de novo assemblers by effectively incorporating sequencing depth information into the assembly procedure and by introducing the new concept of a junction graph in place of the overlap graph used in some existing assemblers.
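To illustrate the path-cover formulation mentioned in this abstract, here is a minimal sketch in Python. It is not the speaker's assembler: it ignores costs and read depth, and it assumes a small acyclic junction graph with vertex-disjoint paths. The function name `min_path_cover` and the toy graph are hypothetical. It uses the standard reduction from minimum path cover in a DAG to maximum bipartite matching.

```python
def min_path_cover(num_nodes, edges):
    """Return a minimum set of vertex-disjoint paths covering a DAG.

    Nodes are 0..num_nodes-1 and edges are (u, v) pairs. The cover is
    built from a maximum bipartite matching between "out" copies and
    "in" copies of the nodes (Kuhn's augmenting-path algorithm).
    """
    successors = {u: [] for u in range(num_nodes)}
    for u, v in edges:
        successors[u].append(v)

    matched_pred = {}  # v -> u means v directly follows u on some path

    def try_augment(u, seen):
        # Try to give u a successor, re-routing earlier choices if needed.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_pred or try_augment(matched_pred[v], seen):
                matched_pred[v] = u
                return True
        return False

    for u in range(num_nodes):
        try_augment(u, set())

    # Nodes without a matched predecessor start a path.
    next_of = {u: v for v, u in matched_pred.items()}
    paths = []
    for start in range(num_nodes):
        if start in matched_pred:
            continue
        path = [start]
        while path[-1] in next_of:
            path.append(next_of[path[-1]])
        paths.append(path)
    return paths


# Toy junction graph (hypothetical): two alternative junctions, 1 and 2,
# that both lead into junction 3.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
toy_paths = min_path_cover(5, toy_edges)
```

In a real transcript assembler, candidate transcripts may share junctions and each path carries a cost, so this unweighted, vertex-disjoint version only conveys the shape of the optimization problem.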
4:40 BLASTing with Chromatin Architecture: A Novel Method of Genomic Functional Element Identification and Annotation
Michael J. Buck, Ph.D., Associate Professor, Department of Biochemistry, SUNY at Buffalo; Director, Stem Cell Sequencing/Epigenomics Center, The State University of New York at Buffalo; Co-Director, Next-Generation Sequencing & Expression Analysis Core, The State University of New York at Buffalo
Identification of genomic functional elements, such as promoters, insulators, and enhancers, is essential to understanding the complex regulatory processes involved in cellular differentiation, response to the environment, and disease development and progression. However, finding these locations within the genome can be a laborious and expensive undertaking requiring site-specific assays. Even more difficult is identifying entirely new classes of genomic features. In order to facilitate identification and characterization of new classes of genomic features, we developed and implemented a Chromatin Architecture Basic Local Alignment Search Tool (ArchBLAST). The ArchBLAST algorithm utilizes conserved chromatin architecture or DNA-binding protein signatures at known sites of interest and globally searches the genome for similar sites. ArchBLAST differs from other approaches in that it uses the amplitude and spatial arrangement of all types of sequencing data to score similarity. ArchBLAST is extremely flexible and can search with all chromatin-based assays such as ChIP, FAIRE, and DNase-Seq as well as non-chromatin assays such as RNA and CAGE-Seq. Importantly, ArchBLAST allows for identification of subtypes of known genomic features and can accurately predict previously uncharacterized locations. ArchBLAST uses an innovative weighted profile generated from only the most informative genome-wide datasets and then scores the entire genome. We have validated the accuracy of our approach with multiple genomic features in both yeast and humans. We show ArchBLAST is capable of predicting both gene expression and genomic feature directionality as well as identifying cell-type-specific enhancers using chromatin architecture and/or DNA-binding protein signatures.
5:00 Sponsored Presentation (Opportunity Available)
5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
6:30 Close of Day
Thursday, April 23
7:00 am Registration Open and Morning Coffee
10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Remarks
10:40 Informatics Infrastructure for Secure Access, Visualization and Analysis of NGS data
Ted Kalbfleisch, Ph.D., Assistant Professor, Biochemistry and Molecular Biology, University of Louisville
The Variant Call Format (VCF) file provides a list of the variants detected and genotypes measured in a next-generation sequencing dataset, along with summary statistics that allow a user to assess the confidence with which they should accept each call. We provide a novel mechanism by which the source NGS records from which the VCF file was derived may be accessed for additional scrutiny or re-evaluation, either visually or algorithmically. This new formalism lets users drag and drop links to NGS datasets between autonomous applications, and even to the command line, for straightforward access to and inspection of the subsets of NGS records that are relevant to questions posed by researchers or clinicians. The audience will learn that it is possible to securely access and share NGS data for both visualization and analysis in distributed environments. We will describe an architecture that may be extended to other –omics technologies and that may fundamentally change the way researchers access, analyze, and publish high-throughput data.
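As a rough illustration of going from a VCF call back to the reads behind it, the sketch below parses one VCF data line and derives a genomic region string around the variant. The abstract does not describe the speaker's actual mechanism, so this is only an assumption-laden example: the helper names `parse_vcf_line` and `region_around`, the sample line, and the 50-base flank are all hypothetical. The column order follows the VCF specification, and the `chrom:start-end` form is the region syntax accepted by common alignment viewers and tools.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),            # 1-based position of the first REF base
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),      # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info,
    }
    if len(fields) > 9:
        # FORMAT column names the per-sample keys, e.g. GT:DP
        keys = fields[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record


def region_around(record, flank=50):
    """Return a 'chrom:start-end' region covering the variant plus a flank."""
    start = max(1, record["pos"] - flank)
    end = record["pos"] + len(record["ref"]) - 1 + flank
    return f"{record['chrom']}:{start}-{end}"


# Hypothetical example line, not taken from any real dataset.
example = "chr1\t12345\trs1\tA\tG\t50.0\tPASS\tDP=30\tGT:DP\t0/1:30"
rec = parse_vcf_line(example)
region = region_around(rec)
```

The resulting region string could then be handed to whatever tool holds the source alignments, so that only the reads supporting that call are retrieved for inspection.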
11:10 NGS Data Management at Lilly: Progress towards Standardization
Yuhao Lin, Associate Consultant, Informatics Capabilities, Eli Lilly
11:40 Sponsored Presentation (Opportunity Available)
12:10 pm Session Break
12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
2:00 Technology and Data Analysis Methods for NGS Data
Yaoyu Wang, Ph.D., Associate Director, Center for Cancer Computational Biology, Dana Farber Cancer Institute
2:30 Talk Title to be Announced
Craig Pohl, Co-Director, Bioinformatics, The Genome Institute, Washington University
3:00 Reproducible NGS Research: Practical Approaches and Case Studies
Joseph D. Szustakowski, Ph.D., Senior Group Head, Novartis Institutes for BioMedical Research
3:30 An Open Source Precision Medicine Platform for Cloud Operating Systems
Alexander Wait Zaranek, Ph.D., Director of Informatics, Harvard Personal Genome Project; Chief Scientist, Curoverse, Inc.
The unique “big-data” requirements for precision medicine are best served by a common open-source platform developed collaboratively by and for the biomedical community. This platform can address the need to share the influx of human sequence data among various stakeholders (researchers, physicians, and the individuals themselves), as well as the need for stringent privacy and security guarantees that comply with government regulations, deep provenance for data reproducibility and analysis validation, and flexibility in efficiently compressing and searching these data. We launched the Arvados project to meet community needs and are announcing its latest component, Lightning, an open-source, distributed query and translation engine. Our computational experiments suggest we can encode 100 individual complete genomes in a few gigabytes or less while still including all variation and all regions that are confidently called. A system offering efficient, real-time access to 1,000,000 genomes could require fewer than 10 racks of off-the-shelf hardware. Availability: http://arvados.org Software License: GNU AGPLv3; Data: Public Domain via CC0.
4:00 Conference Adjourns