Bio IT World Expo 2016  
2014 Archived Content

Next-Gen Sequencing Informatics 

Track 5 is dedicated to advances in analysis and interpretation of next-gen sequencing data. Topics to be covered include analysis of sequence variants related to cancer research from NGS data, instruments facilitating a cloud approach for NGS, analysis tools and workflows, network biology/network medicine, and NGS standardization and performance testing.

Final Agenda



7:00 am Workshop Registration and Morning Coffee

8:00 - 11:30 Recommended Morning Pre-Conference Workshops*

Analyzing NGS Data in Galaxy

12:30 - 4:00 pm Recommended Afternoon Pre-Conference Workshops*

Running a Local Galaxy Instance

*Separate Registration Required.

2:00 - 7:00 pm Main Conference Registration

4:00 Event Chairperson's Opening Remarks

Cindy Crowninshield, RD, LDN, Conference Director, Cambridge Healthtech Institute



5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing




7:00 am Registration Open and Morning Coffee

8:00 Chairperson's Opening Remarks

Phillips Kuhl, Co-Founder and President, Cambridge Healthtech Institute



9:00 Benjamin Franklin Award & Laureate Presentation

9:30 Best Practices Awards Program

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

NGS Bioinformatics Marketplace: Emerging Trends and Predictions 

10:50 Chairperson's Remarks

Narges Baniasadi, Ph.D., Founder & CEO, Bina Technologies, Inc.

11:00 Global Next-Generation Sequencing Informatics Markets: Inflated Expectations in an Emerging Market

Greg Caressi, Senior Vice President, Healthcare and Life Sciences, Frost & Sullivan

This presentation evaluates the global next generation sequencing (NGS) informatics markets from 2012 to 2018. Learn key market drivers and restraints, a detailed analysis of the changing competitive landscape, revenue forecasts, and the important trends and predictions that affect market growth. Key highlights for many of the leading NGS informatics services providers, commercial primary and secondary data analysis tools vendors, commercial biological interpretation and clinical reporting tools vendors, and NGS LIMS vendors will be presented.

Organizational Approaches to NGS Informatics 

11:30 High-Performance Databases to Manage and Analyze NGS Data

Joseph Szustakowski, Ph.D., Head, Bioinformatics, Biomarker Development, Novartis Institutes for Biomedical Research

The size, scale, and complexity of NGS data sets call for new data management and analysis strategies. High-performance database systems combine the advantages of both established and cutting-edge technologies. We are using high-performance database systems to manage and analyze NGS, clinical, pathway, and phenotypic data with great success. We will describe our approach and concrete success stories that demonstrate its efficiency and effectiveness.

12:00 pm Taming Big Science Data Growth with Converged Infrastructure 
Aaron D. Gardner, Senior Scientific Consultant, BioTeam, Inc.
Many of the largest NGS sites have identified I/O bottlenecks as their number one concern in growing their infrastructure to support current and projected data growth rates. This talk shares real-world strategies and implementation details for building converged storage infrastructure to support the performance, scalability and collaborative requirements of today's NGS workflows.

12:15 Next Generation Sequencing: Workflow Overview from a High-Performance Computing Point of View 

Carlos P. Sosa, Ph.D., Applications Engineer, HPC Lead, Cray, Inc.

Next Generation Sequencing (NGS) allows for the analysis of genetic material with unprecedented speed and efficiency. NGS increasingly shifts the burden from chemistry done in a laboratory to a string manipulation problem, well suited to High-Performance Computing. We explore the impact of the NGS workflow on the design of IT infrastructures. We also present Cray’s most recent solutions for the NGS workflow.

12:40 Luncheon Presentation I: Erasing the Data Analysis Bottleneck with BaseSpace
Jordan Stockton, Ph.D., Marketing Director, Enterprise Informatics, Illumina, Inc.
Since the inception of next generation sequencing, great attention has been paid to challenges such as storage, alignment, and variant calling.  We believe that this narrow focus has distracted many biologists from higher-level scientific goals, and that simplifying this process will expedite the discovery process in the field of applied genomics.  In this talk we will show that applications in BaseSpace can empower a new class of researcher to go from sample to answer quickly, and can allow software developers to make their tools accessible to a vast and receptive audience.

1:10 Luncheon Presentation II: The Empowered Genome Community: First Insights from Shareable Joint Interpretation of Personal Genomes for Research

Nathan Pearson, Ph.D., Principal Genome Scientist, QIAGEN

Genome sequencing is becoming prevalent; however, understanding each genome requires comparing many genomes. We launched the Empowered Genome Community, consisting of people from programs such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome. Using Ingenuity Variant Analysis, members have identified proof-of-principle insights on a common complex disease (here, myopia) derived by open collaborative analysis of PGP genomes.

1:50 Chairperson's Remarks
Georges Heiter, Founder, Databiology, Ltd.  

1:55 High-Performance de novo Transcript Reconstruction Leveraging Distributed Memory and Massive Parallelization

Brian Haas, Senior Computational Biologist, Broad Institute

We describe our efforts to optimize the performance of the Trinity RNA-Seq de novo assembly software, an example of collaborative software development between industry and academia that leverages advances in algorithm development and compute hardware to tackle the computational challenges of manipulating large volumes of next-gen sequence data. We explore a massively parallel computing architecture for more efficient assembly of RNA-Seq data in the context of the Trinity assembly workflow.

2:25 'Titan' Supercomputer Helps Identify Pathogenic Bacteria in the Human Microbiome for Biosurveillance

Tae-Hyuk Ahn, Ph.D., Research Associate, Computer Science and Mathematics Division, Oak Ridge National Laboratory

Biosurveillance requires rapid identification of the causative pathogen in a disease outbreak. This talk will present a new algorithm, SIGMA, for metagenomic biosurveillance. SIGMA has the unique capability of identifying the correct strain of a pathogen in a complex metagenomic background from many closely related candidates in the reference genome database. Using a top open-science supercomputer, Titan, pathogenic bacterial strains can be identified in an hour from 100 million human microbiome sequences.

2:55 GENALICE MAP: The New Gateway for Whole Genome Sequencing to the Clinic
The ever-increasing output of Next Generation Sequencing (NGS) puts equally increasing demands on IT resources such as computing power, network bandwidth and data storage capacity. GENALICE MAP is an NGS short read alignment and variant calling solution using a novel alignment algorithm, efficient storage structure and a state-of-the-art high performance software design. It is faster, better and extremely cost-effective.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

NGS Data Computing and Management Structures and Workflows 

4:00 Multitier Infrastructure for NGS Data Computing and Management

Xiang Yao, Ph.D., Principal Scientist, Translational Informatics, the Janssen Pharmaceutical Companies of Johnson & Johnson

Pipelines that bundle all NGS functions, from raw data handling to comprehensive analysis, are convenient. But this bundling approach is inefficient at addressing the different needs of users, who range from IT professionals to biologists, and it makes frequent software and hardware upgrades difficult. This talk describes a multitier infrastructure that is modular and interconnected, to accommodate the different computing, storage and access needs of users, and to maximize return on investment.

4:30 HIVE Integrated Tools for NGS Bioinformatics: Detection Pipeline, Assembly Pipeline, Metagenomic Discovery Tools, Sequence Toolbox and More

Raja Mazumder, Ph.D., Associate Professor, Biochemistry and Molecular Biology, The George Washington University

HIVE (High-performance Integrated Virtual Environment) is an implementation of a multicomponent cloud infrastructure in which distributed storage and a computational powerhouse are linked seamlessly to provide a secure Big Data analysis platform. Development of HIVE-based pipelines for NGS analytics has been the focus of collaborative efforts by FDA and GWU research groups. HIVE provides web access to deposit, annotate and compute on NGS data for detection of adventitious agents, disease-causing mutations, metagenomic analysis and more.

5:00 Understanding NGS Variant Data Using Pathway Analysis

Nikolai Daraselia, Ph.D., Director of Research, Life Science Solutions, Elsevier

Next generation sequencing data is a tremendous resource for researchers studying the underlying genetic basis for diseases, and will be a key driver in the development of personalized medicine. However, the large data sets generated by NGS also present significant analysis and interpretation challenges. To assist researchers in these tasks, Elsevier is developing a Variation Analysis module for its Pathway Studio product. This new capability will make use of Pathway Studio’s industry-leading biological knowledgebase to provide researchers with the literature-based evidence they need to understand the significance of genetic variants. We will discuss sample workflows and examples of variation data analysis in the context of protein biological functions.

5:30 - 6:30 Best of Show Awards Reception in the Exhibit Hall


7:00 am Registration Open

7:00 Breakfast Presentation (Sponsorship Opportunity Available) or Morning Coffee

8:00 Chairperson’s Opening Remarks

Kevin Davies, Ph.D., Vice President Business Development & Publisher C&EN, American Chemical Society; Founding Editor, Bio-IT World



10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

Cloud Computing and Collaborative Technologies 

10:30 Chairperson's Opening Remarks
Jason Stowe, CEO and Founder, Cycle Computing 

10:35 Implementations of Cloud-Based Pipelines for Large-Scale DNA-Seq and RNA-Seq Data Analyses

Shanrong Zhao, Ph.D., Senior Scientist, Informatics, Johnson & Johnson

Due to reduced sequencing costs, more NGS data are produced by small research groups. Data storage and CPU resources required for large-scale whole-genome sequencing and RNA-Seq data analyses are too large for many individual laboratories to provide. To meet these challenges, we developed Rainbow and Stormbow: cloud-based software packages for large-scale DNA-Seq and RNA-Seq analyses.

10:55 Analyzing DNA-Seq Data Using DRAW: Lessons Learned from Using Amazon EC2 for Next-Generation Sequencing Studies

Li-San Wang, Ph.D., Assistant Professor, Pathology and Laboratory Medicine, University of Pennsylvania

DNA-Seq studies pose enormous challenges to many researchers who have limited access to dedicated IT support or high-performance computing. Cloud computing is a promising solution to address these needs. This talk covers our experience using the DNA Resequencing Analysis Workflow (DRAW) software to process >800 samples and our strategy to use Amazon EC2 effectively for DNA-Seq analysis.

11:15 Globus Genomics: An End-to-End NGS Analysis Service on the Cloud for Researchers and Core Labs

Ravi K. Madduri, Fellow, Computation Institute, University of Chicago; Project Manager, Mathematics and Computer Science Division, Argonne National Laboratory

We describe the Globus Genomics platform. Globus Genomics provides an integrated platform for end-to-end data management using Globus Online and scalable analysis using the Galaxy framework and Amazon Web Services. We will walk through case studies of researchers and core labs at various universities that are leveraging the service to meet their rapidly growing genomics analysis needs.

11:35 Technology Advancements in High Density Compute and Storage that Power the Next Generation of Cloud Infrastructure

Brian Corn, Vice President, Marketing, Thinkmate

Thinkmate solutions accelerate discovery while reducing TCO, expanding scalability and enabling business continuity. Exciting new solutions from names like Intel, Supermicro, and Western Digital will be covered. This presentation is a “must see” for any attendees involved in hardware infrastructure design, testing, and procurement.

12:15 pm Luncheon Presentation I: Turn-Key RNA-Seq Analysis for the Biologist Using the Maverix Analytic Platform
Dan Kearns, Director, Software Development, Maverix Biomics, Inc. 
Studies leveraging RNA-Seq data are commonly limited by the tools, infrastructure, and trained bioinformaticians necessary to process, interpret and manage the data. The Maverix Analytic Platform addresses these challenges through a unique environment designed for biologists. This cloud-based platform leverages best-in-class tools and provides an integrated UCSC Genome Browser endpoint to enable visualization and interpretation of results.

12:45 Luncheon Presentation II: Biotech Self-Service Agility in Public and Private Clouds

Dennis Faucher, Director, Presales, AdvizeX Technologies, a Rolta Company

Biotech requires fast time to market with high quality, high compliance and lowered cost. The correct mix of private and public cloud enables: reduced costs and a significantly lower Total Cost of Ownership (TCO); on-demand application and service provisioning in hours rather than weeks; increased business agility, support for business growth and accelerated time-to-market; and increased service levels and business continuity via a built-in disaster recovery capability.

1:15 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

NGS Variants & Gene Mapping and Expression 

1:55 Chairperson's Remarks
James Lyons-Weiler, Ph.D., Scientific Director, Bioinformatics Analysis Core Genomics and Proteomics Core Laboratory, University of Pittsburgh 

2:00 Characterization and Benchmarking of NGS Workflow Methods on Various Platform Architectures

Anthony Costa, Ph.D., Computational Scientist, Scientific Computing, Mount Sinai School of Medicine

The Next-Generation Sequencing (NGS) pipeline for short-read assembly and variant calling of DNA whole genome and exome data involves a complex array of methods with exposed parallelism from the core to the cluster level. We target numerically expensive pieces of this pipeline and investigate performance of many codes against a variety of CPU architectures and file systems. We further compare more recent methods implemented on non-traditional or heterogeneous architectures such as GPUs. The quality of the results from each method is also considered.

2:30 Avoiding Nonsense Results in your NGS Variant Studies

James Lyons-Weiler, Ph.D., Scientific Director, Bioinformatics Analysis Core Genomics and Proteomics Core Laboratory, University of Pittsburgh

Recent studies have demonstrated a lack of concordance among alternative variant calling algorithms. This talk presents an information-theory-based paradigm that allows the objective performance comparison of pipeline components such as variant callers, study designs, read filters, and mapping algorithms. Our method addresses the problem of concordance, increasing agreement among methods in one case from 32% to 86%. The evaluation methods I will present generalize to provide advanced quality control over sample prep protocols and will be useful for comparing sequencing platforms. Learn how to better prioritize NGS variant leads for validation.

3:00 Bridger: A New Framework for de novo Transcriptome Assembly Using RNA-Seq Data

Guojun Li, Ph.D., Senior Research Scientist, Biochemistry and Molecular Biology, The University of Georgia

Full-length transcriptome assembly is a highly challenging problem that is not yet well solved. The most important broad impact of the study will be that our new capability for transcriptome assembly will lead to a new level of understanding of the detailed mechanism of alternative splicing in eukaryotic genomes, hence facilitating new studies of transcriptional mechanisms. This talk presents a new de novo assembler, Bridger, that takes advantage of techniques employed in the reference-based assembler Cufflinks to overcome limitations of existing de novo assemblers.

3:30 Analysis of Transgene Sequence and Integration Site in the CHO Genome by Next-Generation Sequencing and Use to Improve Expression

Nic Mermod, Ph.D., Professor and Director, Institute of Biotechnology, University of Lausanne

This talk presents information on how to validate protein-expressing cell lines taking an NGS approach. We have sequenced the genomes of several CHO cell clones producing therapeutic proteins and compared them to the parental genome sequence. This yielded information on the transgene sequence integrity as well as on the genomic integration locus and sequence. In turn, this gave information on the molecular mechanisms allowing the genomic integration of the vector and provided approaches to further optimize transgene integration and expression from transiently or stably engineered CHO cells.

4:00 Conference Adjourns


