2018 Archive | Track 7: Next-Gen Sequencing Informatics

2018 Archived Content

Track 7: Next-Gen Sequencing Informatics

Tremendous advancements have been made to broaden NGS applications from research to the clinic, especially as genomics becomes more integrated with precision medicine initiatives. Even so, enormous challenges remain for NGS, including real-time sequencing, data storage, processing, scaling, quality control management, security and compliance in the cloud, and interpretation. Track 7 presents case studies on these challenges.

Tuesday, May 15

7:00 am Workshop Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

W4. Introduction to Scalable and Reproducible RNA-Seq Data Processing, Analysis, and Result Reporting Using AWS, R, knitr, and LaTeX

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

W11. Data Science Driving Better Informed Decisions

* Separate registration required.

2:00 – 6:30 Main Conference Registration Open (Commonwealth Hall)

4:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

Wednesday, May 16

7:00 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

LARGE-SCALE RNA-SEQ AND GENE EXPRESSION VARIABILITY
Cambridge

10:50 Chairperson’s Remarks

Johannes Goll, Director, Bioinformatics, The Emmes Corporation

11:00 KEYNOTE PRESENTATION: RNA-Seq X: Look Back and Look Ahead

Shanrong Zhao, PhD, Director, Computational Biology and Bioinformatics, Pfizer, Inc.

Since Dr. Mortazavi published his groundbreaking research entitled “Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq” in Nature Methods in 2008, RNA-seq has evolved rapidly and revolutionized biological research, drug development and clinical diagnostics. 2018 is the 10-year anniversary of RNA-seq, and it’s the right time to look back and look forward.

11:30 LCA: A Robust and Scalable Algorithm to Reveal Subtle Diversity in Large-Scale Single-Cell RNA Sequencing Data

Xiang Chen, PhD, Assistant Member, Department of Computational Biology, St. Jude Children’s Research Hospital

We developed Latent Cellular Analysis (LCA), a machine-learning-based single-cell RNA sequencing (scRNA-seq) analytical pipeline that combines similarity measurement by latent cellular states with a graph-based clustering algorithm featuring dual-space model search for both the optimal number of subpopulations and the informative cellular states distinguishing them. LCA has proved to be robust, accurate and powerful by comparison to multiple state-of-the-art computational methods on large-scale real and simulated scRNA-seq data.

12:00 pm Sponsored Presentation (Opportunity Available)

12:15 RSEQREP: An Open-Source Cloud-Enabled Framework for Reproducible RNA-Seq Data Processing, Analysis & Result Reporting

Johannes Goll, Director, Bioinformatics, The Emmes Corporation

RSEQREP (RNA-Seq Reports) is a new open-source cloud-enabled framework that allows researchers to execute start-to-end RNA-Seq analysis to characterize transcriptomics changes in human cells following treatment. It outputs dynamically generated reports using R and LaTeX. We provide results for a published RNA-Seq study to characterize transcriptomics changes following influenza vaccination.

12:30 Session Break

12:40 Luncheon Presentation I: Querying of 100k Genomes Using Google Cloud

Hákon Gudbjartsson, PhD, Chief Informatics Officer, WuXi NextCODE

Hákon Gudbjartsson will demonstrate the power of the GOR database in real time. GORdb is used to organize, mine and share massive genome datasets, providing a global architecture for the largest precision medicine efforts worldwide. It’s designed to enable fast, computationally-efficient use of sequence data, and allows for the query and application of data in the context of reference sets.

1:10 Luncheon Presentation II (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:40 Session Break

OPTIMIZING GENE BASES WITH CODON USAGE
Cambridge

1:50 Chairperson’s Remarks

Leonard Lipovich, PhD, Associate Professor with Tenure, Center for Molecular Medicine and Genetics, Wayne State University

1:55 Analysis of Codon Optimized Therapeutic Proteins Using Ribosome Profiling

Chava Kimchi-Sarfaty, PhD, Research Chemist, Principal Investigator, OTAT Acting Deputy Associate Director for Research, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, FDA | CBER | OTAT

Codon optimization is a genetic engineering technique used to improve the yield of recombinant therapeutic proteins. Despite being used ubiquitously to increase protein expression, codon optimization requires widespread substitution of synonymous codons across the native expression sequence. This degree of genetic manipulation can carry consequences, including altered conformation of the recombinant product. These unforeseen modifications can have impacts on protein function and health outcomes, and are of high regulatory importance. To study these effects, we have used ribosome profiling, a technique that characterizes the translation pattern of the ribosome across the mRNA transcript. In this technique, actively translating ribosomes are cross-linked to mRNA, and mRNA not protected by a ribosome is then digested with nuclease, generating short mRNA fragments (called “ribosome footprints”). These fragments are sequenced and aligned to generate a differential coverage map across portions of the transcript, which provides insight into the relative translation efficiency in a given area of the transcript. We have analyzed the ribosome profiling data for relationships to codon usage. By identifying regions of differential ribosome profiling patterns between wild type and codon optimized transcripts, we aim to create a method of selecting regions to leave unmodified, allowing recombinant proteins to benefit from increased expression while maintaining the integrity and safety of the protein product. Codon optimization as a technique relies heavily on accurate codon usage statistics of the organism in question, to identify rare codons to be replaced with common codons for an increase in translation efficiency. However, previous databases containing this information were either outdated or limited in scope. To address this gap in knowledge, we constructed a new database containing codon usage tables for all the species in GenBank and RefSeq.
We designed a program in Python to download, parse, and organize all the sequence data available in these two repositories, and designed in JavaScript an accessible, public web portal for querying the new database. The new HIVE-CUTs database contains substantially more organisms and coding sequence data and is a dramatic improvement upon prior databases. This tool will aid in the effective implementation of codon optimization techniques and other areas of recombinant protein design.
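
For readers unfamiliar with codon usage tables, the statistic described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HIVE-CUTs program: the function names `codon_usage_per_thousand` and `rare_codons`, the toy sequences, and the rarity threshold are all hypothetical, and the input is assumed to be in-frame coding sequences.

```python
from collections import Counter


def codon_usage_per_thousand(cds_sequences):
    """Pool in-frame coding sequences and return {codon: frequency per 1000 codons}.

    Hypothetical helper for illustration; assumes each sequence starts in frame.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
        usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):    # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def rare_codons(usage, threshold=10.0):
    """Return observed codons whose frequency per 1000 is below an arbitrary threshold."""
    return sorted(codon for codon, freq in usage.items() if freq < threshold)


# Toy input: two short made-up coding sequences (not real genes).
usage = codon_usage_per_thousand(["ATGGCTGCTGCTTAA", "ATGGCTAAATGA"])
print(rare_codons(usage, threshold=200.0))
```

A real table would be built from every annotated coding sequence of an organism, and codons that never occur would need to be added explicitly, since this sketch reports only observed codons.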

2:25 Multidimensional Global Proteogenomics Identifies Persistent Ribosomal In-Frame Mis-Translation of Stop Codons as Amino Acids in Multiple Open Reading Frames from a Human Breast Cancer Long Non-Coding RNA

Leonard Lipovich, PhD, Associate Professor with Tenure, Center for Molecular Medicine and Genetics, Wayne State University

Two-thirds of the ~60,000 human genes (www.gencodegenes.org) do not encode known proteins, and aside from long non-coding RNA (lncRNA) genes with recently characterized functions, the possibility that these poorly understood genes’ transcripts serve as de facto unconventional messenger RNAs has not been formally excluded. Our group was the first to use direct evidence from protein mass spectrometry, preceding efforts that employed indirect evidence from ribosome profiling, to demonstrate that specific lncRNAs are recurrently and nonrandomly translated in human cells (Bánfai et al. 2012, Genome Research 22:1646-1657). In our current study, we integrated RNA-seq, ribosome profiling, and mass spectrometry to globally assess lncRNA translation in human estrogen receptor alpha positive MCF7 breast cancer cells. We identified 27 peptides, mapping to multiple sense-strand open reading frames (ORFs) of the lncRNA gene MMP24-AS1, united by a novel and highly unconventional property: the existence of these peptides can only be explained by stop-to-nonstop in-frame replacements of specific UAG and UGA (but not UAA) stop codons by amino acids. This result, validated by the absence of any genomic mutations, polymorphisms, and RNA editing events in genomic and cDNA targeted resequencing, represents an unprecedented apparent gene-specific violation of the Genetic Code in human breast cancer cells, and hints at a new mechanism enhancing the combinatorial complexity of the cancer proteome.
[Note 1: This work has been funded in its entirety by the NIH Director’s New Innovator Award 1DP2-CA196375 to LL.]
[Note 2: This project encompasses collaborations. A full listing of co-authors will be shown during the talk.]

2:55 CO-PRESENTATION: Workflow Optimization for NGS Discovery - How to Drive BIX Insights

Jack DiGiovanna, PhD, General Manager, NGS Applications and Services, Seven Bridges

Isaac M. Neuhaus, PhD, Director, Computational Genomics, Bristol Myers Squibb

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

NGS DATA ANALYSIS, INTEGRATION, INTERPRETATION, AND VISUALIZATION
Cambridge

4:00 Variant Query Tool: Drag & Drop for a Scalable, Server-Less, Web UI to Querying Annotated Variants

William Van Etten, Senior Scientific Consultant, BioTeam

Building an environment that provides real-time querying of reads and annotated variants for genomics research is a challenge that requires significant human and computational resources. Whether for tens or thousands of genomes, the barrier to entry can be high for a biologist or geneticist who is not also a computer scientist. BioTeam has developed a simple tool that leverages several AWS services (S3, Athena, Lambda, Cognito, IAM, CloudWatch) to let a biologist or geneticist drag and drop VCF and BAM files onto an S3 bucket, then point a web browser at that bucket to get a scalable, server-less web UI for querying the reads and annotated variants within these files. We aim to demonstrate, explain, and promote what we’ve learned from this proof-of-concept software development in the hope that others might benefit from our experience.

4:30 Building a GXP Validated Platform for NGS Analysis Pipelines

Anthony Rowe, PhD, Business Technology Leader, R&D IT, Janssen R&D LLC

As NGS applications approach the clinic, the bioinformatics pipelines used to analyze the data have to be validated to demonstrate their correctness. This talk will present Janssen’s approach to deploying validated NGS applications, with a specific focus on microbiome metagenomics.

5:00 LIMS or ELN, Which Do You Need?

Kevin Cramer, CEO, Sapio Sciences

Both Biotech and Pharma need Laboratory Information Management System (LIMS) and Electronic Lab Notebook (ELN) capabilities. Sapio has eliminated the barriers between these two product areas by leveraging its more than a decade of experience offering both LIMS and ELN solutions and combining the key features of each into one best-of-breed product: Exemplar ELN Pro.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

7:00 – 10:00 Bio-IT World After Hours @Lawn on D
**Conference Registration Required. Please bring your conference badge, wristband, and photo ID for entry.

Thursday, May 17

7:30 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced (Commonwealth Hall)

APPLICATION OF NGS TO ONCOLOGY, IMMUNOLOGY, DIAGNOSTICS, AND THERAPEUTIC DEVELOPMENT
Cambridge

10:30 Chairperson’s Remarks

Bruce Press, Executive Vice President, Business Development & Strategy, Seven Bridges Genomics

10:40 Instantiating a Single Point of Truth for Genomic Reference Data

David Herzig, Scientist, Research Informatics, Roche Pharmaceuticals

This talk will exemplify how expression and mutation data were made actionable by consolidating a scattered landscape of genomic reference data into a real SPoT.

11:10 A Network-Based Approach to Understanding Drug Toxicity

Yue Webster, PhD, Principal Research Scientist, Informatics Capabilities, Research IT, Eli Lilly and Company

Despite investment in toxicogenomics, nonclinical safety studies are still used to predict clinical liabilities for new drug candidates. Network-based approaches for genomic analysis help overcome challenges with whole-genome transcriptional profiling using limited numbers of treatments for phenotypes of interest. Herein, we apply co-expression network analysis to safety assessment using rat liver gene expression data to define 415 modules, exhibiting unique transcriptional control, organized in a visual representation of the transcriptome. Compared to gene-level analysis alone, the network approach identifies significantly more phenotype-gene associations, including established and novel biomarkers of liver injury.

11:40 Advancing Clinical NGS Test Development Using Thousands of Pediatric Cancer Samples on St. Jude Cloud

Michael Rusch, Director of Bioinformatics Research Development, St. Jude Children’s Research Hospital

12:10 pm Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

DATA MINING FOR DISEASE CLASSIFICATION
Cityview 1

1:55 Chairperson’s Remarks

John Methot, Director, Health Informatics Architecture, Dana-Farber Cancer Institute

2:00 Disease Classification in the Era of Data-Intensive Medicine

Kanix Wang, PhD, Research Professional, Booth School of Business, Institute for Genomics & Systems Biology, University of Chicago

We used insurance claims for over one-third of the U.S. population to create a subset of 128,989 families (481,657 unique individuals). Using these data, we estimated the heritability and familial environmental patterns of 149 diseases. We then computed the environmental and genetic disease classifications for a set of 29 complex diseases after inferring their pairwise genetic and environmental correlations.

2:30 Enviro-Geno-Pheno State Approach and State-Based Biomarkers for Differentiation, Prognosis, Subtypes, and Staging

Lei Xu, PhD, Director, Centre for Cognitive Machines and Computational Health; Zhiyuan Chair Professor, Department of Computer Science and Engineering, Shanghai Jiao Tong University

In the joint space of geno-measures, pheno-measures, and enviro-measures, one point represents a bio-system behavior and a subset of points that locate adjacently and share a common system status represents a ‘state’. The system is characterized by such states learned from samples. This enviro-geno-pheno state is considered a biomarker, indicating ‘health/normal’ versus ‘risk/abnormal’ together with its associated enviro-geno-pheno condition.

3:00 PANEL DISCUSSION: Can We Improve Breast Cancer Patient Outcomes through Artificial Intelligence?

Maya Said, ScD, President & CEO, Outcomes4me, Inc. (Moderator)

Panelists:
Regina Barzilay, PhD, MacArthur Fellow and Delta Electronics Professor, Massachusetts Institute of Technology (MIT) Department of Electrical Engineering and Computer Science; Member, Computer Science and Artificial Intelligence Laboratory, MIT

Kevin Hughes, MD, Co-Director, Avon Breast Evaluation Program, Massachusetts General Hospital; Associate Professor of Surgery, Harvard Medical School; Medical Director, Bermuda Cancer Genetics Risk Assessment Clinic

Osama Rahma, MD, Assistant Professor of Medicine, Center For Immuno-Oncology, Dana-Farber Cancer Institute

Newly diagnosed cancer patients attempting to understand their treatment options face the overwhelming task of filtering an information deluge, much of which is irrelevant, outdated and occasionally inaccurate. Additionally, matching their diagnosis to best-in-class treatments or potential clinical trials, while simultaneously learning to navigate an extremely complex healthcare system is daunting, even for the most highly trained physicians. We will explore various platforms aimed at improving patient outcomes by leveraging technology to help educate, track, and connect patients with personalized resources while simultaneously working to improve the care continuum and the development of new treatments. We will explore the nexus of healthcare networks and their IT systems, clinical decision-making and delivery, R&D, and patients, for whom we all create our innovation solutions. Attendees will be interested to understand how various groups are working to increase value across the entire system by bringing laboratory, clinical and pharmaceutical science, real-world evidence and patient-reported data together with technology and artificial intelligence to solve health challenges. These approaches offer the opportunity to generate deeper insights into how therapies perform in the real world and harness that understanding to improve efficiency, effectiveness, value, and ultimately, patient care.

4:00 Conference Adjourns
