Track 5: Next-Gen Sequencing Informatics

2019 Archived Content

Tremendous advancements have been made to broaden NGS applications from research to the clinic, especially as genomics becomes more integrated with precision medicine initiatives. In spite of this, enormous challenges for NGS remain, including data analysis pipelines and platforms; data integration, interpretation, and visualization; the application of sequencing to cancer, immunology, diagnostics, and therapeutic development; and emerging sequencing technologies. The Next-Gen Sequencing Informatics track presents case studies on these challenges.

Final Agenda

Tuesday, April 16

7:00 am Workshop Registration Open and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

W6. DNA Sequencing 101

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

W12. Data Science Driving Better Informed Decisions

* Separate registration required.

2:00 – 6:30 Main Conference Registration Open

4:00 PLENARY KEYNOTE SESSION
Amphitheater

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 17

7:30 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION
Amphitheater

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

10:50 Chairperson’s Remarks

David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp

11:00 Long Read Sequencing

Justin Zook, PhD, Researcher, National Institute of Standards and Technology

11:20 NovoGraph: Loading 7 Human Genomes into Graphs

Evan Biederstedt, Computational Biologist, Memorial Sloan Kettering Cancer Center

11:40 Building a Usable Human Pangenome: A Human Pangenomics Hackathon Run by NCBI at UCSC

Ben Busby, PhD, Scientific Lead, NCBI Hackathons Group, National Center for Biotechnology Information (NCBI)

12:00 pm Co-Presentation: Faster Genomic Data

Michael Hultner, PhD, Senior Vice President, Strategy; General Manager, US Operations, PetaGene

David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp

Genetic testing demand is driving up the volume of genomic data that must be processed, analyzed, and stored. Gigabyte-scale genome sample files and terabyte- to petabyte-scale cohort data sets must be moved from data generation to processing to analysis sites, historically a slow, arduous process. NetApp and PetaGene will describe compression and data transfer technologies that overcome I/O bottlenecks to accelerate the movement of genomic data and reduce the time to process and analyze it.

12:30 Session Break

12:40 Luncheon Presentation I: Deep Phenotypic and Genomic Analysis of UK Biobank Data on the WuXi NextCODE Platform

Saliha Yilmaz, PhD, Research Geneticist, WuXi NextCODE

The increasing size and complexity of genetic and phenotypic data to include hundreds of thousands of participants poses a significant challenge for data storage and analysis. We demonstrate use of the GOR database and query language underlying our platform to mine UK Biobank and other datasets for efficient phenotype selection, GWAS and PheWAS, and to archive and query the results.

1:10 NEW: Luncheon Co-Presentation II: Optimizing Drug Discovery and Development with Data-Driven Insights

Christian Frech, PhD, Associate Director, Scientific Operations, Seven Bridges

Serhat Tetikol, Research & Development Engineer, Seven Bridges

1:40 Session Break

1:50 Chairperson’s Remarks

Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ

1:55 AbbVie’s Target and Genomics Compilation (ATGC): A Target Knowledge Platform

Rishi Gupta, PhD, Senior Research Scientist, Information Research, AbbVie, Inc.

Author: Anne-Sophie Barthelet, Scientific Developer, Discngine

ATGC is a web-based platform that allows AbbVie scientists to gather relevant information to make accurate decisions on target ID, target validation, biomarker selection and drug discovery. For each target, the platform provides in-depth information such as gene expression, RNA expression, protein expression, and mouse knockout studies. This talk focuses on key aspects of the application, including its architecture, the currently available tool sets, and how various pieces of information are provided to the user.

2:25 Self Service Data Visualization and Exploration at Genentech Research

Kiran Mukhyala, Senior Software Engineer, Bioinformatics and Computational Biology, Genentech Research and Early Development

Genomic data requires specialized infrastructure to enable data exploration and analysis at scale. We built an integrated, modular, end-to-end gene expression analysis platform implementing data import, storage, processing, analysis and visualization. The multi-layered architecture of the platform supports general, high-level applications for self-service analytics, as well as infrastructure for prototyping, incubating and integrating scientist-driven innovations. The platform coexists with other in-house and commercial software to provide a wide range of genomic data analysis and visualization options for Research scientists.

2:55 Exploring and Visualizing Single-cell RNA Sequencing Data

Michael DeRan, PhD, Scientific Consultant, Diamond Age Data Science

Recent advances in single-cell RNA sequencing (scRNA-seq) technology have made this powerful method accessible to many researchers, but have not brought with them a clear, simple workflow for data analysis. As the number of scRNA-seq datasets has increased, so too has the number of analysis tools available; for those looking to perform their first scRNA-seq analysis the range of options can seem daunting. In working with our clients, I have had the opportunity to apply many different tools to scRNA-seq data from a variety of tissues and organisms. I have used this experience to select a set of tools that are flexible and suitable to many common scRNA-seq analysis tasks. In this talk I will introduce popular tools and methods for identifying cell populations, assessing differential expression and visualizing biological processes. I will discuss common pitfalls encountered in analyzing this data and make recommendations that anyone can use in their own analysis.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite)

4:00 Comparison of Different Approaches for Clinical Cancer Sequencing

Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ

The sequencing of tumors is important for guiding the treatment of cancer patients. While it is agreed that there is a need to sequence the tumor, there is a wide variety of approaches, ranging from paired whole genome tumor-normal sequencing to tumor-only small panel sequencing, with many intermediate possibilities. Each approach has a different cost and associated benefit. I will present a comparison of different methods and their efficacy for guiding cancer treatment.

4:30 Integrated NGS Analysis to Accelerate Disease Understanding for Drug Discovery

Helen Li, Director - Research IT - Biologics & Informatics, Eli Lilly and Company

5:00 Identification of Cancer Biomarker Genes

Maryam Nazarieh, PhD, Postdoctoral Researcher, Center for Bioinformatics, Universität des Saarlandes, Saarbrücken, Germany

Identification of biomarker genes plays a crucial role in disease detection and treatment. Computational approaches enhance the insights derived from experiments and reduce the efforts of biologists and experimentalists to identify biomarker genes which play key roles in complex diseases. This is essentially achieved through prioritizing a set of genes with certain attributes (1). Here, I propose the set of transcription factors that form the largest strongly connected component of the pluripotency network in embryonic stem cells as the global regulators controlling the differentiation process that determines cell fate. This component can be controlled by a set of master regulatory genes. The regulatory mechanisms underlying stem cells inspired us to formulate the identification of master regulatory genes in regulatory networks as two combinatorial optimization problems, namely minimum dominating set and minimum connected dominating set, in weakly and strongly connected components. The developed methods were applied to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the nodes in the solutions are critical, a prioritization method named TopControl was developed to rank a set of candidate genes related to a certain disease, based on systematic analysis of the genes that are differentially expressed in tumor and normal conditions. For this purpose, NGS data were taken from The Cancer Genome Atlas for matched tumor and normal samples of the liver hepatocellular carcinoma (LIHC) and breast invasive carcinoma (BRCA) datasets. Moreover, topological features in the regulatory networks surrounding differentially expressed genes were shown to be highly consistent across the output of several analysis tools. We present several web servers and software packages that are publicly available at no cost. The Cytoscape plugin for minimum connected dominating set identifies a set of key regulatory genes in a user-provided regulatory network using a heuristic approach. The ILP formulations of minimum dominating set and minimum connected dominating set return the optimal solutions for the aforementioned problems. Our source code is publicly available. The web servers TFmiR and TFmiR2 construct disease-, tissue-, and process-specific networks for the sets of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for both mouse and human.

(1) Maryam Nazarieh, Understanding regulatory mechanisms underlying stem cells helps to identify cancer biomarkers. Ph.D. thesis, Saarland University, Saarbrücken, Germany (2018).
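
To make the dominating-set idea in the abstract above concrete, here is a minimal Python sketch of a greedy heuristic on a directed regulatory network. It is an illustration only, not the speaker's Cytoscape plugin or ILP formulation: the function name, the toy network, and the gene names are all hypothetical, and a greedy pass gives a small dominating set rather than a guaranteed minimum one.

```python
def greedy_dominating_set(network):
    """Return a small (not necessarily minimum) dominating set.

    network maps each regulator to the set of genes it regulates
    (directed edges regulator -> target).
    """
    nodes = set(network)
    for targets in network.values():
        nodes |= set(targets)

    def covers(node):
        # A node dominates itself and every gene it directly regulates.
        return {node} | set(network.get(node, ()))

    undominated = set(nodes)
    chosen = []
    while undominated:
        # Pick the node covering the most still-undominated genes;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(nodes), key=lambda n: len(covers(n) & undominated))
        chosen.append(best)
        undominated -= covers(best)
    return chosen


# Hypothetical toy network; the gene names are placeholders.
toy_network = {
    "TF_A": {"G1", "G2", "TF_B"},
    "TF_B": {"G3"},
    "TF_C": {"G3", "G4"},
}
regulators = greedy_dominating_set(toy_network)
print(regulators)
```

Every gene in the toy network is either one of the returned regulators or a direct target of one, which is the defining property of a dominating set. An exact minimum would require an ILP or exhaustive search, as the abstract notes.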

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 18

7:30 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM
Amphitheater

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

10:30 Chairperson’s Remarks

Konrad Karczewski, PhD, Computational Biologist, Broad Institute

10:40 Leveraging Human Genetic Electronic Medical Record-Linked Biobank Data to Guide Drug Discovery

Ron Do, PhD, Assistant Professor, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai

High failure rates of drug development in clinical trials are due in large part to inefficacy of drug therapeutics, and unforeseen adverse side effects. Genetic associations from genome-wide association studies have shown potential in guiding drug target prioritization. Electronic medical record (EMR)-linked biobank data have recently emerged as a source to conduct genome-wide association scans on a broad spectrum of medical and clinical phenotypes. My talk will evaluate the utility of such data in the context of drug research and development. Specifically, I will present results on utilizing genetic association data from a large EMR-linked biobank, for the purposes of informing efficacy and side effect prediction of drug therapeutics in clinical trials. I expect attendees to learn about the following: 1) genome-wide association studies; 2) EMR-linked biobanks; 3) how this genetic data can be used to guide drug target prioritization.

11:10 VCPA - A Cloud-Based SNP/Indel Variant Calling Pipeline and Data Management Tool Used for Analysis of WGS/WES for the Alzheimer’s Disease Sequencing Project

Yuk Yee Leung, PhD, Research Assistant Professor, Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania

The Alzheimer's Disease Sequencing Project (ADSP), an integral component of the National Alzheimer’s Project Act toward a cure for Alzheimer’s Disease (AD), will eventually analyze whole-genome sequencing (WGS) and whole-exome sequencing (WES) data from >20,000 late-onset AD patients and cognitively normal elderly to find new genetic variants associated with disease risk. To ensure all sequencing data are processed consistently and efficiently according to best practices, a common workflow called “Variant Calling Pipeline and Data Management Tool” (VCPA) was developed by the Genome Center for Alzheimer's Disease (GCAD) in collaboration with ADSP. VCPA can process any kind of germline DNA sequencing data and is available for general use. VCPA 1) is optimized for large-scale production of WGS and WES data; 2) includes a tracking database with a web frontend for users to track the production process and review quality metrics; 3) is implemented using the Workflow Description Language (WDL) for better deployment and maintenance; and 4) is designed for the latest human reference genome build (GRCh38/hg38, version GRCh38DH) and follows best practices for WGS analysis with input from TOPMed (Trans-Omics for Precision Medicine) and CCDG (Centers for Common Disease Genomics).

11:40 Variation Across 141,456 Individuals Reveals the Spectrum of Loss-of-Function Intolerance of the Human Genome

Konrad Karczewski, PhD, Computational Biologist, Broad Institute

12:10 pm Session Break

12:20 Luncheon Co-Presentation: The Future State of NGS Data Analysis

Anthony Philippakis, MD, PhD, Chief Data Officer, Broad Institute of MIT and Harvard

Pankaj Srivastava, Computer Science BSc, Vice President of Software and Informatics, Bioinformatics, Illumina

Data analysis is the key to unlocking the power of the genome – turning raw sequencing information into the answers that matter most. Join Illumina and the Broad Institute for a discussion around the future state of next generation sequencing data analysis, and an update on the Illumina® DRAGEN™ Bio-IT Platform.

12:50 Session Break

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

Yuval Itan, PhD, Assistant Professor, Department of Genetics and Genomic Sciences; Member, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai

2:00 Pinpointing Transcript-Damaging Disease-Causing Variants as a Major Step towards RNA Therapeutics

Sahar Gelfman, PhD, Associate Research Scientist, Columbia University Medical Center

The difficulty in capturing pathogenic variants that indirectly damage mRNA formation results in overlooking synonymous and intronic variants when searching for disease risk in sequenced genomes. The Transcript-inferred Pathogenicity (TraP) model was developed to identify sequence context changes that affect splicing decisions and the formation of the final transcript. A random forest model is trained on previously described pathogenic and benign synonymous mutations and identifies damaging variants with over 97% specificity and with a sensitivity three to four times higher than other available scores. Importantly, the specific mode of action of TraP damaging variants can be rescued using carefully designed small molecules, so identifying these variants is a big step towards personalized treatments for mutation carriers. Since its publication in 2017, TraP has become a major resource for genetic diagnostics that is helping to change the common conception that pathogenic genetic variation is caused solely by coding mutations. TraP has been incorporated in diagnostic pipelines in tens of research institutes worldwide, including the NIH, Nationwide Children’s Hospital, SickKids foundation, Massachusetts General Hospital and others. TraP is also available as a website for single queries (www.trap-score.org) that is used systematically by over 1,500 users from clinics and genetic institutes in over 40 countries worldwide, providing successful diagnosis of genetic disorders and affecting treatment decisions.

2:30 AI Assisted Rapid Clinical Whole Genome Sequencing for Clinical Care

Ray Veeraraghavan, PhD, Director of IT & Informatics, Rady Children's Institute for Genomic Medicine

3:00 Deciphering the Complex Heterogeneity of Cancer

Patrice M. Milos, PhD, Co-Founder/President and CEO, Medley Genomics, Inc.

In 2017, 1.7 million people in the US were diagnosed with cancer, and even though cancer survival rates have increased, it still accounts for 1 in 4 deaths annually. Cancer, a heterogeneous disease, has significant tumor cell variability within individual patients, as well as across categories of patients, creating complex barriers to effective and lasting cures for patients. Understanding this heterogeneity will be required to individualize care for patients. Medley Genomics provides a software platform that uses patent-pending algorithms and advanced data analytics to describe a patient's diverse tumor cell mixture. This enables creation of unique molecular diagnostic fingerprints for improving patient diagnosis, monitoring and treatment of cancer, and helps to improve novel oncology therapies and therapeutic combinations including individual cancer vaccine development.

3:30 Estimating Genotypic Heterogeneity Underlying Human Disease

Yuval Itan, PhD, Assistant Professor, Department of Genetics and Genomic Sciences; Member, Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai

Whole exome and whole genome sequencing provide hundreds of thousands of genetic variants per patient, of which only very few are pathogenic. Current computational methods are inefficient in differentiating pathogenic mutations from neutral genetic variants that are predicted to be damaging, and cannot predict the functional outcome of mutations. We will present: (1) a deep learning approach to efficiently detect pathogenic mutations by utilizing extensive annotations and patients’ phenotypic data; (2) a machine learning method combined with natural language processing to estimate whether a mutation results in gain- or loss-of-function; and (3) a case-control gene burden study to detect genes and pathways enriched with rare and high impact disease-causing mutations in exomes of over 2,000 Ashkenazi Jewish patients suffering from inflammatory bowel disorder. Finally, we will present new tools to visualize and extract useful information of human, mutations, and DNA/protein sequences for better utilization of next generation sequencing data and understanding of human disease genomics.
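
As a rough illustration of what a case-control gene burden test can look like, the Python sketch below compares the number of rare-variant carriers for one gene between cases and controls using a one-sided Fisher's exact test. The abstract does not say which statistic the study uses, so the choice of Fisher's exact test is an assumption, and the function name and the counts are hypothetical.

```python
from math import comb


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns P(X >= case_carriers), where X is hypergeometric: the number
    of carriers falling among the cases if carrier status were unrelated
    to disease status.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, carriers)
    upper = min(carriers, n_cases)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denominator


# Hypothetical counts for a single gene: 3 of 4 cases and 0 of 4 controls
# carry a rare, high-impact variant.
p_enriched = burden_p_value(3, 4, 0, 4)
print(p_enriched)
```

In a real study this test would be repeated per gene or pathway and corrected for multiple testing, and population structure would need to be accounted for. Neither is shown here.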

4:00 Conference Adjourns