The Bioinformatics track assembles thought leaders who will present case studies on using computational resources and tools, and who will discuss the problems and challenges of taking data from multiple -omics sources and aligning it with clinical action. Turning big data into smart data can lead to real-time assistance in disease prevention, prognosis, diagnostics, and therapeutics. With the ever-increasing volume of information generated for curing or treating diseases and cancers, bioinformatics technologies, tools, and techniques play a critical role in turning data into actionable knowledge to meet unstated and unmet medical needs. Case studies will address these problems and challenges, including making the jump from prototyping to production code, defining what a "validated" informatics pipeline means, balancing agility needs with requirements to be consistent and compliant, pipeline and workflow frameworks, containerization for reproducibility, and more. How do your approaches deal with inconsistencies in definitions and metadata across the multiple datasets that form the basis of big data?

Final Agenda

Monday, April 20

9:00 am - 5:00 pm Hackathon*

*Pre-registration required.

Tuesday, April 21

7:30 am Workshop Registration Open and Morning Coffee

8:30 am - 3:30 pm Hackathon*

*Pre-registration required.

8:30 - 11:30 am Recommended Morning Pre-Conference Workshops*

W3. Introduction to Data Visualization for Biomedical Applications

Nils Gehlenborg, PhD, Assistant Professor, Department of Biomedical Informatics, Harvard Medical School

Alexander Lex, PhD, Assistant Professor, SCI Institute, School of Computing, University of Utah

12:30 - 3:30 pm Recommended Afternoon Pre-Conference Workshops*

W12. Cancer Genome Analysis

Jeffrey Rosenfeld, PhD, Manager, Biomedical Informatics Shared Resource and Assistant Professor of Pathology and Laboratory Medicine, Rutgers Cancer Institute of New Jersey; President, Rosenfeld Consulting LLC

*Separate registration required.

2:00 - 6:30 Main Conference Registration Open

4:00 Welcome Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute




4:05 Keynote Introduction

4:15 PLENARY KEYNOTE PRESENTATION: NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health





Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health





5:00 - 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Sponsorship Opportunity Available)

Wednesday, April 22

7:30 am Registration Open and Morning Coffee

8:00 Welcome Remarks

Allison Proffitt, Editorial Director, Bio-IT World




8:05 Keynote Introduction

8:15 Toward Preventive Genomics: Lessons from MedSeq and BabySeq

Robert Green, MD, MPH, Professor of Medicine (Genetics) and Director, G2P Research Program/Preventive Genomics Clinic, Brigham & Women’s Hospital, Broad Institute, and Harvard Medical School




8:45 PANEL DISCUSSION: Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

Pietro Michelucci, PhD, Director, Human Computation Institute






Additional Panelists to be Announced

9:45 Coffee Break in the Exhibit Hall with Poster Viewing


10:50 Organizer’s Welcome Remarks

Cambridge Healthtech Institute

10:55 Chairperson’s Remarks

11:00 KEYNOTE PRESENTATION: Using Networks to Understand Genetic and Genomic Drivers of Disease

John Quackenbush, PhD, Henry Pickering Walcott Professor of Computational Biology and Bioinformatics; Chair, Department of Biostatistics, Harvard T.H. Chan School of Public Health

This presentation will address the problem of biological complexity in which many factors, each of small effect size, collectively influence disease risk, development, complexity, and response to therapy in cancer and other complex diseases. By using innovative computational methods built around network representations of biological interactions, we can gain insight into the disease process, develop predictive biomarkers, and identify possible avenues of therapeutic intervention.

11:30 Precision Cancer Medicine

Jeffrey Rosenfeld, PhD, Manager, Biomedical Informatics Shared Resource and Assistant Professor of Pathology and Laboratory Medicine, Rutgers Cancer Institute of New Jersey; President, Rosenfeld Consulting LLC

This presentation will illustrate the current methods used to determine precise treatments for cancer, as opposed to standard chemotherapy.

12:00 pm Sponsored Presentation (Opportunity Available)

12:30 Session Break

12:40 LUNCHEON PRESENTATION I: Advancing Precision Medicine with a Complete Bioinformatics Ecosystem

Brandi Davis-Dusenbery, PhD, CSO, Seven Bridges

1:10 Luncheon Presentation II to be Announced



1:40 Session Break


1:50 Chairperson’s Remarks

1:55 Building an Artificial Intelligence-Based Vaccine Discovery System – Applications in Infectious Diseases & Personalized Neoantigen-Related Immunotherapy for Treatment of Cancers

Kamal Rawal, PhD, Associate Professor, Amity University, India; Adjunct Faculty, Baylor College of Medicine, Houston, USA

Infectious disease affects several million individuals all over the world, particularly in developing countries. We have built a bioinformatics pipeline that combines reverse vaccinology tools, a network biology system, and text mining algorithms to analyze the proteomes of pathogens and rank proteins based on their propensity to be an optimal vaccine candidate. Our system compares various machine learning approaches such as support vector machines, neural networks, ensemble learning, and decision trees.

2:15 Flexible Platform for Providing Broad-Based Bioinformatics Service

Ethan Yaoyu Wang, PhD, Senior Research Scientist, Department of Biostatistics, Harvard T.H. Chan School of Public Health

We introduce CNAP, a flexible cloud-based framework for distributing bioinformatics pipelines as a web-service application to non-technical researchers with limited computational support. CNAP is particularly suited for research communities with limited computational resources and bioinformatics personnel that must provide broad-based support on projects with a wide range of computational requirements and dataset sizes.

2:35 Data Processing, Reproducibility and Flexibility in a Biomedical Research Organization

Asaf Peer, PhD, Associate Computational Scientist, The Jackson Laboratory

Bioinformatics software is an ever-changing arena with new experimental methods and analysis packages. Pipeline management systems are an elegant solution for managing this complex and fast-paced environment. These systems have the added advantages of creating reproducible, efficient and sharable bioinformatic workflows, a critical component for . At The Jackson Laboratory (JAX) we want to make sure analyses are uniform across the organization, easy to update and manage, follow a plug-and-play model, are accessible to all research groups within and outside JAX, and are easily reproducible. To address these needs, the Computational Sciences department has created and maintains a local repository of several pipelines across both human and mouse model organisms. These pipelines address a wide range of genomic applications and come with configuration files for easily executing Nextflow and WDL pipelines on our resources. In addition, we have developed a deployment system for the various pipelines that runs on JAX’s HPC cluster and also on the cloud.

2:55 Sponsored Presentation (Opportunity Available)

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing


4:00 Chairperson’s Remarks

4:05 BLAST, Pipelines and FAIR on the Cloud

Thomas Madden, PhD, Staff Scientist, NCBI/NLM/NIH

A sequence similarity search often provides essential information about a DNA or protein sequence. With the rapidly expanding use of high-throughput sequencing, a few issues may arise for BLAST users. First, the need for searches may come in bursts, with many searches needing to be done at once and current resources unable to handle the load. Second, many searches are now part of a pipeline, which can be a powerful multiplier for bioinformatics tools, but is not straightforward to maintain if it does not conform to FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The cloud can help with the first problem, allowing a user to scale up computational resources to meet their specific needs. Containerization and formal pipeline languages can help with the second issue, making pipelines more reproducible and easier to maintain. We discuss a containerized version of BLAST, usage with CWL, and a cloud infrastructure that includes databases hosted on cloud providers.


4:25 Data Integration Expectation Maps Project

ClarLynda Williams-DeVane, PhD, Chair, Data Science and Bioinformatics, Fisk University

4:45 mTOR System: A Database for Systems-Level Biomarker Discovery in Cancer

Iman Tavassoly, MD, PhD, Physician-Scientist, Mount Sinai Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai

mTOR System is a database I have designed for exploring biomarkers and systems-level data related to the mTOR pathway in cancer. The database consists of different layers of molecular markers and the quantitative parameters assigned to them through current mathematical models. It is an example of merging systems-level data with mathematical models for precision oncology.

5:05 Sponsored Presentation (Opportunity Available)


5:35 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:45 End of Day

Thursday, April 23

7:30 am Registration Open and Morning Coffee

8:00 Organizer’s Remarks

Cindy Crowninshield, RDN, LDN, Executive Event Director, Cambridge Healthtech Institute




8:05 Awards Program Introduction

8:10 Benjamin Franklin Award and Laureate Presentation

J.W. Bizzaro, Managing Director,




8:35 Bio-IT World Innovative Practices Awards

Allison Proffitt, Editorial Director, Bio-IT World




9:00 AI in Pharma: Where We Are Today and How We Will Succeed in the Future

Natalija Jovanovic, PhD, Chief Digital Officer, Sanofi Pasteur




9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced at 10:00


10:30 Organizer’s Remarks

Cambridge Healthtech Institute

10:35 Chairperson’s Remarks

10:40 Powering Question-Driven Problem Solving to Improve the Chances of Finding New Medicines

Samiul Hasan, PhD, Scientific Analytics and Visualization Director, Data and Computational Sciences, GlaxoSmithKline

Making true “molecule”-“mechanism”-“observation” relationship connections is a time-consuming, iterative and laborious process. In addition, it is very easy to miss critical information that affects key decisions or helps make plausible scientific connections. The current practice for deciphering such relationships frequently involves subject matter experts (SMEs) asking resource-constrained data science departments to refine and redo highly similar ad hoc searches. This impairs both the pace and quality of scientific reviews. In this presentation, I show how semantic integration can ultimately become part of an integrated learning framework for more informed scientific decision-making. I will take the audience through our pilot journey and highlight practical learnings that should inform subsequent endeavors.

11:10 Computational Efforts on Drug Repurposing for Rare Diseases

Bin Li, PhD, Director, Computational Biology, Takeda Pharmaceuticals

We conducted in silico screens to repurpose >100 compounds for ~4000 rare disease indications. Various data types were utilized (protein-protein interaction networks, pathways, disease driven genes, competitive intelligence, etc.), and different computational methods were implemented and evaluated. Some biologically interesting drug/disease pairs were observed.

11:40 Presentation to be Announced

12:10 pm Session Break

12:20 Luncheon Presentation I to be Announced

12:50 Luncheon Presentation II (Sponsorship Opportunity Available)

1:20 Dessert Refreshment Break in the Exhibit Hall with Last Chance Poster Viewing


1:55 Chairperson’s Remarks

Lijian Yu, PhD, Senior Bioinformatics Specialist, AbbVie

2:00 Pediatric Cell Atlas: Using Single-Cell Technology to Understand Childhood Health and Disease

Deanne Taylor, PhD, Director of Bioinformatics, DBH, Children’s Hospital of Philadelphia

2:30 Scaling scRNASeq Visualization to Unlimited Datasets with Cellxgene Gateway

Alok Saldanha, PhD, Technical Associate Director, NIBR Informatics, Novartis Institutes for Biomedical Research

Cellxgene Gateway is an open source tool that allows you to use the Cellxgene Server provided by the Chan Zuckerberg Initiative with multiple datasets. I will introduce this tool in the context of a typical single-cell RNA-Seq analysis workflow, and touch on deployment issues in an enterprise cloud with a budget.

3:00 A Software Platform to Spot Single Cells in Drug Discovery

Lijian Yu, PhD, Senior Bioinformatics Specialist, AbbVie

Single-cell RNA (scRNA) sequencing is increasingly used as a powerful tool to investigate human diseases and assist the drug discovery process by enabling scientists to comprehend the transcriptomes of tens of thousands of individual cells, and the volume and complexity of the resulting data have become a huge challenge. At AbbVie we have teamed with many scientists across the globe to develop a uniform data storage and visualization platform based on Spotfire, a robust data storage strategy, and web technologies for the front-end UI. Scientists can now freely explore the data and gain important insights into the cells by utilizing the data across several cells and indications. Authors: Lijian Yu, Anne-Sophie Barthelet, Rishi R. Gupta.

3:30 Embedding Single-Cell RNA-Seq Profiles in Non-Euclidean Spaces

Jiarui Ding, PhD, Postdoctoral Researcher, Aviv Regev’s Lab, Broad Institute of MIT and Harvard

Single-cell RNA-Seq has become an invaluable tool for studying biological systems in health and disease. We introduce scPhere, a scalable deep generative model that embeds cells into low-dimensional hyperspherical or hyperbolic spaces as a more accurate representation of the data. scPhere resolves cell crowding, corrects multiple, complex batch factors, facilitates interactive visualization of large datasets, and gracefully uncovers pseudotemporal trajectories.

4:00 Close of Conference
