Bio IT World Expo 2016  

Track 4 - April 21 – 23, 2015


Developments and Applications for Big Data

Track 4 assembles thought leaders who will discuss the latest developments and applications of bioinformatics to big data in scientific discovery and biomedical research that are contributing to solving real clinical problems and unmet needs in the healthcare and life sciences environment. Themes include modeling of systems and networks, scalable analysis, big data and computational drug design and repositioning, machine learning models, and translating data to patient care. With the ever-increasing volume of information generated for curing or treating diseases and cancers, bioinformatics technologies, tools and techniques play a critical role in turning data into meaningful biological applications and knowledge.

Final Agenda

Tuesday, April 21

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

Integrative Visualization Strategies for Large-Scale Biological Data

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

How Data-Driven Patient Networks are Transforming Biomedical Research

The Impact of Research Informatics on Laboratory Evolutions

* Separate registration required

2:00 – 6:30 Main Conference Registration

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing


Wednesday, April 22

7:00 am Registration Open and Morning Coffee

9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

Internet 2

9:45 Coffee Break in the Exhibit Hall with Poster Viewing



10:50 Chairperson’s Opening Remarks
Bonnie Feldman, D.D.S., Digital Health Analyst, DrBonnie360 

11:00 An Algorithmic Rationale for the Irreversibility of Biological Ageing

Simon Berkovich, Professor, Computer Science, The George Washington University
Maryam Yammahi, PhD, Computer Science, The George Washington University 

The presentation describes how the process of ageing is related to the specifics of the big data organization of biological information processing.

11:30 Role of Data and Digital Tools in Autoimmune Disorders

Bonnie Feldman, D.D.S., Digital Health Analyst, DrBonnie360

Turning data into useable information is especially challenging for complex chronic diseases such as autoimmune disease. We now have the tools to begin to build more personalized data sets from the ground up, while using this information to find out how to ask the right questions. We will explore innovations in personal data collection such as the work of Larry Smarr and others, new approaches to clinical trial design and data analytics and some of the microbiome research around autoimmune disease. We will also explore bigger picture issues related to data sharing and data donation with specific examples of biobanking and bioregistry data collection and analysis.

12:00 pm IBM Watson Cognitive Computing Applications in Healthcare and Life Sciences

Philip G. Abrahamson, Ph.D., Research Staff, IBM Watson

Information is being created faster than it can be consumed. This talk will share experiences applying IBM Watson Cognitive Computing to help researchers explore huge volumes of unstructured and structured content to discover insights and information. Examples include accelerating the understanding of the underlying biology of diseases; identifying, evaluating, and selecting drug targets and candidates, including leveraging safety and toxicity information; improving comparative effectiveness studies of drugs; and supporting competitive intelligence.

12:30 Session Break

12:40 Luncheon Co-Presentation I: How Revolutionary Machine Learning Advancements Improve Drug Research Productivity and Drive Discovery of Valuable Insights across Disparate Content Repositories

Melissa Chapman, Principal, The Riverhead Group

Phillip Clary, Vice President, Content Analyst Company

Ensuring product quality, efficacy and safety by searching for correlations across disparate collections of eCTDs, articles, reports, and regulatory intelligence can be incredibly time-consuming. Boolean keyword searches can produce false positives and omit relevant results, and laborious taxonomies can be a burden to build and maintain. Using a live demonstration, attendees will see how the latest advances in machine learning technology can dramatically improve productivity and reveal key insights within large collections of unstructured content.

1:10 Luncheon Presentation II (Sponsorship Opportunity Available) or Lunch on Your Own

1:40 Session Break



1:50 Chairperson’s Remarks
Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC 

1:55 Metabolic Biomarkers in Duchenne Muscular Dystrophy

Simina Boca, Ph.D., Assistant Professor, Innovation Center for Biomedical Informatics, Georgetown University Medical Center

Duchenne Muscular Dystrophy (DMD) is a devastating degenerative X-linked disorder which affects approximately 1 in 5,000 newborn males and results in muscle degeneration, eventual loss of ambulation around the age of 9, and a life expectancy of around 25 years. We considered serum metabolomic profiling of 51 DMD patients and 22 age-matched healthy volunteers in order to find novel serum circulating metabolites for DMD, with the ultimate goal of discovering molecular surrogate markers associated with disease progression, which can be used in future clinical trials. The DMD patients had a minimum age of 4, a maximum age of 28.7, and a median age of 11.4 years, while the healthy controls had a minimum age of 6, a maximum age of 17.8, and a median age of 13.7 years. 22 of the 51 DMD patients were non-ambulatory at the time of serum collection. As expected, age and ambulation status were strongly correlated in the DMD group, where patients with ages between 4 and 17.8 years, with a median of 6.8 years, were ambulatory, while patients between 11.4 and 28.7 years, with a median of 18 years, had lost ambulation. Liquid chromatography – mass spectrometry (LC-MS) techniques were used to process the serum of the study participants, with the XCMS analysis tool detecting a total of 246 peaks in negative mode and 1676 peaks in positive mode. Metabolite values were further log2 transformed, then normalized using internal standards for both modes. A two-class comparison using a two-sample t-test identified 46 peaks associated with disease status at a false discovery rate (FDR) threshold of 0.05, employing a Benjamini-Hochberg correction. A similar comparison was performed for the DMD cases, comparing ambulatory and non-ambulatory individuals, leading to 154 significant peaks at an FDR threshold of 0.05. After the analyses are finalized, significant peaks will be annotated, in order to match the m/z values to metabolite identities.
One particular challenge in interpreting these results is eliminating metabolites which are not associated with disease mechanism from further consideration, such as those associated with drugs or dietary supplements used by certain patients. A bioinformatics platform for metabolic data interpretation has been developed and tested to identify DMD-associated biomarkers and will be made available on GitHub once validation is complete. This platform will be presented along with another use case from a breast cancer metabolomics study.
Contributors/Authors: Simina M. Boca (1,2), Maki Nishida (1), Michael Harris (1), Shruti Rao (1), Amrita K. Cheema (2,3), Kirandeep Gill (2), Haeri Seol (4), Eric Hoffman (4), Erik Henricson (5), Craig McDonald (5), Yetrib Hathout (4) and Subha Madhavan (1,2). Affiliations: (1) Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, D.C.; (2) Department of Oncology, Georgetown University Medical Center, Washington, D.C.; (3) Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C.; (4) Children’s National Medical Center and the George Washington University, Washington, D.C.; (5) Department of Physical Medicine and Rehabilitation, University of California, Davis School of Medicine, Davis, CA.
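
For readers unfamiliar with the Benjamini-Hochberg correction named in the abstract above, the step-up procedure can be sketched as follows. This is a minimal illustration only, not the authors' platform or code: the function name `benjamini_hochberg` and the example p-values are hypothetical, and the sketch assumes the per-peak p-values (for example, from two-sample t-tests) have already been computed.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if it is declared significant.

    Illustrative helper (hypothetical name), implementing the standard
    Benjamini-Hochberg step-up rule at the given FDR threshold.
    """
    m = len(p_values)
    # Rank the tests by ascending p-value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Made-up p-values standing in for per-peak t-test results.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, fdr=0.05)
print(sum(flags), "of", len(example_p), "peaks pass the FDR threshold")
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value further down the ranking meets its threshold.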

2:25 Personalized Medicine: Moving from Correlation to Causality in Breast Cancer

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

Sabrina Molinaro, Ph.D., Institute for Clinical Physiology, National Research Council, Italy

We have developed a fundamental model of the disease process for breast cancer, from pre-disease through early detection, treatment and outcome. We apply a multi-scalar approach across the risk assessment-enhanced diagnosis-therapeutic decision axis and will present the modeling methodologies.

2:55 Streamline R&D and Catalyze Drug Repositioning by Identifying Expert Networks and Expertise

Xavier Pornain, Vice President, Sales & Alliances, Sinequa

Finding networks of experts with similar or complementary expertise on a given subject helps avoid costly redundant research, shed light on a complex research problem from different angles, foster cooperation, facilitate drug repositioning, and accelerate time to market. This session will delve into the benefits pharmaceutical companies are seeing by employing Search & Analytics technology to “link” researchers and teams with one another, create internal “journals of science” to share internal results and snippets, and access “breaking science” through alerts and trend spotting across all scientific information. We show solutions for dealing with scientific vocabulary, detecting “synonyms” as well as “similar” and “complementary” notions, e.g. brand names for drugs, scientific names for the active ingredients, and even descriptions of molecules using a standard description language. In addition, we analyze vast quantities (200 to 500 million) of highly technical documents and data (billions of records), such as internal and external publications, patent filings, lab reports, clinical test reports, trade databases, etc.

3:10 Cloud-Based Solutions for Population-Scale, Whole Human Genome and Exome Analysis

George Asimenos, Ph.D., Director, Science & Clinical Solutions, DNAnexus

Thanks to advances in sequencing technology, the size and scope of DNA sequencing projects are rapidly moving towards an era of thousands of whole genomes and tens of thousands of exomes per year. Learn how certain field-leading institutes are using a cloud-based bioinformatics platform to manage their big data deluge across multiple initiatives.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 Using Games as Data Analytical Tools

Melanie Stegman, Ph.D., Owner, Molecular Jig Games, LLC.; Director, Science Game Center

Immune Defense is a video game, but it is also a molecular level simulation of the immune system. Individual data points tell us very specific details about cells, and a large database of these details should tell us a more complete story. But do we have enough data yet to tell the story of one cell, facing one bacterium? It has been a challenge gathering the knowledge to create this small story. Part of Immune Defense game development is the creation of a "game level editor." We can make new molecules, give them new binding partners, assign their affinities for each partner, increase or decrease their relative concentrations and give our enzymes activity... We have created a "medium data" analysis chamber--that is, not Big Data, but more data than one person can hold in their head. We are planning to build up our level editor as a tool for biochemists to analyze their data with much more perspective than ever before. We will also have a tool for scientists, students, public and game developers to use to create realistic scenarios for various purposes, from science fairs to testing to video game development. Play Immune Defense at

4:30 A Rigorous Methodology for Non-Randomized & Observational Study in Healthcare Testing

Gil Weigand, Ph.D., Director, Strategic Projects, Oak Ridge National Laboratory

For more than a decade, healthcare R&D or innovation trials have seen accelerating use of non-randomized study (NRS) designs, including observational or pragmatic methods. Given the demand for rapid translation and patient centeredness, a randomized controlled trial, today’s acknowledged “gold standard” for testing in healthcare, may not be practical or desirable when there is a need for flexibility, responsiveness, or timeliness. The challenge for researchers and clinicians using NRS testing is achieving sufficient rigor in the scientific evaluation to assure data and study veracity, particularly as complexity and heterogeneity increase in innovation trials. IDAMS-HC achieves the state of the art available today with regard to rigor, technology, and science-based evaluation of NRS, and it supersedes today’s ad hoc methodologies. Moreover, it increases external validity. In this presentation, we present an advanced, rigorous, science-based methodology for evaluation in healthcare testing. The methodology extends today’s general practice, rapid cycle evaluation, by introducing in silico methods of big data and modeling & simulation and tightly integrating the methods within a knowledge discovery infrastructure. An ACO intervention trial provides initial experience with IDAMS-HC.
Contributors/Authors: Gil Weigand, PhD, Director, Strategic Projects, Computer and Computational Sciences, Oak Ridge National Laboratory (ORNL); Mallikarjun Shankar, PhD, Senior Research Scientist, Computer and Computational Sciences, ORNL; C. Edward McBride, III, MD, MBA, VP, Clinical Services, Summit Medical Group (SMG); Kimberley Kauffman, VP, Value-Based Care, SMG; and Suzanne Kieltyka, Manager, Health Education, SMG 

5:00 Service-Oriented Bioinformatics – the CDC Influenza Sequence Data Management System

John M. Greene, Ph.D., CSM, Senior Director, Bioinformatics, Bioinformatics Solutions and Support, SRA International, Inc.

Next-Generation Sequencing technologies have opened enormous opportunities for improvements in the surveillance of infectious diseases such as influenza. However, effective use of such sequencing information depends on a robust system to store, manage, analyze, and interpret sequence data. The Influenza Sequence Data Management System (ISDMS) at the Centers for Disease Control and Prevention (CDC)’s Influenza Division in Atlanta fills this role using a service-based approach developed by SRA International that we refer to as 'service-oriented bioinformatics'. Services are small programs that are coordinated by an enterprise service bus, in this case Apache ServiceMix, based on the service-oriented architecture (SOA) model. Services can be written in different languages and act as modular components of the system, providing individual functionality, such as searching, annotation display, and location standardization. These services underpin data loading, data annotation, and data display, and services can be combined to implement new features and reused to speed development.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

6:30 Close of Day


Thursday, April 23

7:00 am Registration Open and Morning Coffee

10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced



10:30 Chairperson’s Remarks

Michael D. Stadnisky, Ph.D., CEO, FlowJo, LLC

10:40 Structure-Based Algorithms to Predict Drug-Mediated Toxicity

Khaled Barakat, Ph.D., Assistant Professor, Katz Group-Rexall Centre for Pharmacy & Health Research, University of Alberta

This talk presents new insights on how state-of-the-art high performance computing and cutting-edge molecular simulations were used to predict toxicity of candidate drugs. To be biologically active, drugs must physically fit into the binding site(s) within their targets. However, to reach its precise binding location, a drug has to interact with a variety of cellular components with various structures and functions. All these events increase the probability that a drug will bind to undesired off-target(s), which may induce adverse side effects and severe toxicities. During this talk, we will demonstrate how high performance computing combined with conformational sampling, molecular docking and molecular dynamics (MD) simulations was used to build realistic models for these off-targets. We will focus on cardiotoxicity and use our newly developed hERG model as a test case.

11:10 An Informatics Solution for the Precise Registration and Visualization of Biological Molecules

Roxanne Kunz, Ph.D., Senior Scientist, Therapeutic Discovery, Amgen, Inc.

A custom bioinformatics software application for the registration and representation of biological molecules will be described. The system includes a flexible, modality-independent editor to define biomolecules in a step-wise fashion, backed by a chemical structure-based database catalog to precisely capture atomic-level modifications of amino acids and other non-proteinaceous components. Data-driven visual representations based on canonical biological molecule structure reference types, such as IgG1 monoclonal antibodies and subtypes thereof, are dynamically constructed and interactive.

11:40 Man Versus Machine: Validating, Optimizing, and Predicting Outcomes in Single Cell Phenomics

Michael D. Stadnisky, Ph.D., CEO, FlowJo, LLC

The exponential increase in the throughput and content of flow and mass cytometry assays has challenged the paradigm of DIY data management, manual analysis, and 2D visualization in single cell phenomics. We have developed and assessed an automated pipeline that provides directed analysis, statistical cluster comparison for iterative pipeline improvement, plug-and-play automated clustering algorithms, and phenotype prediction.

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

2:00 Using Deep Learning Techniques for Word Vectors Generation for Insight into Poorly Structured Textual Data

Mark Pinches, Senior Scientist, Data Modeling and Bioinformatics, Drug Safety and Metabolism, AstraZeneca

Word vectors carry a number of interesting properties that can be applied to textual data in order to cluster/stratify data for additional analysis. This emerging approach has been used in other fields but here we apply it to biological data. This presentation illustrates a concrete example of this technique with clear and unique outcomes.

2:30 Examining the Health Effects of Multiple Environmental Exposures on Subpopulations Using a Big Data Platform

Chirag Patel, Ph.D., Research Associate, Center for Biomedical Informatics, Harvard Medical School/Pivotal, Inc.

The environment plays a large role in human health, but we lack computational tools to unlock the relationship between exposure and disease. For example, most studies examine the effects of single environmental agents and ignore the complex interplay of an individual’s characteristics and comorbidities. This offers an unrealistic view of the human experience. With the availability of scalable platforms, we are now able to ask many more questions to examine more complex relationships between environmental exposure and disease. In this talk we will present our results from examining the complex interdependencies between 1000s of environmental agents (e.g., infectious agents, biomarkers of pollutants, and nutrient exposure) in datasets representative of the US population. Specifically, we consider over 100K individual pairwise correlations between exposures and quantitative health-related traits in different segments of the US population, a computationally burdensome task but critical toward a "human exposome project".

3:00 Streamlined Planning, Execution, Data Capture and Analysis of Peptide Preformulation Stability Studies

Roman Affentranger, Dr. sc. Nat, Head, Small Molecule Discovery Workflows, Roche

The presentation will illustrate what we have implemented for the peptide preformulation scientists in their electronic lab notebook to efficiently design peptide formulation stability studies. The study can cover a number of different formulations, and with the definition of time points, stress conditions and desired analytical methods, the required number of vials as well as individual material amounts are automatically calculated. Individual analytical results are captured through predesigned templates that are specific to the different types of analytics, and the results are linked to the stability study by a unique study ID. Throughout study execution, as well as for the completed study, all the data - formulation composition, study design and all analytical results - are pulled together in a data analysis tool, also within the electronic lab notebook. The data analysis tool allows selection of individual or multiple formulations, time points, and stress points, and therefore offers unique insights which greatly support decision making for improved formulation design.

3:30 Welcome to the Future: Data Analysis in a Language Workbench

Fabien Campagne, Ph.D., Assistant Professor and Laboratory Head, Institute for Computational Biomedicine, Weill Cornell Medical College

We have devised an analysis tool that takes advantage of Language Workbench (LW) technology and in particular of the open-source Meta Programming System. Using this technology, we created languages for biological data analysis that do not require a strong computational background and can be used by most biomedical researchers, yet scale to large datasets. These languages provide high-level abstractions such as organisms, filesets, execution nodes, analysis plugins, or analysis tasks that model the biological data and the computational environment in simple ways. End-users interacting with the workbench can configure analyses that will execute on parallel computers without these users knowing about parallel computing, programming, scripting languages or the command line. The talk will highlight how both end-users and tool designers can benefit from this technology.

4:00 Conference Adjourns

Platinum Sponsors

Cycle Computing
DDN Storage
Illumina
Intel
