Track 4 - April 21 – 23, 2015
Developments and Applications for Big Data
Track 4 assembles thought leaders who will discuss the latest developments and applications of bioinformatics to big data in scientific discovery and biomedical research that are contributing to solving real clinical problems and unmet needs in the healthcare and life sciences environment. Themes include modeling of systems and networks, scalable analysis, big data and computational drug design and repositioning, machine learning models, and translating data to patient care. With the ever-increasing volume of information generated for curing or treating diseases and cancers, bioinformatics technologies, tools and techniques play a critical role in turning data into meaningful biological applications and knowledge.
Download Brochure | Workshops
Tuesday, April 21
7:00 am Workshop Registration and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Integrative Visualization Strategies for Large-Scale Biological Data
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
How Data-Driven Patient Networks are Transforming Biomedical Research
The Impact of Research Informatics on Laboratory Evolutions
* Separate registration required
2:00 – 6:30 Main Conference Registration
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, April 22
7:00 am Registration Open and Morning Coffee
9:00 Benjamin Franklin Awards and Laureate Presentation
9:30 Best Practices Awards Program
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Opening Remarks
Bonnie Feldman, D.D.S., Digital Health Analyst, DrBonnie360
11:00 An Algorithmic Rationale for the Irreversibility of Biological Ageing
Simon Berkovich, Professor, Computer Science, The George Washington University
Maryam Yammahi, Ph.D., Computer Science, The George Washington University
The presentation describes how the process of ageing is related to the specifics of the big data organization of biological information processing.
11:30 Role of Data and Digital Tools in Autoimmune Disorders
Bonnie Feldman, D.D.S., Digital Health Analyst, DrBonnie360
Turning data into usable information is especially challenging for complex chronic diseases such as autoimmune disease. We now have the tools to begin building more personalized data sets from the ground up, while using this information to learn how to ask the right questions. We will explore innovations in personal data collection such as the work of Larry Smarr and others, new approaches to clinical trial design and data analytics, and some of the microbiome research around autoimmune disease. We will also explore bigger-picture issues related to data sharing and data donation, with specific examples of biobanking and bioregistry data collection and analysis.
12:00 pm IBM Watson Cognitive Computing Applications in Healthcare and Life Sciences
Philip G. Abrahamson, Ph.D., Research Staff, IBM Watson
Information is being created faster than it can be consumed. This talk will share experiences applying IBM Watson Cognitive Computing to help researchers explore huge volumes of unstructured and structured content to discover insights and information. Examples include accelerating the understanding of the underlying biology of diseases; identifying, evaluating, and selecting drug targets and candidates, including leveraging safety and toxicity information; improving drug comparative effectiveness studies; and competitive intelligence.
12:30 Session Break
12:40 Luncheon Co-Presentation I: How Revolutionary Machine Learning Advancements Improve Drug Research Productivity and Drive Discovery of Valuable Insights across Disparate Content Repositories
Melissa Chapman, Principal, The Riverhead Group
Phillip Clary, Vice President, Content Analyst Company
Ensuring product quality, efficacy and safety by searching for correlations across disparate collections of eCTDs, articles, reports, and regulatory intelligence can be incredibly time-consuming. Boolean keyword searches can produce false positives and omit relevant results, and laborious taxonomies can be a burden to build and maintain. Using a live demonstration, attendees will see how the latest advances in machine learning technology can dramatically improve productivity and reveal key insights within large collections of unstructured content.
1:10 Luncheon Presentation II (Sponsorship Opportunity Available) or Lunch on Your Own
1:40 Session Break
1:50 Chairperson’s Remarks
Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC
1:55 Metabolic Biomarkers in Duchenne Muscular Dystrophy
Simina Boca, Ph.D., Assistant Professor, Innovation Center for Biomedical Informatics, Georgetown University Medical Center
Duchenne Muscular Dystrophy (DMD) is a devastating degenerative X-linked disorder that affects approximately 1 in 5,000 newborn males and results in muscle degeneration, eventual loss of ambulation around the age of 9, and a life expectancy of around 25 years. We considered serum metabolomic profiling of 51 DMD patients and 22 age-matched healthy volunteers in order to find novel serum circulating metabolites for DMD, with the ultimate goal of discovering molecular surrogate markers associated with disease progression, which can be used in future clinical trials. The DMD patients had a minimum age of 4, a maximum age of 28.7, and a median age of 11.4 years, while the healthy controls had a minimum age of 6, a maximum age of 17.8, and a median age of 13.7 years. 22 of the 51 DMD patients were non-ambulatory at the time of serum collection. As expected, age and ambulation status were strongly correlated in the DMD group, where patients with ages between 4 and 17.8 years, with a median of 6.8 years, were ambulatory, while patients between 11.4 and 28.7 years, with a median of 18 years, had lost ambulation. Liquid chromatography – mass spectrometry (LC-MS) techniques were used to process the serum of the study participants, with the XCMS analysis tool detecting a total of 246 peaks in negative mode and 1676 peaks in positive mode. Metabolite values were further log2 transformed, then normalized using internal standards for both modes. A two-class comparison using a two-sample t-test identified 46 peaks associated with disease status at a false discovery rate (FDR) threshold of 0.05, employing a Benjamini-Hochberg correction. A similar comparison was performed for the DMD cases, comparing ambulatory and non-ambulatory individuals, leading to 154 significant peaks at an FDR threshold of 0.05. After the analyses are finalized, significant peaks will be annotated in order to match the m/z values to metabolite identities.
One particular challenge in interpreting these results is eliminating metabolites which are not associated with disease mechanism from further consideration, such as those associated with drugs or dietary supplements used by certain patients. A bioinformatics platform for metabolic data interpretation has been developed and tested to identify DMD-associated biomarkers and will be made available on GitHub once validation is complete. This platform will be presented along with another use case from a breast cancer metabolomics study.
Contributors/Authors: Simina M. Boca1,2, Maki Nishida1, Michael Harris1, Shruti Rao1, Amrita K. Cheema2,3, Kirandeep Gill2, Haeri Seol4, Eric Hoffman4, Erik Henricson5, Craig McDonald5, Yetrib Hathout4 and Subha Madhavan1,2
1Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, D.C.; 2Department of Oncology, Georgetown University Medical Center, Washington, DC; 3Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C.; 4Children’s National Medical Center and the George Washington University, Washington, D.C., 5 Department of Physical Medicine and Rehabilitation, University of California, Davis School of Medicine, Davis, CA.
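The multiple-testing step described in the abstract above can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up procedure applied to per-peak p-values; the p-values here are invented for illustration and are not the study's data.

```python
# Benjamini-Hochberg step-up procedure: given p-values and an FDR
# threshold q, return a boolean "significant" flag per hypothesis.
def benjamini_hochberg(pvals, q=0.05):
    m = len(pvals)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q; all hypotheses
    # with sorted rank <= k are declared significant.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant

# Illustrative p-values, e.g. from per-peak two-sample t-tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74]
flags = benjamini_hochberg(pvals, q=0.05)
```

Note the step-up character of the procedure: a p-value can be declared significant even if it exceeds its own rank threshold, as long as some larger-ranked p-value passes its threshold.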
2:25 Personalized Medicine: Moving from Correlation to Causality in Breast Cancer
Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC
Sabrina Molinaro, Ph.D., Institute for Clinical Physiology, National Research Council, Italy
We have developed a fundamental model of the disease process for breast cancer, from pre-disease through early detection, treatment and outcome, applying a multi-scalar approach across the risk assessment, enhanced diagnosis, and therapeutic decision axis; the modeling methodologies will be presented.
2:55 Streamline R&D and Catalyze Drug Repositioning by Identifying Expert Networks and Expertise
Xavier Pornain, Vice President, Sales & Alliances, Sinequa
Finding networks of experts with similar or complementary expertise on a given subject helps avoid costly redundant research, shed light on a complex research problem from different angles, foster cooperation, facilitate drug repositioning, and accelerate time to market. This session will delve into the benefits pharmaceutical companies are seeing by employing Search & Analytics technology to “link” researchers and teams with one another, create internal “journals of science” to share internal results and snippets, and access “breaking science” via alerts while spotting trends across all scientific information. We show solutions for dealing with scientific vocabulary, detecting “synonyms” as well as “similar” and “complementary” notions, e.g. brand names for drugs, scientific names for the active ingredients, and even descriptions of molecules using a standard description language. In addition, we analyze vast quantities (200 to 500 million) of highly technical documents and data (billions of records), such as internal and external publications, patent filings, lab reports, clinical test reports, trade databases, etc.
3:10 Cloud-Based Solutions for Population-Scale, Whole Human Genome and Exome Analysis
George Asimenos, Ph.D., Director, Science & Clinical Solutions, DNAnexus
Thanks to advances in sequencing technology, the size and scope of DNA sequencing projects is rapidly moving towards an era of thousands of whole genomes and tens of thousands of exomes per year. Learn how certain field-leading institutes are using a cloud-based bioinformatics platform to manage their big data deluge across multiple initiatives.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 Extending Galaxy with External Microbiome Databases
Bob Brown, Ph.D., Affiliate Professor, Environmental Science, George Mason University
Galaxy is a fantastic framework for repeatable, multi-tool processing on specific data domains. We have extended Galaxy with a Drupal interface to a MySQL database of microbiome datasets. This allows specific subsets of metadata and sequences to be seamlessly brought into Galaxy for processing.
4:30 A Rigorous Methodology for Non-Randomized & Observational Study in Healthcare Testing
Gil Weigand, Ph.D., Director, Strategic Projects, Oak Ridge National Laboratory
Healthcare R&D and innovation trials have, for more than a decade, seen accelerating use of non-randomized study (NRS) designs, including observational and pragmatic methods. Driven by the demand for rapid translation and patient-centeredness, a randomized controlled trial, today's acknowledged "gold standard" for testing in healthcare, may not be practical or desirable when flexibility, responsiveness, or timeliness is needed. The challenge for researchers and clinicians using NRS testing is achieving sufficient rigor in the scientific evaluation to assure data and study veracity, particularly as complexity and heterogeneity increase in innovation trials. IDAMS-HC achieves the state of the art available today with regard to rigor, technology, and science-based evaluation of NRS; it supersedes today's ad hoc methodologies and increases external validity. In this presentation, we present an advanced, rigorous, science-based methodology for evaluation in healthcare testing. The methodology extends today's general practice, rapid-cycle evaluation, by introducing in silico methods of big data and modeling & simulation and tightly integrating these methods within a knowledge discovery infrastructure. An ACO intervention trial provides initial experience with IDAMS-HC.
Contributors/Authors: Gil Weigand, PhD, Director, Strategic Projects, Computer and Computational Sciences, Oak Ridge National Laboratory (ORNL); Mallikarjun Shankar, PhD, Senior Research Scientist, Computer and Computational Sciences, ORNL; C. Edward McBride, III, MD, MBA, VP, Clinical Services, Summit Medical Group (SMG); Kimberley Kauffman, VP, Value-Based Care, SMG; and Suzanne Kieltyka, Manager, Health Education, SMG
5:00 Presentation to be Announced
5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
6:30 Close of Day
Thursday, April 23
7:00 am Registration Open and Morning Coffee
10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Remarks
10:40 Structure-Based Algorithms to Predict Drug-Mediated Toxicity
Khaled Barakat, Ph.D., Assistant Professor, Katz Group-Rexall Centre for Pharmacy & Health Research, University of Alberta
This talk presents new insights on how state-of-the-art high performance computing and cutting-edge molecular simulations were used to predict toxicity of candidate drugs. To be biologically active, a drug must physically fit into the binding site(s) within its target. However, to reach its precise binding location, it has to interact with a variety of cellular components with various structures and functions. All these events increase the probability for a drug to bind to undesired off-target(s), which may induce adverse side effects and severe toxicities. During this talk, we will demonstrate how high performance computing combined with conformational sampling, molecular docking and molecular dynamics (MD) simulations were used to build realistic models for these off-targets. We will focus on cardiotoxicity and use our newly developed hERG model as a test case.
11:10 An Informatics Solution for the Precise Registration and Visualization of Biological Molecules
Roxanne Kunz, Ph.D., Senior Scientist, Therapeutic Discovery, Amgen, Inc.
A custom bioinformatics software application for the registration and representation of biological molecules will be described. The system includes a flexible, modality-independent editor to define biomolecules in a step-wise fashion, backed by a chemical structure-based database catalog to precisely capture atomic-level modifications of amino acids and other non-proteinaceous components. Data-driven visual representations based on canonical biological molecule structure reference types, such as IgG1 monoclonal antibodies and subtypes thereof, are dynamically constructed and interactive.
11:40 Man Versus Machine: Validating, Optimizing, and Predicting Outcomes in Single Cell Phenomics
Michael D. Stadnisky, Ph.D., CEO, FlowJo, LLC
The exponential increase in the throughput and content of flow and mass cytometry assays has challenged the paradigm of DIY data management, manual analysis, and 2D visualization in single cell phenomics. We have developed and assessed an automated pipeline for directed analysis, statistical cluster comparison for iterative pipeline improvement, plug-and-play automated clustering algorithms, and phenotype prediction.
12:10 pm Session Break
12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
2:00 Using Deep Learning Techniques for Word Vectors Generation for Insight into Poorly Structured Textual Data
Mark Pinches, Senior Scientist, Data Modeling and Bioinformatics, Drug Safety and Metabolism, AstraZeneca
Word vectors carry a number of interesting properties that can be applied to textual data in order to cluster/stratify data for additional analysis. This emerging approach has been used in other fields but here we apply it to biological data. This presentation illustrates a concrete example of this technique with clear and unique outcomes.
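The core idea behind the word-vector approach above is that words are represented as dense numeric vectors whose geometric proximity reflects similarity of usage, so related terms can be clustered or retrieved by cosine similarity. The sketch below uses tiny hand-made vectors and invented terms purely for illustration; in practice the vectors would be learned from a corpus by a deep learning model.

```python
import numpy as np

# Hand-made toy vectors standing in for learned word embeddings.
# The words and values are illustrative assumptions, not real output.
vectors = {
    "hepatotoxicity": np.array([0.9, 0.1, 0.0]),
    "liver":          np.array([0.8, 0.2, 0.1]),
    "assay":          np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Nearby vectors suggest related usage in the underlying text.
sim_related = cosine(vectors["hepatotoxicity"], vectors["liver"])
sim_unrelated = cosine(vectors["hepatotoxicity"], vectors["assay"])
```

Clustering or stratifying free-text records then reduces to grouping their (averaged) vectors by such distances.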
2:30 Examining the Health Effects of Multiple Environmental Exposures on Subpopulations Using a Big Data Platform
Chirag Patel, Ph.D., Research Associate, Center for Biomedical Informatics, Harvard Medical School/Pivotal, Inc.
The environment plays a large role in human health, but we lack computational tools to unlock the relationship between exposure and disease. For example, most studies examine the effects of single environmental agents and ignore the complex interplay of an individual’s characteristics and comorbidities. This offers an unrealistic view of the human experience. With the availability of scalable platforms, we are now able to ask many more questions and examine more complex relationships between environmental exposure and disease. In this talk we will present our results from examining the complex interdependencies between thousands of environmental agents (e.g., infectious agents, biomarkers of pollutants, and nutrient exposure) in datasets representative of the US population. Specifically, we consider over 100,000 individual pairwise correlations between exposures and quantitative health-related traits in different segments of the US population, a computationally burdensome task but one critical toward a "human exposome project".
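The mass pairwise screen described above can be sketched in a few lines: standardize the exposure and trait columns, and a single matrix product yields every exposure-by-trait Pearson correlation at once. The random data below stands in for survey measurements and is purely illustrative; it is not the speakers' pipeline or data.

```python
import numpy as np

# Illustrative stand-in for population survey data: rows are people,
# columns are exposure measurements or quantitative traits.
rng = np.random.default_rng(0)
n_people, n_exposures, n_traits = 500, 40, 25
exposures = rng.normal(size=(n_people, n_exposures))
traits = rng.normal(size=(n_people, n_traits))

def cross_correlations(X, Y):
    # Standardize each column (zero mean, unit population std); the
    # matrix product Xs.T @ Ys / n is then the Pearson correlation of
    # every X column against every Y column.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]

corr = cross_correlations(exposures, traits)  # shape (40, 25): 1,000 pairs
```

The same vectorized form scales to the hundreds of thousands of pairs mentioned in the talk; screening at that scale then requires a multiple-testing correction over the resulting correlation p-values.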
3:00 Streamlined Planning, Execution, Data Capture and Analysis of Peptide Preformulation Stability Studies
Roman Affentranger, Dr. sc. Nat, Head, Small Molecule Discovery Workflows, Roche
The presentation will illustrate what we have implemented for the peptide preformulation scientists in their electronic lab notebook to efficiently design peptide formulation stability studies. A study can cover a number of different formulations, and from the definition of time points, stress conditions and desired analytical methods, the required number of vials as well as individual material amounts are automatically calculated. Individual analytical results are captured through predesigned templates that are specific to the different types of analytics, and the results are linked to the stability study by a unique study ID. Throughout study execution, as well as for the completed study, all the data - formulation composition, study design and all analytical results - are pulled together in a data analysis tool, also within the electronic lab notebook. The data analysis tool allows selection of individual or multiple formulations, time points, and stress points, and therefore offers unique insights which greatly support decision making for improved formulation design.
3:30 Welcome to the Future: Data Analysis in a Language Workbench
Fabien Campagne, Ph.D., Assistant Professor and Laboratory Head, Institute for Computational Biomedicine, Weill Cornell Medical College
We have devised an analysis tool (http://workbench.campagnelab.org) that takes advantage of Language Workbench (LW) technology and in particular of the open-source Meta Programming System. Using this technology, we created languages for biological data analysis that do not require a strong computational background, that can be used by most biomedical researchers, and yet that scale to large datasets. These languages provide high-level abstractions such as organisms, filesets, execution nodes, analysis plugins, or analysis tasks that model the biological data and the computational environment in simple ways. End-users interacting with the workbench can configure analyses that will execute on parallel computers without these users knowing about parallel computing, programming, scripting languages or the command line. The talk will highlight how both end-users and tool designers can benefit from this technology.
4:00 Conference Adjourns