Track 11 - April 5 – 7, 2016

Open Source Innovations

Integrated Informatics Solutions to Optimize Collaborative Biomedical Research

Track 11 presents case studies on collaborative and productivity software, platforms, tools, and models used to aggregate, harmonize, and interpret data from heterogeneous sources to accelerate basic, translational and clinical research. Speakers will show how crowdsourcing answers from networks is helping to empower transformative change by delivering life-saving medicine and information as quickly as possible.

Tuesday, April 5

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Security Considerations for Virtual Research

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
iConquerMS™: A Patient-Centered Research Model

* Separate registration required

2:00 – 6:00 Main Conference Registration



5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 6

7:00 am Registration Open and Morning Coffee



9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

9:45 Coffee Break in the Exhibit Hall with Poster Viewing



10:50 Chairperson’s Opening Remarks

Anil Srivastava, President, Open Health Systems Laboratory

11:00 Panel Discussion: IUCKA: Indo-US Cancer Knowledge Alliance

Moderator: Anil Srivastava, President, Open Health Systems Laboratory

Kenneth Buetow, Ph.D., Director of Computational Sciences and Informatics, Complex Adaptive Systems Initiative (CASI), Arizona State University

Rajendra Joshi, Ph.D., Associate Director and Head, Bioinformatics Group, Centre for Development of Advanced Computing, Pune University Campus

IUCKA: Indo-US Cancer Knowledge Alliance is being designed as an integrated biomedical informatics cyberinfrastructure for cancer treatment and research in India. It will be a true translational research platform, from bench to bedside, connecting cancer treatment and research centers across the country with access and connection to global centers of research, especially in the United States. The promoters of IUCKA are Arizona State University, Open Health Systems Laboratory and Varian Medical Systems. IUCKA is being implemented as a public-private partnership (PPP) and is bringing together technology products and service providers and cancer treatment and research centers in an ecosystem to directly benefit cancer patients in India and contribute to global research collaboration, especially between cancer centers in India.

12:00 pm Managing Data Across the Research Life-Cycle for Life Sciences

George Vacek, Global Director, Life Sciences, DDN

Dr. Vacek will present several in-depth case studies of leading life sciences organizations that leverage high-performance, high-scale data solutions for genomics, imaging and simulation workflows. The cases will focus on implemented solutions: capturing and effectively exploiting large-scale data at speed, regulated and non-regulated stewardship considerations, transitioning from non-scaling architectures, and bringing the benefits of high-end HPC technologies and techniques into smaller deployments and collaborative scenarios.


12:15 Data Management in Large Scale Sequencing and Analysis

Kirill Malkin, Director, Storage Engineering, SGI

Next Generation Sequencing and its accompanying analyses are driving exponential growth in sequence data that needs to be stored, analyzed, and made accessible for future interrogations. This session presents a converged storage-and-analytics infrastructure framework based on SGI’s experience in enabling data-intensive supercomputing solutions – along with genomics customer case examples and best practices for simplifying the management of data sets that can contain billions of files/objects.

12:30 Session Break

12:40 Luncheon Presentation I: Accelerating the Analysis of High-Throughput Sequencing

Ketan Paranjape, General Manager, Life Sciences, Health and Life Sciences, Intel

Panelists: Paolo Narvaez, Ph.D., Principal Engineer & Director, Personalized Care Platform, Intel Corporation

Adam Kiezun, Ph.D., Senior Group Leader, Computational Methods Development, Broad Institute of MIT and Harvard

Jeff Gentry, Principal Software Engineer, Broad Institute

Accelerating the analysis of high-throughput sequencing data enables all of us to push the boundaries of precision medicine. The Broad Institute's Genome Analysis Toolkit (GATK) is the industry-standard software package for variant discovery and genotyping. In this luncheon, experts from the Broad Institute and Intel will discuss the new capabilities coming to GATK and the impact they could have on the industry.

1:10 Luncheon Presentation II: Cloud Bursting HPC Workloads: Challenges and Opportunities

Dan Chow, COO/CTO, Silicon Mechanics

Feeling constrained by your HPC cluster? Are there times when you need more capacity or want to offload some storage? Bursting to the public cloud offers an alternative way to grow with added flexibility. Dan will share the benefits our customers have experienced and cover some of the pitfalls to be wary of when evaluating how to implement cloud bursting.

1:40 Session Break


1:50 Chairperson’s Remarks
Christopher Southan, Ph.D., Database Curator, IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh

1:55 MSSNG – An Open Science Approach to Facilitate Discovery in Autism

Mathew Pletcher, Vice President & Head, Genomic Discovery, Autism Speaks

Autism Speaks has undertaken an effort, entitled MSSNG, in collaboration with Google and The Hospital for Sick Children to generate whole-genome sequences from at least 10,000 individuals from families with autism. This genomic data has been made available, along with associated clinical and phenotypic data, through multiple interfaces under the principles of open science. MSSNG operates under the principle that the best way to ensure the delivery of new discoveries and tools to the autism community is to share this valuable resource as broadly as possible and with as few restrictions as possible.

2:25 An Open Embedded Live Image-Analysis Prototyping Platform

Patrick Oberthuer, Research Associate, Chair, Bioprocess Engineering, Technische Universität Dresden

This talk will discuss combining open, low-cost embedded hardware platforms such as the Raspberry Pi with the widely used open image-analysis platform ImageJ and with live imaging devices. Together, these make it possible to easily prototype an all-in-one image-analysis system.

2:55 The Case for Adaptive, Hierarchical Metadata

Stephen Worth, Director, Engineering, EMC

Groups maintaining data repositories at the petabyte scale are discovering that cataloguing associated metadata is necessary to properly access and analyze data. To be successful, they depend on researchers and data curators to provide the user-defined metadata. EMC recently contributed Metalnx to aid researchers with metadata management under iRODS. We will demonstrate the principles of operation for Metalnx and discuss how adaptive, hierarchical metadata can be applied to research curation.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 No ELN is an Island

Paul Whitehead, pRED Informatics Center Head, Roche

Research at Roche has an extended bench concept that utilizes external scientists to contribute to internal projects. Externalization requires, inter alia, reduced costs and shortened project life cycles to justify its continued use. Externalized projects should be monitored, directed and recorded using suitable planning, electronic laboratory notebook and collaboration tools, and together with automated data exchange, be done quickly and with high quality. The evaluation, selection, implementation and integration of the cloud-based Dotmatics ELN for Roche Research will be presented.

4:30 Between Open and Closed Antimalarial Drug Discovery: Comparing Data Connectivity Gaps and Disclosure Speed

Christopher Southan, Ph.D., Database Curator, IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh

Antimalarial research is the poster child for Open Source Drug Discovery (OSDD). However, many lead compounds still have their origins in Traditional Closed Drug Discovery (TCDD), and uncertainty remains as to the differences. To provide an assessment, this work examined 32 recent antimalarial structures in terms of their PubChem connectivity. Of these, 21 had patent matches, only 23 linked to publications and only 21 had BioAssay records. Major data connectivity problems included 1) leads not findable by code name, 2) patents not cited in publications, 3) leads not reciprocally linked to Plasmodium protein targets and pathways, and 4) name-to-structure mappings only being declared years after patent disclosure. These issues will be contrasted with the Sydney University Open Source Malaria approach, where open lab books are used to surface structures (e.g., as Google-findable InChIKeys) and crowdsourced collaboration data close to real time, thereby shaving years off the discovery phase.

5:00 Selected Poster Presentation: Embracing Ambiguity: Representation of Macromolecules Using the Enhanced Standard HELM 2.0 

Markus Weisser, Ph.D., Managing Director, quattro research GmbH

Introduction: HELM, the open standard, enables the representation of many types of complex macromolecules, including nucleotides, proteins, antibodies and antibody-drug conjugates, including ones containing non-natural elements. HELM was created by Pfizer scientists; the Pistoia Alliance formalized the HELM notation as an open standard in early 2013 and publicly released software tools to the Open Source community. Since its release, HELM has attained widespread adoption and benefited from a growing range of global contributors. While HELM 1.1 solves the problem of representing unnatural complex biomolecules, it still assumes that the scientist knows everything about the structure. In practice, however, there are a number of cases in which many structural features of a biomolecule are not known. This confronts scientists with a difficult choice: either pretend they have all the information and guess at a structure, or register a textual description with no structural information in their database. HELM 2.0 offers an extension to the HELM notation that allows a user to capture the available structural information while also identifying what is not known. The Pistoia Alliance partnered with quattro research to develop the toolsets that support this extension to the notation.

Results: The team implemented three major enhancements to the HELM definition and open source codebase:
1. The HELM notation and the HELM toolkit now support the representation of ambiguous macromolecules.
2. A new API allows the HELM toolkit to access different chemical libraries of the user's choice. Two libraries, ChemAxon's Marvin Beans and the Chemistry Development Kit (CDK), are currently available to the user. The chemistry plugin can be easily changed or extended to add support for additional chemical libraries.
3. Web services for the toolkit abstract the toolkit functionality from the code implementation. The toolkit can thereby also serve as a client, allowing the user to integrate monomer databases.

Discussion: With the addition of ambiguity support, HELM 2.0 now provides researchers the unique capability of representing complex biological entities that have not yet been fully characterized at the structural level, rendering it an even more practical technology for the electronic representation of a wide array of biomolecules. By additionally enabling the use of different chemical libraries and providing web services, HELM is now a more open and practical technology than ever before. The HELM code is available on GitHub and uses the permissive MIT open source license, which gives anyone the right to freely download and customize it. Visit the project website for additional information about the project.

5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 7

7:00 am Registration and Morning Coffee



10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced


10:30 Chairperson’s Opening Remarks

Narges Bani Asadi, Founder and CEO, Bina Technologies Inc., Roche Sequencing

10:40 Selected Poster Presentation: Nephele: A Cloud-Based Scientific Computing Platform for Improved Efficiency, Standardization, and Collaboration in Microbiome Data Analysis 

Ian Misner, Ph.D., Computational Genomics Specialist and Contractor, Bioinformatics and Computational Biosciences Branch (BCBB), NIH/NIAID/OD/OSMO/OCICB

Nephele is a cloud computing platform for microbiome research, aimed at providing scientists with a consistent, centralized, and collaborative environment for high-throughput metagenomics data analysis. Growing evidence supports a critical role for microbiota in human health and disease; 16S and whole genome sequence (WGS) analyses are critical to understanding this relationship, but analysis of large and complex datasets requires advanced computing infrastructure and sophisticated software that are inaccessible to many researchers. Nephele bridges this gap as an all-in-one portal to essential microbiome data and tools. The user-friendly, web-based interface directly links commonly-used metagenomics applications (QIIME, mothur, BioBakery) to the Amazon Web Services (AWS) cloud. Preconfigured (but also customizable) pipelines lower the knowledge barrier for users who may be less familiar with command-line applications. Researchers can also seamlessly integrate public datasets into their analyses, including Human Microbiome Project (HMP) data, to compare against their own experimental sequence data. The on-demand, pay-per-use nature of cloud computing spares valuable funding and administrative resources for individual investigators and institutions by significantly reducing the cost of infrastructure procurement and maintenance. Open-source access to these tools and datasets encourages standardization of tools and methods, reproducibility of results, and extension of capabilities by the research community. By supporting greater adoption and improved efficiency in microbiome research, resources like Nephele can facilitate new discoveries that have the potential to transform medicine.

11:10 Tackling Life Sciences R&D Informatics Challenges through Cross-Industry Pre-Competitive Collaboration Projects at The Pistoia Alliance

Carmen Nitsche, Executive Director, Business Development North America, Pistoia Alliance

Market pressures are driving the Life Sciences industry to embrace pre-competitive collaboration in some aspects of their R&D processes. We will examine several areas that lend themselves to such efforts and review ongoing projects that address common challenges.

11:40 Selected Poster Presentation: Projections Meta Filesystem - Novel Approach for Distributed Data Access and Annotation

Anton Bragin, Ph.D., Systems Architect, Bioinformatics Institute

Bioinformatics data now exists in many forms, such as text and binary files, SQL and NoSQL database records, and data objects behind common or vendor-specific application programming interfaces (APIs). To make one data source talk to another, or to enable data consumption by a software tool, the researcher must translate data requests either by directly converting data (e.g., by dumping database records to flat files) or by implementing data integration logic via scripting, which is slow, error-prone and often requires extra local storage. To overcome these problems, we developed the Projections meta filesystem, which aims to provide uniform file-based access to heterogeneous resources and to decouple logical resource representation from physical data storage. The Projections system uses a universal text format to describe the logical data structure and a set of resource-specific drivers that project actual data objects from a local or remote resource onto a local FUSE-mounted filesystem. This enables a uniform view of data and provides transfer-upon-request capability. An important feature of the Projections system is that metadata is a first-class citizen, enabling versatile metadata descriptions that go beyond traditional tags and key-value properties and providing flexible search capabilities. Typical Projections usage scenarios include file access to non-file objects; using metadata for search and annotation; data analysis upon request, in which Projections provides a logical representation of a resource, including its metadata, that can be searched and analyzed, while data transfer is typically deferred until the data is actually needed; and exchange of data resource representations and selected data objects by means of prototype files, which are small text files that can be easily edited and transferred. Projections is open-source software based on Filesystem in Userspace (FUSE) and can be used on any modern Linux machine.
Currently the system is equipped with drivers for making data projections from NCBI SRA, GenBank, Amazon S3, the local filesystem, ThermoFisher Torrent Suite, and Illumina MiSeq/HiSeq Control Software, and it can be readily expanded. We hope that the Projections meta filesystem will promote data consolidation and make data access, exchange and usage patterns more uniform, metadata-driven and reliable.

12:10 pm Session Break

12:20 Luncheon Presentation I: Innovation through Collaboration: Cultural and Technological Advancements Empowered in the Pediatric Research Arena

Adam Resnick, Director, Children's Brain Tumor Tissue Consortium, Division of Neurosurgery, Children's Hospital of Philadelphia

The Children's Hospital of Philadelphia has partnered with academic institutions, clinical trial consortia and industry partners to build a new pediatric biospecimen and informatics platform that defines an open-access data discovery ecosystem. These new open-source tools and workflows support “big-data” innovation and define an alternative, sustainable model for collaborative data-driven discovery, in which researchers “compete” to share, connect, and integrate data on behalf of patients.

12:50 Luncheon Presentation II (Sponsorship Opportunity Available) or Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing


1:55 Chairperson’s Remarks

Samantha A. Schrier Vergano, M.D., FAAP, FACMG, Division Director, Medical Genetics and Metabolism, Children’s Hospital of the King’s Daughters

2:00 What If Your Biology Holds the Key that Protects Others from Disease? Changing the Discourse around Sharing Health Data

Jason Bobe, Associate Professor, Director, Sharing Lab, Icahn Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine; Executive Director,

The protection of personal health and medical data has been recognized as an important goal for decades. The societal value of sharing data is immense, but to date it has received much less attention. Designing a biomedical research enterprise that provides individuals access to their own data and improved options for sharing is paramount for addressing critical social concerns like better health, new therapies and disease prevention strategies.

2:30 Community-Driven Approaches to Support Variant Interpretation

Steven Harrison, Ph.D., Variant Scientist, Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine; Harvard Medical School

Improving our knowledge of genomic variation requires a massive effort in data sharing. Community-driven groups are working to incorporate shared data into variant assessment processes by guiding gene and disease specifications to the ACMG Interpreting Sequence Variant Guidelines, developing variant curation applications, aggregating shared data to inform the community of discrepancies and concordance in variant interpretations, and developing resources to facilitate data sharing.

3:00 Military Health Care Dilemmas and Genetic Discrimination: A Cautionary Tale of One Family’s Experience with Whole-Exome Sequencing

Samantha A. Schrier Vergano, M.D., FAAP, FACMG, Division Director, Medical Genetics and Metabolism, Children’s Hospital of the King’s Daughters

Whole-exome sequencing (WES) has increased our ability to analyze large parts of the human genome, bringing with it complicated ethical considerations. Secondary findings, results that convey genetic risk in asymptomatic individuals outside the initial indication for testing, can have significant social or legal implications. We discuss these issues through the experience of a family with careers in the U.S. military, for whom such findings could potentially jeopardize employment and privacy.

3:30 Development and Validation of an SNP Panel for Sample Identity Quality Control for Use in a High-Throughput Clinical Genetics Laboratory

Thomas B. Freeman, Senior Data Scientist, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai

In clinical genetic testing, it is absolutely imperative that each patient receives the proper test results. We describe the development, implementation and validation of a sample identity SNP panel run in parallel with the DNA-Seq pipeline for sample identity verification. This workflow is integrated with the LIMS and data analysis pipeline to provide automated sample identity quality control.

4:00 Conference Adjourns
