2018 Archive | Track 3: FAIR Data for Genomic Applications



Track 3: FAIR Data for Genomic Applications

The volume of life science data, particularly genomic data, continues to rise exponentially, but a series of limitations hampers the capacity to make full use of it. FAIR is a powerful initiative that has taken root primarily in Europe and has the potential to significantly increase the value of genomic data sets. One of the keys to making data more findable, accessible, interoperable and reusable is the use of unique, permanent and universally accepted identifiers and metadata, which makes it possible to link different data sets semantically and draw useful inferences. This event complements the Bio-IT Hackathon, launched last year, and brings together academic, government and commercial end-users who are pioneering the use of FAIR data, with specific examples of how FAIR enhances the value of data for particular applications. Applications in research, translational drug development and clinical results will be covered.

Tuesday, May 15

7:00 am Workshop Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 – 11:30 Recommended Morning Pre-Conference Workshops*

W5. Data Visualization to Accelerate Biological Discovery

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*

W11. Data Science Driving Better Informed Decisions

* Separate registration required.

2:00 – 6:30 Main Conference Registration Open (Commonwealth Hall)

4:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

Wednesday, May 16

7:00 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

OVERVIEW OF FAIR PRINCIPLES
Waterfront 3

10:50 Chairperson’s Remarks

Barend Mons, PhD, Dutch Techcentre for Life Sciences

11:00 FAIR in the Context of Data Stewardship and the FAIRification Process

Erik Schultes, PhD, Dutch Techcentre for Life Sciences

The emerging field of Data Stewardship attempts to systematize the constellation of (seemingly unrelated) housekeeping duties associated with data creation and reuse. The FAIR Principles offer guidance in achieving good Data Stewardship that is generic across data types and knowledge domains. I will explain the original intent and rationale behind the 15 FAIR Principles by placing them in the larger context of the goals of Data Stewardship.

11:30 Toward Metrics to Assess and Encourage FAIRness

Michel Dumontier, PhD, Department of Data Science, Maastricht University

While the FAIR (Findable, Accessible, Interoperable and Reusable) principles have enjoyed rapid adoption, questions remain as to what it means to be FAIR and how to assess FAIRness. I will discuss recent developments to establish robust infrastructure to facilitate the assessment of the FAIRness of different digital resources.

12:00 pm Keynote Presentation: Update on Developments for FAIR and the GO FAIR Initiatives

Barend Mons, PhD, Dutch Techcentre for Life Sciences

The FAIR Data Principles were conceived in a multi-stakeholder workshop held at Leiden University in January 2014. Since then, the Principles have been widely adopted by both public and private organizations with interests in both Open and Closed data. I will cover these developments with particular focus on the GO FAIR initiative, a rapid implementation framework supporting the development of the internet of FAIR data and services.

12:30 Session Break

12:40 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:40 Session Break

TOOLS FOR FAIR DATA
Waterfront 3

1:50 Chairperson’s Remarks

Tom Plasterer, PhD, U.S. Cross-Science Director, R&D Informatics, AstraZeneca Pharmaceuticals, Inc.

1:55 Datasets2Notebooks: Systematic Interactive Reports to Enrich Biomedical Data Repositories

Avi Ma’ayan, PhD, Professor, Department of Pharmacological Sciences and Director, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai

Jupyter Notebooks provide a means of making biomedical datasets Findable, Accessible, Interoperable and Reusable, and of making bioinformatics data analyses transparent and accessible to both computational experts and novices. In a pilot project called Datasets2Notebooks, we developed a Jupyter Notebook generator. The notebook generator allows users to easily create, store and deploy live Jupyter Notebooks containing analyses of large biomedical datasets on a cloud-based infrastructure. Through an intuitive web interface, novice users can rapidly generate tailored reports to analyze their own data, or data from the public domain. Notebooks are findable through a web portal, and made accessible from a dedicated URL. Furthermore, by relying on Docker and Google’s Kubernetes as the cloud platform for notebook deployment, the reusability of such digital objects is guaranteed. By combining an intuitive user interface for notebook generation with the option to provide custom analysis scripts, Datasets2Notebooks addresses computational needs of both experimentalists and computational biologists.

2:25 Dataset Catalogs as a Foundation for FAIR Data

Tom Plasterer, PhD, U.S. Cross-Science Director, R&D Informatics, AstraZeneca Pharmaceuticals, Inc.

BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge-generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
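
As a rough illustration of the cataloguing approach this abstract describes, the sketch below builds a JSON-LD description of one dataset at summary, version and distribution levels using DCAT, VoID, Dublin Core Terms and PAV terms. It is a minimal sketch under stated assumptions, not the speaker's actual data model: the function name `describe_dataset`, the IRI layout and the example.org addresses are all hypothetical.

```python
import json

# Namespace IRIs for the public vocabularies used in this sketch.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "void": "http://rdfs.org/ns/void#",
}


def describe_dataset(base_iri, title, version, download_url):
    """Return a JSON-LD graph with summary, version and distribution nodes.

    The IRI scheme (base/version/distribution) is a hypothetical convention
    chosen for this example only.
    """
    summary_iri = base_iri
    version_iri = f"{base_iri}/{version}"
    distribution_iri = f"{version_iri}/distribution"
    return {
        "@context": CONTEXT,
        "@graph": [
            {   # Summary level: the dataset as an abstract, versionless concept.
                "@id": summary_iri,
                "@type": ["dcat:Dataset", "void:Dataset"],
                "dct:title": title,
            },
            {   # Version level: one specific release of the dataset.
                "@id": version_iri,
                "@type": "dcat:Dataset",
                "dct:isVersionOf": {"@id": summary_iri},
                "pav:version": version,
                "dcat:distribution": {"@id": distribution_iri},
            },
            {   # Distribution level: one concrete, downloadable file.
                "@id": distribution_iri,
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
            },
        ],
    }


if __name__ == "__main__":
    record = describe_dataset(
        "https://example.org/dataset/expression-demo",
        "Demo expression matrix",
        "1.0",
        "https://example.org/files/expression-demo-1.0.csv",
    )
    print(json.dumps(record, indent=2))
```

Keeping the three levels as separate linked nodes means a new release only adds a version node and a distribution node, while the summary-level IRI stays stable for anyone linking to the dataset.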

2:55 Is Your Life Science Data “F-A-I-R”?

Paul Honrud, Founder, DataFrameworks, Dell EMC

Sasha Paegle, Senior Business Development Manager, Life Science & HPC, Dell EMC

DataFrameworks and Dell EMC will discuss how a life science organization can ensure its data is “F-A-I-R” using DataFrameworks ClarityNow along with Dell EMC Isilon (file), ECS (object) and other storage technologies.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

4:00 Report-Out from Three Hackathon Projects

Over the past two days project teams have worked on various data sets with two goals in mind. The first task was to evaluate the FAIRness of a given data set, comparing it to FAIR principles. The second task was then to work on various modifications of the data set that would improve the FAIRness of the information. Representatives from three of the project teams have been selected to report out on the work that was done, and lessons learned from the hackathon.

4:30 FAIRification of the Pistoia Ontology Mapping Project

Ian Harrow, PhD, Project Lead for Pistoia Ontology Project

Ontologies and mappings between them underpin the successful application of linked data and semantic technologies. The Pistoia Ontologies Mapping project was established to make better use of existing ontologies through better mapping tools and services. In this presentation I will describe how the FAIR principles have been implemented in this project 1) to share guidelines for assessment of ontologies, 2) to evaluate and identify the top performing ontology mapping tools and 3) to implement a prototype Ontologies Mapping service, working with EMBL-EBI.

5:00 CO-PRESENTATION: Easier Integration and Enrichment of Your Data by Making Public Data More FAIR

Hans Constandt, CEO, ONTOFORCE

Chris Evelo, Maastricht University and ELIXIR

Public data has different levels of FAIRness. The higher the FAIRness level of a data source, the easier it is to use this source for data integration and linking. One of the goals of the intergovernmental organization ELIXIR is to make it easier to find and share data and to exchange expertise in life science. ONTOFORCE focusses on integrating and linking public and private data, in general by bringing data to a higher level of FAIRness. In this joint presentation, we will discuss what ELIXIR is doing to make public data more FAIR, and show examples of the direct benefits that creating and using more FAIR internal, private or third-party data brings to data searching, browsing and visual analytics on the DISQOVER platform.

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

7:00 – 10:00 Bio-IT World After Hours @Lawn on D
**Conference Registration Required. Please bring your conference badge, wristband, and photo ID for entry.

Thursday, May 17

7:30 am Registration Open (Commonwealth Hall) and Morning Coffee (Foyer)

8:00 PLENARY KEYNOTE SESSION & AWARDS PROGRAM (Amphitheater & Harborview 2)

9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced (Commonwealth Hall)

PROMOTING WIDER USE OF FAIR DATA
Waterfront 3

10:30 Chairperson’s Remarks

Helena Deus, PhD, Director, Disruptive Technologies, Elsevier, Inc.

10:40 Setting a Course for FAIRness in Scientific Research

Helena Deus, PhD, Director, Disruptive Technologies, Elsevier, Inc.

About 73% of researchers agree that sharing data is important. However, as many as 34% of researchers admit to not sharing their data at all. So where is the gap? Many (38% of those surveyed) believe that there is no credit attached to sharing data. As scientific biomedical data increases in both size and complexity, the lack of incentives and training needed to properly share experimental results is likely responsible for widening this gap. Some of the strategies that are in use, at Elsevier and elsewhere, to close the gap will be discussed, with a special emphasis on genomic information and sharing sequencing results.

11:10 Taking the Load Off Supplementary Data

Myles Axton, PhD, Chief Editor, Nature Genetics

Research publications have become increasingly complex with densely linked appendices of figures and tables. Some publishers now refuse to consider supplementary files. In contrast, data tables and figures hosted under CC-BY license can better support data reuse than copyrighted supplementary files. We advocate Figshare and similar archives to achieve the articulation of article and datasets, and are experimenting with more semantic ways to link these files to conventional subscription publications. Open licensing and separate hosting enable the publisher to offer standardized metadata and enable third party modeling using standard languages and ontologies. Dataset combinations then lead to new Analysis publications.

11:40 Making Data More FAIR on the Cloud

Geraldine Van der Auwera, Associate Director, Outreach and Communications, GATK, Broad Institute

The Broad Institute’s Data Sciences Platform builds and operates cloud-based platforms for sharing and analyzing data at scale. This talk will cover how the free, secure and open-source FireCloud platform makes data (1) more Findable through an ontologically searchable Data Library, (2) more Accessible through Google's identity and access management, as well as REST APIs for programmatic access, (3) more Interoperable through the use of standardized analysis pipelines implemented as portable WDL workflows, and (4) more Reusable through hosting of commonly used resources, convenient and powerful sharing options, and application of a standardized Consent Ontology.

12:10 pm Session Break

12:20 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing (Commonwealth Hall)

APPLICATIONS OF FAIR DATA
Waterfront 3

1:55 Chairperson’s Remarks

Tom Quaiser, PhD, Roche Pharmaceuticals

2:00 FAIR Data: Single Points of Truth and Integration of Omics Data

Tom Quaiser, PhD, Roche Pharmaceuticals

In early development and research (pRED) at Roche we are step-wise realizing a new way to build up a sustainable research informatics landscape, the pRED Data Commons. In this talk I will present how we organize different data types and reference information in a lean and agile way into Single Points of Truth (SPoT), and subsequently combine these to answer omics-related questions. Ultimately, this will make data findable, accessible, interoperable and reusable, or simply FAIR.

2:30 Understanding the Variety of Biomedical Data: A Driver for FAIR Practices

Ian Fore, PhD, Senior Biomedical Informatics Program Manager, Cancer Informatics, National Cancer Institute

Standardization of biomedical data is a best practice, but the diversity of disciplines and the pace of new discovery are fundamental characteristics that must be accommodated. No one person can grasp it all, but standard ways to describe data across disciplines can enable the just-in-time understanding of data relevant to a biomedical research problem. Approaches to data discovery and indexing explored through NIH programs and as part of various data commons will be described. These include efforts to define a minimal metadata model to support data discovery use cases, and exploration of technologies such as schema.org to define datasets at source.
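
To give a feel for what defining datasets at source with schema.org can look like, the sketch below checks a metadata record against a small set of required fields and renders it as a schema.org `Dataset` JSON-LD block for embedding in a dataset landing page. The required-field list is a hypothetical stand-in for a minimal metadata model, not the model developed in the NIH programs mentioned in this abstract, and the function names are assumptions made for illustration.

```python
import json

# Hypothetical minimal metadata model: schema.org Dataset properties that
# this sketch treats as mandatory for discovery.
REQUIRED_FIELDS = ("name", "description", "identifier", "url")


def missing_fields(metadata):
    """Return the required fields that are absent or empty, in model order."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def to_schema_org_script(metadata):
    """Render metadata as a schema.org Dataset JSON-LD <script> element."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    record = {"@context": "https://schema.org", "@type": "Dataset", **metadata}
    # Escape "</" so a value can never close the surrounding script element;
    # "<\/" is a valid JSON escape and parses back to "</".
    body = json.dumps(record, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    example = {
        "name": "Demo tumor expression set",
        "description": "Illustrative record; not a real dataset.",
        "identifier": "https://example.org/dataset/demo-001",
        "url": "https://example.org/dataset/demo-001/landing",
        "keywords": ["genomics", "expression"],
    }
    print(to_schema_org_script(example))
```

Because the markup travels with the landing page itself, an indexer that reads JSON-LD can pick up the description without a separate submission step.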

3:00 FAIRification Case Studies: Lessons Learned from Implementing the FAIR Principles in Pharmaceutical Companies, Hospitals and Biobanks

Kees van Bochove, CEO, The Hyve

Implementation of the FAIR Data Principles is a crucial step for all organizations pursuing a (biomedical) data-driven strategy, improving the effectiveness of scientists and doctors as well as of computerized aides and autonomous programs. This talk will provide a number of concrete examples of how various customers of The Hyve, including large pharma companies, biobanks, registries and national health data sharing initiatives, have employed data FAIRification strategies to improve the (re)usability of their healthcare and biology data. It will also cover the open source software tools and standards that are used and being further developed for that purpose.

3:30 Interactive Data Selection with tranSMART Glowing Bear in the Dutch National Health Research Infrastructure

Jan-Willem Boiten, PhD, Program Manager, Lygature

Health-RI (https://health-ri.org/) is being shaped into the Dutch national Research Infrastructure for personalised medicine & health research, facilitating the research process from start to end. Health-RI unites existing research infrastructure initiatives and solutions into one integrated FAIR data platform, largely based on open source tooling. A core component in this research infrastructure is the open-source data integration platform tranSMART, which has seen huge adoption within pharmaceutical companies, public-private partnerships as well as in hospitals and clinical institutes for its powerful data integration and exploration toolset targeted at the needs of translational scientists. However, limitations in data modelling, scalability and a dated user interface have held user satisfaction back. This presentation will showcase how Health-RI has overcome these obstacles by leveraging data standards, the latest tranSMART Server version and the new modern and intuitive tranSMART user interface Glowing Bear.

4:00 Conference Adjourns
