Track 6: FAIR Data

As the volume of data produced by pharma companies, medical centers, and academic organizations continues to rise, a series of limitations hampers the ability to make full use of it. The FAIR (findable, accessible, interoperable, reusable) principles are an initiative with the potential to significantly increase the value of data sets. The FAIR Data track, which complements the Bio-IT Hackathon, brings together data scientists who are pioneering the use of FAIR data. They will present specific examples of how FAIR enhances the value of data for applications including, but not limited to, omics, imaging, and clinical trial data.

Final Agenda

Tuesday, April 16

7:00 am Workshop Registration Open and Morning Coffee

8:00-11:30 Recommended Morning Pre-Conference Workshops*

W5. Managing Sensitive and HIPAA-Controlled Data with Globus

12:30-4:00 pm Recommended Afternoon Pre-Conference Workshops*

W14. The GenePattern Notebook Environment for Open Science and Reproducible Bioinformatics Research

* Separate registration required.

2:00-6:30 Main Conference Registration Open


5:00-7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 17

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall with Poster Viewing

Waterfront 2

10:50 Chairperson’s Remarks

Anne Deslattes Mays, PhD, Principal Computational Scientist, The Jackson Laboratory for Genomic Medicine

11:00 Community Convergence onto an Internet of FAIR Data and Services

Erik Schultes, PhD, International Science Coordinator, GO FAIR International Support and Coordination

11:30 FAIRness and Accountability: Expectations vs. Reality

Helena Deus, PhD, Technology Research Director, Elsevier Labs, Elsevier, Inc.

For many scientists, the prose and charts in a scientific article constitute the proper way to report scientific data. From a data scientist’s perspective, there is an expectation that scientific data is findable, accessible, interoperable and reusable. There is a gap between expectations and reality. In this talk, we will dive deeper into that gap and explore how and why it is wider or narrower in different scientific domains.

12:00 FAIR Data + FAIR Apps = New Frontier in Deriving Meaning and Insights

Juan Caballero, PhD, CSO, Databiology

There has been significant progress in applying FAIR data principles to the data available today. Having FAIR data is a great start, but what if the rest of your research process isn’t FAIR? Learn how to make apps FAIR and how this enables self-describing analysis, opens new frontiers for intelligent analysis processes, and allows research insights to be captured automatically.

12:30 Enjoy Lunch on Your Own (Lunch Available for Purchase in the Exhibit Hall)

Waterfront 2

1:50 Chairperson’s Remarks

Helena Deus, PhD, Technology Research Director, Elsevier Labs, Elsevier, Inc.

1:55 KEYNOTE PRESENTATION: From Finding to Exploiting FAIR Data

Mathew Woodwark, PhD, Director, Research Bioinformatics, Data Science and AI, BioPharmaceuticals R&D, AstraZeneca

Much effort within the FAIR data community has focused on dataset catalogues, a prerequisite for finding FAIR data. But once discovered, what can we do with FAIR datasets? At AZ/MedImmune we are building an internal and external FAIR data ecosystem that can provide a loosely coupled knowledge graph; an environment for initiating complex analytics and recording key insights; and a human- and machine-agnostic data science playground for meaningful collaboration and knowledge extension to drive novel discovery and informed decision making. Both experts and novices should be able to participate in and derive value from the FAIR ecosystem.

2:25 CO-PRESENTATION: Building a Unified Data Model

Roman Affentranger, PhD, Head, Small Molecule Discovery Workflows, Roche Pharma Research and Early Development Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd.

Carmen I. Nitsche, Business Development Consultant, Pistoia Alliance

The Unified Data Model (UDM) is a joint effort of vendors and life science organizations to create a well-defined data format for exchanging information about compound synthesis and biological testing. Run under the umbrella of the Pistoia Alliance, the project released the first open and publicly available version of the UDM in June 2018, a significant step forward in the ability to store and exchange this information. Without a common language and structure to describe experiments, data integration has been limited, and a significant part of published data has not been readily available for processing or analysis. The first public release was closely followed by an enhanced version, 5.0 Brooklyn, which includes numerous extensions. The model is expected to continue to improve as demand dictates, working with the Pistoia FAIR data implementation by industry community.

2:55 Building an Enterprise Data Lake that is FAIR

Irene Pak, Lead R&D Data Architect, Information and Data Management, Bristol-Myers Squibb

Like many companies, Bristol-Myers Squibb has embarked on a journey to implement an enterprise data lake as one of the means to reach data nirvana, a state where human and machine can effectively mine our disparate digital data assets and turn them into business insights that will ultimately help our patients. The FAIR data principles play an important role in our undertaking by providing a framework to make our data findable, accessible, interoperable and reusable. In this presentation, I will share some of our learnings in the pursuit of FAIRness for our complex data ecosystem.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite) 

Waterfront 2

4:00 Introduction to FAIR Data Hackathon

Ben Busby, PhD, Scientific Lead, NCBI Hackathons Group, National Center for Biotechnology Information (NCBI)

Over the past two days, teams have been working hard to evaluate and improve the FAIRness of data sets, tools, and pipelines. Before the teams present their results, NIH data hackathon coordinator Ben Busby will introduce us to FAIR data hackathons and to what our teams have been working on.

4:15 Hackathon Report Outs

During the FAIR Data Hackathon, which will take place April 15-16, project teams will work on various data sets with two goals in mind. The first is to evaluate the FAIRness of a given data set against the FAIR principles. The second is to work on modifications to the data set that would improve its FAIRness. Representatives from the project teams will report out on the work that was done and the lessons learned from the hackathon.


5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 18

7:30 am Registration Open and Morning Coffee


9:45 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

Waterfront 2

10:30 Chairperson’s Remarks

Les Mara, Founder, Databiology

10:40 Powering AI Innovation with the Emerging Internet of FAIR Data and Services

Michel Dumontier, PhD, Distinguished Professor, Institute of Data Science, Maastricht University

The rapid and worldwide endorsement and adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) principles is contributing to the establishment of a new Internet of FAIR Data and Services. This emerging network of knowledge and services offers new opportunities to power AI-based applications in a manner that is both scalable and responsible.

11:10 Making Fair Play NICE with Information Systems

Eric Neumann, PhD, Founder & CEO, AIDAKA LLC

Acceptance and implementation are critical for FAIR to attain its objectives. This means that existing systems, both academic and industrial, should be able to adopt and benefit from FAIR principles without the need to perform expensive, time-consuming, complex system integration or re-development. In order to do this, an additional set of principles is recommended here, with the associated label NICE: Noninvasive, Integrable, Cost-Effective, and Extensible. Their utility and implementation will be discussed in this talk.

11:40 Fostering Autonomous and Inclusive Research

Brian M. Bot, Principal Scientist, Outreach and Strategic Development, Sage Bionetworks

The emergence of digital platforms and the ubiquity of smart devices are providing biomedical researchers with an unprecedented ability to capture fine-grained information on research participants. But just because we have the ability to collect these data, does that mean we should? And if we do collect them, how should they be shared and/or governed? We will discuss emerging opportunities and risks in an increasingly decentralized biomedical research ecosystem.

12:10 pm Enjoy Lunch on Your Own (Lunch Available for Purchase in the Exhibit Hall)

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

Waterfront 2

1:55 Chairperson’s Remarks

Eric Neumann, PhD, Founder & CEO, AIDAKA LLC

2:00 Using the BioCompute Framework for FAIR Communication of Data Provenance

Elaine Thompson, PhD, Senior Staff Fellow, CBER, FDA

The BioCompute Framework is designed for communicating high-throughput sequencing (HTS) computations and analyses. It provides a FAIR method for exchanging data provenance, workflows, or pipelines in a JSON structure. BioCompute Objects can facilitate communication between scientists, researchers, and regulatory agencies, including the FDA. BioCompute is described in a Standard Trial Use document available at the Open Science Framework (OSF) and will be published by IEEE.

2:30 FAIR as a Working Principle for Cancer Genomic Data

Ian Fore, PhD, Senior Biomedical Informatics Program Manager, Center for Biomedical Informatics and Information Technology, National Cancer Institute

Working with the National Center for Biotechnology Information, the National Cancer Institute uses the Sequence Data Delivery Pilot (SDDP) to store data “as is” from multiple cancer genomic studies. Short of the full harmonization employed in its Genomic Data Commons, the NCI uses the FAIR principles as a yardstick for making SDDP data usable by the cancer genomics community.

3:00 Data Stewardship for Single-Cell Genomics

Eric Weitz, Senior Software Engineer, Data Sciences Platform, Broad Institute of MIT and Harvard

Single-cell genomics assays can measure gene expression in thousands of cells simultaneously. For such data to be useful to biomedical researchers and developers, it must be findable, accessible, interoperable, and reusable. This presentation will discuss the implementation of those FAIR principles for single-cell genomics in the Human Cell Atlas, the Data Sciences Platform at Broad Institute, and the Single Cell Portal.

3:30 A Journey through the Roche Data Commons Architecture

Tom Quaiser, PhD, Data Science, Roche Pharma Research and Early Development Informatics, Roche Innovation Center Munich

Over the last couple of years, we in early research at Roche have been building a new data architecture: the Roche Data Commons. The main ingredients of this architecture are modularity and FAIRness of data. In this talk we will take you on a journey through the components of the Roche Data Commons and demonstrate the benefits of such an architecture.


4:00 Conference Adjourns

