Symposium 1: FAIR Data

New Frontiers in Findable, Accessible, Interoperable, and Reusable Data Generation

May 3, 2022 ALL TIMES EDT

As the volume of data being produced by pharma companies, medical centers, and academic organizations continues to rise, the capacity for full utilization of this data is hampered by a series of limitations. Over the past few years, application of the FAIR principles (findable, accessible, interoperable, reusable) has emerged as a solution to combat these limitations and increase the value of data sets. However, practical challenges in applying these principles, as well as concerns about data privacy, remain. The FAIR Data symposium will showcase FAIR applications in software, data repositories, and specific data types, both open source and private. Case studies will be presented on applications from across different disciplines.

Tuesday, May 3

7:00 am Registration Open (Plaza Level Lobby)



8:25 am

Chairperson's Remarks

Ishwar Chandramouliswaran, Program Director, Office of Data Science Strategy, NIH
8:30 am

FAIR Means Metadata: How Technology for Authoring Better Metadata Can Make Data FAIR

Mark A. Musen, Professor of Medicine & Biomedical Informatics, Center for Biomedical Informatics Research, Stanford University

Researchers expect datasets to be FAIR, and they assert that their data are FAIR even more than they claim that their data are conclusive. But creating FAIR data is hard, and the process depends on authoring metadata that adheres to community standards. CEDAR provides technology that can help investigators to create FAIR data. The platform uses biomedical ontologies stored in the BioPortal repository and a user-managed library of machine-readable reporting guidelines that provide standardized metadata schemas. CEDAR is being used internationally in a range of projects to make FAIR data not just a slogan but an achievable reality.

9:00 am

FAIR Everywhere? Building a Research Data Management Practice for European Life Science

Niklas Blomberg, PhD, Director, ELIXIR Hub, EMBL EBI

Open life science data is critically important for the research community - it is extensively reused, both as experimental reference and for novel science. ELIXIR, a European infrastructure connecting 23 countries, is rolling out good data management and a common toolkit, and is building an expert network that supports life science projects with FAIR practices - creating a data federation that allows users to extract large, interoperable datasets across national boundaries. In this talk I will describe our common toolkit, our training and capacity building, and the foundation of core data resources, with examples of how this supports diverse life science communities.

9:30 am Networking Coffee Break
9:50 am

Extracting Bioactive Chemistry from Documents: FAIR Still Has a Long Way to Go

Christopher Southan, PhD, Competitive Intelligence Analyst, Data Sciences, Medicines Discovery Catapult

The flow of SAR data curated from literature and patents into ChEMBL, BindingDB, Guide to Pharmacology, and PubChem is central to drug discovery informatics. While these sources have made 2.8 million structures FAIR, commercial sources indicate a legacy extraction shortfall of 4 million. Another impediment is journal authors assuming that depositing bioactivity results as supplementary files into repositories such as Figshare magically makes them FAIR, despite the files not being machine-readable. Approaches to ameliorate both these legacy and current FAIR limitations will be outlined.

10:20 am

Continuous and Ubiquitous FAIRness: The Joys and Benefits of Integrating FAIR Principles into All of Your Data, All the Time.

Carl F. Kesselman, Research Professor & Director, Industrial & Systems Engineering, University of Southern California

FAIR principles are good in principle, but how do we get FAIR data in practice? All too often, issues of FAIRness are considered at the end of the data lifecycle, just prior to publication or to meet funders' requirements. In this presentation, we consider the advantages of integrating FAIR principles into the complete lifecycle of data associated with a research investigation.


Realizing FAIR in Biomedical Sciences

Panel Moderator:
Ishwar Chandramouliswaran, Program Director, Office of Data Science Strategy, NIH

The complexity and volume of basic, translational, and clinical research data generated by biomedical researchers continues to rapidly increase, even more so in the light of the COVID-19 pandemic. This panel of speakers will discuss and showcase approaches to better make these data discoverable, interoperable, and reusable according to FAIR practices, overcome current data science challenges, and realize the vision for a modern biomedical data ecosystem.

Christopher Southan, PhD, Competitive Intelligence Analyst, Data Sciences, Medicines Discovery Catapult
Mark A. Musen, Professor of Medicine & Biomedical Informatics, Center for Biomedical Informatics Research, Stanford University
Niklas Blomberg, PhD, Director, ELIXIR Hub, EMBL EBI
Carl F. Kesselman, Research Professor & Director, Industrial & Systems Engineering, University of Southern California
Can (John) Akgun, Ph.D., Senior Vice President of Business Development, Flywheel

Adoption of FAIR has emerged as a solution to optimize data-driven Life Sciences R&D. Digital transformation initiatives are now relying on FAIR principles to accelerate drug design and clinical trials. However, complex data, such as medical images, present additional challenges that require unique solutions. In this presentation, we will discuss the soup-to-nuts approach of implementing FAIR across large teams to improve R&D efficiency and project outcomes.

11:50 am Enjoy Lunch on Your Own


1:05 pm

Chairperson's Remarks

Benjamin R. Busby, PhD, Director, Solution Science, DNAnexus
1:10 pm


Vivian Neilley, Lead Interoperability Solution Engineer, Google Cloud Healthcare

The US Government and legislative bodies across the globe have started to mandate the use of Fast Healthcare Interoperability Resources (FHIR) in clinical systems. These mandates leave a clear gap with the research community, causing organizations to determine their own interoperability path. This presentation will give an overview of why these organizations should consider FHIR data in their FAIR data strategies, the implications for life sciences organizations, and other standards to consider.

1:40 pm

UK Biobank: A Uniquely Powerful Biomedical Database That Can Be Accessed Globally for Public Health Research

Mark Effingham, PhD, Deputy CEO, UK Biobank

With a mission to enable scientific discoveries that improve human health, UK Biobank has become a tremendously significant database for science and medicine and is currently being used by 30,000 researchers in over 90 countries around the world. The resource is unique given the combination of its scale, richness, duration, and accessibility. This session will focus on UK Biobank’s accessibility and the steps taken to ensure this petabyte-scale database is made available to bona fide researchers to improve diagnosis, treatment, and prevention strategies for the most devastating diseases, benefiting millions of people in the UK and around the world.

2:10 pm Networking Refreshment Break
2:30 pm

Making Smart Choices in Biomedical Data Reuse; Enhanced (Re)Usability of Biomedical Data Types after Consolidation into a Schema-Less Database

Benjamin R. Busby, PhD, Director, Solution Science, DNAnexus

After combining data of various data types into a single schema-less database, we find that novel insights can be derived from the data relatively easily by domain experts and that, for complex analyses, they can be presented to users easily in an extensible way. We will show two specific examples, one concerning metagenomic analysis of individuals with colorectal cancer, and another concerning ECG and subsequent analysis from a randomized cohort of individuals from the UK Biobank with cardiac arrhythmias. We are particularly excited to demonstrate that explainable machine learning results can be loaded alongside primary data, pointing (not directing) clinicians and researchers to data-driven indications.

3:00 pm

Enhancing Metadata FAIRness through Automated Processing to De-Risk Exploration

Emerson Huitt, CEO, Snthesis, Inc.

Realizing the promise of FAIR principles to enable interoperability between datasets is crucial to supporting novel analyses that drive discovery. FAIRness is usually considered at the end of the research process, and public repositories suffer from considerable variability in metadata quality. Leveraging explainable machine learning to enhance the FAIRness of both existing and new data leads to significant improvements in the ability to use integrated data to answer novel questions. In this talk, approaches to automated integration and specific examples of its impact will be discussed and future impacts will be outlined.

3:30 pm Closing Remarks
3:40 pm Close of FAIR Data Symposium



4:00 pm

Welcome by Conference Organizer

Allison Proffitt, Editorial Director, Bio-IT World
4:05 pm Innovative Practices Awards
Mike Tarselli, PhD, Chief Scientific Officer, TetraScience
4:30 pm

Ask What IT Can Do for Bio...and What Bio Can Do for IT

George M. Church, PhD, Robert Winthrop Professor, Genetics, Harvard Medical School

IT for Bio: In May 2021, one haploid human genome (3.055 billion bp) was sequenced completely, but zero diploid. We have 7.7 billion diploid humans yet to be sequenced and correlated with their environments and traits in the Personal Genome Project. Plus, at least one genome from each of over 8.7 million eukaryotic species in the Earth BioGenome Project. Plus, monitoring pathogenic and commensal bacteria, allergens, and viruses in the BioWeatherMap. Plus, ancient DNA. We are counting RNA molecules per cell in most (or all) cell types in humans, mice, and many other species throughout development and connectome (with imaging resolution up to 20 nm).

Bio for IT: Reading and writing DNA has improved exponentially in cost (at least 60 million fold) and is increasingly used for storing non-biological data. The record for editing DNA in vivo is now 24,000 edits per cell and for storing data in vivo is about 1 terabyte per mouse. Enormous chemical and biological 'libraries' can perform 'Natural Computing' for tasks far beyond current von Neumann silicon and quantum computers. The combination of these, machine learning plus megalibraries (ML-ML), is already having commercial impact (e.g. Nabla, Manifold, Dyno, Patch).

5:45 pm Welcome Reception in the Exhibit Hall with Poster Viewing (Auditorium/Hall C)
7:00 pm Close of Day
