Bio-IT FAIR Data Hackathon

May 18 – 19, 2026, in Boston, MA

The Bio-IT World Hackathon is a cornerstone of the Bio-IT World Conference & Expo, bringing together data scientists, software developers, and life science professionals to tackle real-world data challenges. Focused on Open Source and FAIR (Findable, Accessible, Interoperable, Reusable) principles, this two-day event fosters innovation and collaboration to deliver practical solutions.

Hackathon Sponsor:

CFDE

What to Expect in 2026:

The 2026 hackathon will continue to unite life science and technical professionals to address pressing data challenges using Open Source and FAIR Data approaches. Facilitated by leaders from the NIH Common Fund Data Ecosystem (CFDE), this year’s event will focus on projects leveraging omics data and integrating CFDE tools, improving interoperability across datasets to accelerate discoveries.

The CFDE ensures Common Fund data is accessible and reusable, providing researchers with a centralized online platform for integrating multiple resources seamlessly while enabling new insights and scalable solutions.

Why Participate?

Compete for Prizes and Recognition – Multiple prizes of up to $5,000 will be awarded to top teams for innovative, high-impact solutions that advance open science and biomedical data reuse.
Solve Real-World Challenges – Address critical data problems using Open Source and FAIR principles.
Collaborate with Experts – Partner with peers of various backgrounds to develop workflows, tools, and analysis pipelines that advance biomedical discovery.
Gain Hands-On Experience – Work with cutting-edge technologies in bioinformatics, AI, and cloud-based data analysis.

The hackathon is free and in-person only. Registration for the main conference is not required.

How to Get Involved:

Want to Join a Team? Deadline: April 17
Complete this form to tell us a little about yourself, and we will follow up with you regarding your status.

Complete Form

Project Descriptions:

Project 1: Exercise as Medicine: Opposing Molecular Signatures in Health and Disease

Exercise is known to influence many biological pathways, but how do those changes compare directly to what happens in disease? In this project, you’ll analyze multi-omics data from the MoTrPAC endurance training study and compare it with publicly available rat disease models such as diabetes, hypertension, and cancer. The goal is to identify genes, proteins, and metabolites that shift in opposite directions in exercise versus disease contexts. Participants will work on harmonizing datasets across tissues and platforms and generating a ranked list of candidate “exercise–disease contrast genes.” The team will also explore how these findings can be contextualized using the CFDE Knowledge Center.
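
As a rough illustration of what a ranked list of "exercise–disease contrast genes" could look like, the sketch below keeps genes whose fold changes have opposite signs in the two contexts and ranks them by the weaker of the two effects. The gene names, fold-change values, the `min_abs` cutoff, and the scoring rule are all invented for illustration; they are not MoTrPAC results or the project's chosen method.

```python
# Hypothetical log2 fold changes; real values would come from harmonized datasets.
exercise_lfc = {"Ppargc1a": 1.8, "Il6": -0.9, "Ucp3": 1.1, "Actb": 0.05}
disease_lfc = {"Ppargc1a": -1.2, "Il6": 1.5, "Ucp3": 0.7, "Actb": -0.02}


def contrast_genes(exercise, disease, min_abs=0.5):
    """Rank genes whose fold changes have opposite signs in the two contexts."""
    ranked = []
    # Only genes measured in both datasets can be compared.
    for gene in exercise.keys() & disease.keys():
        e, d = exercise[gene], disease[gene]
        weaker = min(abs(e), abs(d))
        # Opposite signs, and both effects at least min_abs in magnitude.
        if e * d < 0 and weaker >= min_abs:
            ranked.append((gene, weaker))
    # Strongest contrast first; gene name breaks ties deterministically.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


ranking = contrast_genes(exercise_lfc, disease_lfc)
for gene, score in ranking:
    print(gene, score)
```

Scoring by the weaker effect is one conservative choice among many; a real analysis would also account for statistical significance and tissue.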

Project 2: Drug Repurposing via Disease Similarity and Biomarker Networks

Many diseases share molecular biomarkers, which may point to shared biological mechanisms. This project integrates BiomarkerKB with CFDE datasets such as LINCS, IDG, and GTEx to explore disease similarity through network-based approaches. Participants will construct and analyze a biomarker–drug knowledge graph to cluster diseases and identify potential repurposing candidates supported by multiple data sources. The work will involve gene identifier harmonization, network analysis, and interpretation of drug–gene perturbation data. The outcome will be a reusable framework and visualizations that illustrate disease–drug relationships.
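
One simple way to picture disease similarity through shared biomarkers is a Jaccard index over biomarker sets, with an edge drawn between diseases that exceed a threshold. The sketch below is only an illustration under assumed data: the disease and gene labels, the `threshold` value, and the choice of Jaccard similarity are placeholders, not content from BiomarkerKB or the project's actual approach.

```python
from itertools import combinations

# Placeholder biomarker sets; real sets would be built from harmonized identifiers.
disease_biomarkers = {
    "disease_A": {"GENE1", "GENE2", "GENE3"},
    "disease_B": {"GENE2", "GENE3", "GENE4"},
    "disease_C": {"GENE5"},
}


def jaccard(a, b):
    """Shared items divided by all items; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(biomarkers, threshold=0.3):
    """Return (disease, disease, score) edges with similarity >= threshold."""
    edges = []
    for d1, d2 in combinations(sorted(biomarkers), 2):
        score = jaccard(biomarkers[d1], biomarkers[d2])
        if score >= threshold:
            edges.append((d1, d2, score))
    return edges


edges = similarity_edges(disease_biomarkers)
print(edges)
```

The resulting edge list could feed a graph library for clustering; drug nodes would be added in a similar way from perturbation data.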

Project 3: Visible Neural Networks for Cancer Drug Response (CellMap-VNN)

Predicting cancer drug response from molecular data remains a challenging problem. In this project, participants will develop interpretable Visible Neural Networks that incorporate hierarchical cellular information such as protein–protein interactions and perturbation data. By grounding the model architecture in known biological structure, the team will explore whether this approach improves interpretability compared to conventional models. Models will be trained using publicly available tumor profiling and drug response datasets. Participants will also implement reproducible workflows and basic MLOps practices to support FAIR principles.
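
The idea of grounding a model's architecture in biological structure can be sketched as a masked layer: each hidden unit stands for a pathway and only receives input from genes annotated to it. The example below is a toy forward pass in plain Python, assuming made-up gene names, pathway memberships, and weights; an actual Visible Neural Network would be trained in a deep learning framework with a full hierarchy.

```python
import math

# Hypothetical genes and pathway memberships, for illustration only.
genes = ["g1", "g2", "g3"]
pathways = {"pathway_X": {"g1", "g2"}, "pathway_Y": {"g3"}}


def build_mask(genes, pathways):
    """For each pathway, 1 where the gene is a member and 0 elsewhere."""
    return {
        name: [1 if g in members else 0 for g in genes]
        for name, members in pathways.items()
    }


def pathway_layer(expression, weights, mask):
    """Masked linear layer with tanh; masked-out weights never contribute."""
    out = {}
    for name, row in mask.items():
        total = sum(w * m * x for w, m, x in zip(weights[name], row, expression))
        out[name] = math.tanh(total)
    return out


mask = build_mask(genes, pathways)
# Large weights on non-member genes are ignored because of the mask.
weights = {"pathway_X": [0.5, 0.5, 9.0], "pathway_Y": [9.0, 9.0, 1.0]}
activations = pathway_layer([1.0, 1.0, 0.0], weights, mask)
print(activations)
```

Because each hidden unit maps to a named pathway, its activation can be inspected directly, which is the interpretability argument the project sets out to test.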

Project 4: Illuminating the Molecular Basis of Drug Side Effects

Drug side effects often reflect underlying biological mechanisms that are not fully understood. This project uses DrugCentral and Pharos to identify drugs that share a side effect and may act on a common target. The team will evaluate whether the target is expressed in relevant tissues and examine pathway databases such as Reactome to explore plausible mechanistic explanations. The focus will be on building a structured, reproducible pipeline that connects side effect data to molecular targets and pathways. Results may highlight hypotheses for further investigation or drug repurposing.
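
The first step of such a pipeline, finding drugs that share a side effect and counting the targets they have in common, might look like the sketch below. All drug names, side effects, and targets are invented placeholders, and the dictionaries stand in for whatever tables would actually be exported from DrugCentral or Pharos; no real API of either resource is used here.

```python
from collections import Counter

# Placeholder annotations; real data would be loaded from exported tables.
drug_side_effects = {
    "drug_a": {"nausea", "headache"},
    "drug_b": {"nausea"},
    "drug_c": {"nausea", "rash"},
    "drug_d": {"rash"},
}
drug_targets = {
    "drug_a": {"T1", "T2"},
    "drug_b": {"T1"},
    "drug_c": {"T1", "T3"},
    "drug_d": {"T3"},
}


def shared_targets(side_effect, side_effects, targets, min_drugs=2):
    """Find drugs with a side effect and targets hit by at least min_drugs of them."""
    drugs = sorted(d for d, effects in side_effects.items() if side_effect in effects)
    counts = Counter(t for d in drugs for t in targets.get(d, ()))
    return drugs, {t: n for t, n in counts.items() if n >= min_drugs}


drugs, candidates = shared_targets("nausea", drug_side_effects, drug_targets)
print(drugs, candidates)
```

A candidate target found this way is only a hypothesis; the tissue expression and pathway checks described above would follow.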

Project 5: Mining the CFDE for Post-Translational Modification Data

Post-translational modifications (PTMs) play a central role in regulating protein function and are frequently implicated in disease processes. In this project, participants will identify PTM-relevant datasets within the CFDE and focus on analyzing connections among modified genes, associated diseases, and affected biological pathways. The team will work to link PTM-containing datasets to gene-level annotations and map these genes to known disease associations and pathway resources. Emphasis will be placed on developing strategies for integrating and harmonizing PTM information across studies to support comparative analysis. The goal is to produce structured outputs and exploratory analyses that highlight gene–PTM–disease relationships and suggest avenues for further investigation.

Project 6: Exercise Mimetics Discovery with Gene Set Foundation Models

Exercise produces consistent molecular signatures across tissues, raising the question of whether existing compounds produce similar patterns. This project will compare MoTrPAC exercise gene signatures with LINCS drug perturbation data to identify compounds with overlapping molecular effects. In addition to standard enrichment approaches, the team will explore Gene Set Foundation Models (GSFMs) to examine relationships among candidate compounds. Results may be further annotated using GTEx aging signatures and IDG resources. Deliverables include a ranked list of candidate compounds and a reproducible analysis workflow.
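
To make the comparison of signatures concrete, the sketch below scores each compound by how many exercise-regulated genes it moves in the same direction, minus those it moves in the opposite direction. This is a deliberately simple overlap score, not a standard enrichment statistic and not a Gene Set Foundation Model; the gene sets and compound names are hypothetical.

```python
# Hypothetical exercise signature: genes up- and down-regulated by training.
exercise_up = {"G1", "G2", "G3"}
exercise_down = {"G4", "G5"}

# Hypothetical drug perturbation signatures in the same format.
drug_signatures = {
    "compound_x": {"up": {"G1", "G2"}, "down": {"G4"}},
    "compound_y": {"up": {"G4"}, "down": {"G1"}},
    "compound_z": {"up": {"G3"}, "down": {"G2"}},
}


def mimic_score(up, down, signature):
    """Concordant minus discordant overlaps, scaled by exercise signature size."""
    concordant = len(up & signature["up"]) + len(down & signature["down"])
    discordant = len(up & signature["down"]) + len(down & signature["up"])
    return (concordant - discordant) / (len(up) + len(down))


ranked = sorted(
    ((name, mimic_score(exercise_up, exercise_down, sig))
     for name, sig in drug_signatures.items()),
    key=lambda item: (-item[1], item[0]),
)
for name, score in ranked:
    print(name, round(score, 2))
```

Positive scores suggest a compound mimics the exercise pattern and negative scores suggest it opposes it; a real workflow would add significance testing before ranking candidates.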

For more details on the Hackathon, please contact:
The CFDE Training Center at cfde-trainingcenter@orau.org
