Bio-IT World is proud to bring together innovative data scientists and developers from across the industry to solve real-world data challenges using the principles of Open Source & FAIR Data. 

Over the years, the Bio-IT World Hackathon has delivered a new level of collaboration to the annual Bio-IT World Conference & Expo in Boston. Our 2019 FAIR Data Hackathon, co-facilitated with leaders from the NIH, brought together a record-breaking 8 teams and over 100 participants to solve real data challenges. Learn more about the 2022 Hackathon at Bio-IT World.

Back again in 2022, the fourth annual Bio-IT Hackathon will continue in the tradition of uniting life science and IT teams to tackle actual bioinformatics projects with maximum impact potential. 

Each project at the event will feature either Open Source tools or one or more aspects of FAIR data: making data findable, accessible, interoperable, and reusable. All projects will be broadly applicable to the data science community.

Online Registration for the Bio-IT World Hackathon has closed.
If you are still interested in registering, please email Kaitlyn Barago at kbarago@healthtech.com.

2022 Projects
Learn more about our 2022 Hackathon projects

Iterative Cluster Analysis Using Multi-Omics Modalities Leveraging Open Data Available from the Gabriella Miller Kids First Data Resource and the INCLUDE Data Hub
We seek to form two collaborative teams that will create platform-agnostic iterative cluster analysis workflows and notebooks, using machine learning to answer previously unanswered health questions with multi-omics data. One team will focus on unraveling the correlative and, to the extent possible, causal origins of oncogenesis and/or tumor suppression using open childhood cancer data from the Kids First Data Resource (https://portal.kidsfirstdrc.org/). The other team will focus on unraveling the immune system response in light of co-occurring conditions in individuals with Down syndrome using open data from the INCLUDE Data Hub (https://portal.includedcc.org/). Open access data matrices based on genomics, gene expression, alternative splicing events, protein expression, and/or metabolomics will be prepared in advance to facilitate clustering analysis and workflow development.
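As a rough illustration of what an iterative cluster analysis over a prepared data matrix could look like, the sketch below runs a small k-means for several cluster counts and records the within-cluster sum of squares for each. Everything in it is an assumption made for illustration: the function names (`kmeans`, `sweep_k`), the choice of k-means itself, and the tiny `samples` matrix, which is placeholder data and not drawn from Kids First or INCLUDE. The project teams may well use different methods and libraries.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point initialization: deterministic, so no random seed is needed."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest already-chosen centroid.
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, inertia) for a plain k-means clustering of `points`."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, inertia


def sweep_k(points, k_values):
    """Iterate over candidate cluster counts and record the inertia of each."""
    return {k: kmeans(points, k)[1] for k in k_values}


# Hypothetical two-feature samples (placeholder values, not real omics data).
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, inertia = kmeans(samples, 2)
scores = sweep_k(samples, [1, 2, 3])
print(labels, scores)
```

Comparing the inertia across cluster counts is one simple way to decide where adding clusters stops paying off; a real workflow would likely add feature scaling and a more robust selection criterion.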

Creating Computable Knowledge from Unstructured Information
Virtually all clinical and medical knowledge is contained in rich free-text information generated from research papers, physician notes in electronic medical records, lab notebooks, and other sources that span healthcare and life sciences applications. Extracting and structuring information from this language, or using it directly as part of analytics pipelines, is a generational challenge that modern natural language processing (NLP) is just beginning to address. The uniqueness of clinical speech and text, however, necessitates domain-specific model architectures. NVIDIA has invested heavily in providing the tools needed to address this challenge and has worked with partners in industry and health systems to demonstrate the value and promise of these models across a wide range of NLP-enabled tasks. Using OSS technologies and models from NVIDIA, this project will create computable knowledge from unstructured information using an end-to-end pipeline and pretrained biomedical models. Specifically, we will leverage the recent LitCoin challenge sponsored by NCATS to build an end-to-end named entity recognition (NER) to entity linking pipeline in NVIDIA NeMo, and extend the performance and functionality of the pipeline during the hackathon. This work is applicable in domains such as drug target identification, prioritization and repurposing, prior art exploration, clinical trials analysis, and adverse event detection.


NCATS Biomedical Data Translator
The NCATS Biomedical Data Translator is a federated set of tools that work together to integrate data from across the domains of biomedical knowledge (from basic science to clinical records) and that are accessible using a shared open standard for querying and message passing. In this project, we will share these data and tools and assist others in integrating their own data into Translator, thereby assessing the interoperability of our system.

Visualization of NCBI ALFA Variants
NCBI ALFA aims to provide genetic variation and allele frequency data from more than 1 million subjects in the Database of Genotypes and Phenotypes. ALFA is a large population genomic variant dataset, but its sheer scale and richness can make it unwieldy for users. Developing new approaches for complex analysis, navigation, and visualization of ALFA can maximize the community's use of the data for variant interpretation in clinical and research settings. The primary goal of this project is to develop a novel app, tool, or approach for navigating and visualizing NCBI ALFA variants and allele frequencies for 12 different human populations.
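To make the idea of per-population allele frequency visualization concrete, here is a minimal sketch that computes an alternate-allele frequency from allele counts and renders a plain-text bar chart. The function names, the population labels, and the counts are all hypothetical placeholders chosen for illustration; they are not real ALFA data or an ALFA API, and the project itself may take an entirely different approach.

```python
def allele_frequency(alt_count, total_count):
    """Alternate-allele frequency = alternate allele count / total allele count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return alt_count / total_count


def text_bar_chart(frequencies, width=40):
    """Render one bar per population, sorted from highest to lowest frequency."""
    label_width = max(len(name) for name in frequencies)
    lines = []
    for name, freq in sorted(frequencies.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(freq * width)
        lines.append(f"{name.ljust(label_width)} | {bar} {freq:.3f}")
    return "\n".join(lines)


# Placeholder (alt_count, total_count) pairs for one variant.
# These values are invented for illustration only; they are NOT real ALFA data.
counts = {
    "population_A": (120, 1000),
    "population_B": (45, 900),
    "population_C": (300, 1200),
}
frequencies = {
    name: allele_frequency(alt, total) for name, (alt, total) in counts.items()
}
chart = text_bar_chart(frequencies)
print(chart)
```

Sorting by frequency puts the populations where the variant is most common at the top, which is one simple way to keep a 12-population comparison readable.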

Sponsorship opportunities are available!

For partnering and sponsorship information, please contact:

Companies A-K
Rod Eymael
Business Development Manager
Cambridge Healthtech Institute
Phone: (+1) 781-247-6286
Email: reymael@healthtech.com

Companies L-Z
Aimee Croke
Business Development Manager
Cambridge Healthtech Institute
Phone: (+1) 781-972-5458
Email: acroke@cambridgeinnovationinstitute.com
