CMI-PB CHALLENGE - CMI-PB Blog pages

CMI-PB Challenge: A community prediction competition

The CMI-PB challenge is a machine learning competition in which bioinformaticians and general data science practitioners are invited to develop models that predict immune responses (tasks) from multi-source immunological datasets (features). Ideally, these models will use sophisticated methods to integrate multi-source datasets and may also allow interpretation of the underlying factors. Come solve these immunological challenges and compete with thousands of people worldwide!

The goal of the CMI-PB prediction challenge is to build a research community around these challenges and to advance science faster than any one individual or research group could alone. The CMI-PB consortium has generated multi-source data from several individuals, including Ab titers (~4 antibodies/features), cell frequencies (~20 cell types/features), gene expression (~50,000 RNA transcripts/features), and plasma proteomics (~50 proteins/features). Your job is to integrate these data sources to predict different immune responses (tasks). More specifically, you will take multi-source data from several individuals on day 0 (baseline) and predict specific immune responses on later days.
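Before modeling, the separate baseline assays usually need to be joined into one feature vector per individual. The sketch below shows one simple way to do that. It assumes a hypothetical in-memory layout in which each assay is a dict mapping a subject ID to {feature name: value} at day 0. The real CMI-PB file formats, subject identifiers and feature names (such as "IgG_PT" in the usage example) may differ.

```python
from typing import Dict

# Hypothetical layout: subject ID -> {feature name: baseline (day 0) value}.
AssayTable = Dict[str, Dict[str, float]]


def integrate_baseline(assays: Dict[str, AssayTable]) -> Dict[str, Dict[str, float]]:
    """Join several assay tables into one feature vector per subject.

    Only subjects present in every assay are kept (an inner join), and each
    feature name is prefixed with its assay name to avoid name collisions.
    """
    if not assays:
        return {}
    # Subjects that were measured in all assays
    common = set.intersection(*(set(table) for table in assays.values()))
    merged: Dict[str, Dict[str, float]] = {}
    for subject in sorted(common):
        row: Dict[str, float] = {}
        for assay_name, table in assays.items():
            for feature, value in table[subject].items():
                row[f"{assay_name}:{feature}"] = value
        merged[subject] = row
    return merged


# Usage with made-up values: subject s3 is dropped because it lacks Ab titer data.
assays = {
    "ab_titer": {"s1": {"IgG_PT": 1.2}, "s2": {"IgG_PT": 0.7}},
    "cell_freq": {"s1": {"Monocytes": 0.15}, "s2": {"Monocytes": 0.22}, "s3": {"Monocytes": 0.10}},
}
features = integrate_baseline(assays)
```

The inner join is the simplest choice, but it discards individuals with a missing assay. With so few samples, imputing missing assays instead may be worth considering.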

How to participate?

Eligibility:

The competition is open to all individuals!
You must log in or register to enter a submission into a challenge. To register, please create an account at https://www.cmi-pb.org/. After logging in, you can download the data and make a submission using the steps outlined here.
An "Entry" is complete and will be evaluated when the data is submitted in the layout and TSV format specified on the prediction task page of the website.
All entries must be submitted during the competition period displayed on the prediction task page.
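Entries must be tab-separated (TSV) files. The sketch below writes per-subject ranks in that format using Python's standard `csv` module. The column layout ("subject_id" followed by one column per task) and the task names are hypothetical placeholders, so follow the exact layout given on the prediction task page.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_submission(ranks: Dict[str, Dict[str, int]], tasks: List[str], out: TextIO) -> None:
    """Write ranks as a tab-separated table with one row per subject.

    ranks maps subject ID -> {task name: rank}. The header used here is a
    hypothetical placeholder for the official layout.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", *tasks])
    for subject in sorted(ranks):
        writer.writerow([subject, *(ranks[subject][task] for task in tasks)])


# Usage: write to an in-memory buffer here; for a real file use
# open("submission.tsv", "w", newline="") as the output stream.
buffer = io.StringIO()
write_submission({"s2": {"task1": 1}, "s1": {"task1": 2}}, ["task1"], buffer)
```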

How to make a submission?

Once your model has been properly trained and tuned, you can download the test data (as you did for the training data). For the test data, we provide only the baseline feature data and NOT the task data (i.e., the answers). You can then use your models to predict each task and submit RANKED values. Because the number of samples is low, we expected that predicting task levels directly would be particularly difficult, so we are asking contestants to submit ranked values instead. It is up to you to decide at which stage of your modeling to apply this ranking. By far the most straightforward method is to train on the raw values, use your model to make predictions, and then rank those predicted values.
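The "train on raw values, predict, then rank" approach can be sketched as follows. The model is a deliberately minimal stand-in: a one-feature ordinary least-squares line, not a recommended method. The feature values and subject IDs are made up. The ranking step assumes that rank 1 goes to the highest predicted value, so check the task page for the expected direction. Equal predictions are ordered by subject ID so that no two subjects share a rank.

```python
from statistics import mean
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rank_predictions(predicted: Dict[str, float]) -> Dict[str, int]:
    """Convert predicted raw values into distinct ranks 1..n.

    Rank 1 is given to the highest prediction (an assumed direction); equal
    predictions are ordered by subject ID so no two subjects share a rank.
    """
    order = sorted(predicted, key=lambda s: (-predicted[s], s))
    return {subject: position for position, subject in enumerate(order, start=1)}


# Usage: train on raw training values (hypothetical baseline feature -> later-day titer),
# predict raw values for test subjects, then rank the predictions.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
test_baseline = {"s10": 2.5, "s11": 0.5, "s12": 3.5}
predicted = {s: slope * x + intercept for s, x in test_baseline.items()}
ranks = rank_predictions(predicted)  # {'s12': 1, 's10': 2, 's11': 3}
```

Ranking only at the end keeps the model itself unchanged, which is why this is the easiest place to introduce ranks.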

The ultimate goal is to model as many of the tasks as possible and submit your predictions by the due date. You may make as many submissions as you want before the due date, and each submission will be graded automatically. Please keep in mind that your last submission before the due date will be considered your FINAL submission, regardless of how previous submissions performed.

Submission Restrictions:

Only one account is allowed per participant. You may not submit entries from multiple accounts.
Each user is allowed one submission but can re-submit their final version multiple times. Note that the latest submission is considered the user's ‘final’ version.
External data is not allowed. Participants agree to make no attempt to use additional data or data sources not provided for the competition.

Evaluation Metrics and Code Sharing

Evaluation Metrics:

The competition’s final results will be determined solely by the accuracy of the rankings derived from the numerical values you submit for the Ab titer levels. In other words, rankings are assigned automatically based on the numerical value you give for each titer level. Ties between two or more levels are not permitted.
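The exact accuracy measure is not specified here. A common way to score tie-free rankings against observed values is Spearman's rank correlation, $\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between a subject's submitted and observed rank. The sketch below uses this formula as an assumption for local validation only, not as the official scoring code. It also shows a simple pre-submission check for the no-ties rule.

```python
from typing import Dict


def check_no_ties(values: Dict[str, float]) -> None:
    """Raise ValueError if two subjects share the same submitted value (ties are not permitted)."""
    if len(set(values.values())) != len(values):
        raise ValueError("submission contains tied values")


def spearman_rho(submitted: Dict[str, int], observed: Dict[str, int]) -> float:
    """Spearman's rank correlation for tie-free ranks over the same set of subjects."""
    if set(submitted) != set(observed):
        raise ValueError("submitted and observed ranks cover different subjects")
    n = len(submitted)
    if n < 2:
        raise ValueError("need at least two subjects")
    d_squared = sum((submitted[s] - observed[s]) ** 2 for s in submitted)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Usage with made-up ranks: one swapped pair out of three subjects gives rho = 0.5.
check_no_ties({"a": 3.2, "b": 1.1, "c": 2.4})
rho = spearman_rho({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 3, "c": 2})
```

Holding out part of the training data and computing this value there gives a rough sense of ranking quality before submitting.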

Data Use and Code Sharing:

Participants may use the data solely for the purpose and duration of the competition. Permitted uses include, but are not limited to, reading and learning from the data, and analyzing and modifying the data to prepare your model and submission.

Warranties and Obligations:

Each Participant is solely responsible for all equipment, including but not necessarily limited to a computer and internet connection necessary to access the Website and to develop and upload any Submission. Sponsor is not responsible for (a) late, lost, stolen, damaged, garbled, incomplete, incorrect or misdirected Entries or other communications, (b) errors, omissions, interruptions, deletions, defects, or delays in operations or transmission of information, in each case whether arising by way of technical or other failures or malfunctions of computer hardware, software, communications devices, or transmission lines or (c) data corruption, theft, destruction, unauthorized access to or alteration of Entry materials, loss or otherwise.

Prize:

This challenge is meant to create a community of models and gather information that can help us identify ways to predict immune responses. These models will be analyzed to determine how different immune response variables are interlinked. Currently, there is no prize for getting “first place.”

Share your work:

Show off your skills! You can add a link to your work for this competition, whether it is on your blog, GitHub, Bitbucket, GitLab, or anywhere else. If you want to share it with our lab and other contestants, please post your GitLab (or other) links here on GitHub.