
GMDS Biostatistics Competition 2026

"Reliable Subgroup Identification and Analysis"

Next milestone

Announcement Event on 2026/02/18, 3:00 PM (CET)

In this online event, the competition will be explained and you will have the opportunity to ask questions.

You need to register in order to participate.


About the competition

The Biostatistics section of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS) is hosting the GMDS Biostatistics Competition, a new format that aims to foster collaboration between biostatisticians and data scientists from academia and industry. This year, the competition focuses on "Reliable Subgroup Identification and Analysis", a highly relevant topic proposed by colleagues at Merck Healthcare KGaA. Besides strengthening collaboration between established and emerging experts in the field, the competition aims to facilitate scientific advances in a constructive and competitive environment.

Problem description

A central methodological challenge is identifying which patients will benefit most from a given regimen. Whereas current methods often rely on single biomarkers from simple subgroup analyses, machine learning (ML) approaches may be able to integrate multiple biomarkers and learn more nuanced decision rules to better match patients to treatments. Several novel ML-based methods for subgroup analysis in clinical trials have been developed in recent years [1]. However, these methods often rely on black-box algorithms for patient selection, which offer limited practical guidance and lack explainability. Reflecting the continued interest in the field, Sechidis et al. recently proposed a workflow for assessing treatment-effect heterogeneity [2].

For the Biostatistics Competition, participants are asked to analyse 30 simulated datasets resembling clinical trial data. Each dataset represents a randomised phase II or phase III trial with a binary or normally distributed endpoint. The true assignment of each patient to the subgroup or its complement is defined by a subset of the available covariates. To ensure a blinded analysis that reflects real-world practice, neither the data-generating mechanism nor the true subgroup assignment will be disclosed to participants. Whilst similar challenges have been published [3, 4, 5], our aim is to tap into the diverse backgrounds and expertise of the participants who take part in this competition.

Participants may address one or more of the following tasks:

  • Task 1: "Assessing treatment effect heterogeneity"
    Indicating whether there is a subgroup whose treatment effect differs from that of its complement beyond what can be attributed to chance.
  • Task 2: "Performing variable selection"
    Identifying prognostic and predictive baseline variables.
  • Task 3: "Assigning individuals"
    Assigning patients to the subgroup or the complement, with a clear definition of the decision criteria used.
  • Task 4: "Estimating subgroup proportion"
    Estimating the proportion of the total population that can be assigned to the subgroup with enhanced benefit.
  • Task 5: "Estimating treatment effect"
    Estimating the treatment effect in the identified subgroup and in its complement.
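To make Task 5 concrete, the sketch below computes a naive treatment-effect estimate within a given subgroup and its complement: the difference in mean outcome between arms, which for a binary (0/1) endpoint equals the risk difference. It assumes that W is coded 1 for treatment and 0 for control and that the subgroup indicator S follows the coding of the submission format (1 = subgroup, 0 = complement); the function name is hypothetical. Because it ignores the selection bias (optimism) that arises when the subgroup is identified from the same data, it is only a starting point, not a recommended estimator.

```python
from statistics import fmean


def naive_effects(y, w, s):
    """Difference in mean outcome (treated minus control) within the
    subgroup (s == 1) and within the complement (s == 0).

    Assumption: w is 1 for treatment and 0 for control. For a binary 0/1
    endpoint, the difference in means is the risk difference.
    """
    effects = {}
    for group, label in ((1, "subgroup"), (0, "complement")):
        treated = [yi for yi, wi, si in zip(y, w, s) if si == group and wi == 1]
        control = [yi for yi, wi, si in zip(y, w, s) if si == group and wi == 0]
        if not treated or not control:
            effects[label] = None  # not estimable without patients in both arms
        else:
            effects[label] = fmean(treated) - fmean(control)
    return effects


# Usage with a toy binary endpoint (eight patients, two groups of four).
print(naive_effects(
    y=[1, 0, 1, 0, 0, 0, 1, 0],
    w=[1, 0, 1, 0, 1, 0, 1, 0],
    s=[1, 1, 1, 1, 0, 0, 0, 0],
))  # prints {'subgroup': 1.0, 'complement': 0.5}
```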

References 

[1] Lipkovich I, et al. Statistics in Medicine. 2024; 43(22): 4388-4436. doi: 10.1002/sim.10167 

[2] Sechidis K, et al. Pharmaceutical Statistics. 2025; 24: e2463. doi: 10.1002/pst.2463 

[3] Bornkamp B, et al. Pharmaceutical Statistics. 2024; 23(4): 495-510. doi: 10.1002/pst.2368 

[4] Ruberg SJ. Pharmaceutical Statistics. 2021; 20(5): 939-944. doi: 10.1002/pst.2110 

[5] Ruberg SJ, et al. Biometrical Journal. 2023; 66(1): 2200164. doi: 10.1002/bimj.202200164

Data

Registered participants will receive four files: 

  1. A sheet with additional guidance and explanations beyond the information provided on this web page.

  2. A sheet for indicating which of the tasks their approach(es) is (are) designed to address.

  3. A ZIP file containing the 30 datasets in CSV format:

    Figure: Example of a provided dataset where Y is the outcome, x1, ..., xn are baseline covariates, and W is the treatment group.

  4. A results file for collecting dataset-level information, including subgroup existence, predictive and prognostic biomarkers, subgroup size, treatment effect estimates, and other relevant metrics:

    Figure: Example of a returned results file where each row corresponds to one dataset ID, together with the results for the selected task(s).
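Since the datasets arrive as CSV files inside a single ZIP archive, they can be read with Python's standard library alone. The sketch below assumes that each CSV file has a header row with the columns shown in the figure (Y, x1, ..., xn, W); the archive layout and the function name load_datasets are assumptions for illustration, not part of the official guidance.

```python
import csv
import io
import zipfile


def load_datasets(zip_source):
    """Return {file name: list of row dicts} for every CSV file in the archive.

    `zip_source` may be a path or a binary file-like object. Values are kept
    as strings; convert Y, W and the covariates to numbers as needed.
    """
    datasets = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip folders or non-CSV files in the archive
            with archive.open(name) as raw:
                # "utf-8-sig" also copes with a byte-order mark, if present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                datasets[name] = list(csv.DictReader(text))
    return datasets
```

Keeping the values as strings avoids guessing column types; numeric conversion can then be done explicitly once the column meanings from the guidance sheet are known.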

Submission Format

Participants will be asked to submit the following documents/files:

  1. The sheet indicating which tasks their approach(es) is (are) designed to address.
  2. If Task 3 ("Assigning individuals") was addressed: a ZIP file containing the 30 datasets with the patient-level subgroup assignment added as an extra column 'S', where 1 indicates the subgroup and 0 the complement.
  3. If any of the other tasks was addressed: the results file containing the dataset-level information.
  4. A short summary of the methods/algorithms applied for each task addressed. If multiple approaches were used for a task, indicate how the final decision was derived. If an approach involves hyperparameters, please report them and state whether they were fixed or determined via techniques such as cross-validation. Please limit the summary to no more than 2 pages (Word document) per task.
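A minimal sketch of preparing the ZIP file with the 'S' column, assuming the assignments are available as one 0/1 value per patient row in the original file order and that the original file names are kept. The function names are hypothetical; file naming and other formatting requirements may be specified in the guidance sheet provided to registered participants, and this sketch does not assume any.

```python
import csv
import io
import zipfile


def add_assignment_column(csv_text, assignments):
    """Append a column 'S' (1 = subgroup, 0 = complement) to one CSV dataset.

    `assignments` must hold exactly one 0/1 value per data row, in file order.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if len(rows) != len(assignments):
        raise ValueError("need exactly one assignment per patient row")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*reader.fieldnames, "S"],
                            lineterminator="\n")
    writer.writeheader()
    for row, s in zip(rows, assignments):
        if s not in (0, 1):
            raise ValueError("S must be 0 (complement) or 1 (subgroup)")
        writer.writerow({**row, "S": s})
    return out.getvalue()


def write_submission_zip(zip_target, csv_texts):
    """Write {file name: CSV text} into a new ZIP archive (path or binary file)."""
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in csv_texts.items():
            archive.writestr(name, text)
```

Checking the row count and the 0/1 coding before writing catches the most likely mistakes (a dropped row, or a probability written instead of a final assignment).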

Participants who receive positive feedback on their submission, i.e. who are pre-selected for presentation at GMDS, are asked to submit the corresponding code (which will be made publicly available after the conference) or a GitHub link, together with their poster/presentation, before the conference.

Please note that, as in real-world drug development, participants are expected to reach a final decision for each task on each dataset, even if the multiple algorithms they applied yield inconsistent results.

Evaluation Criteria

The primary evaluation outcome will be an overall rating that takes all selected tasks into account. Additionally, there will be task-specific ratings. These ratings will be derived by comparing participants' outputs with those from the true data-generating model.   

  • Task 1: "Assessing treatment effect heterogeneity"
    Agreement rate with the true answer over all datasets.
  • Task 2: "Performing variable selection"
    Average score over all datasets. The score per dataset is the proportion of variables correctly classified as neither, predictive, prognostic, or both, plus a bonus of 0.25 if exactly the correct set of predictive variables is selected. For the bonus, declaring a predictive variable as both, or vice versa, counts as a correct decision.
  • Task 3: "Assigning individuals"
    Average of the per-dataset agreement rates between the estimated and true patient assignments.
  • Task 4: "Estimating subgroup proportion"
    Average, over all datasets, of the difference between the estimated and the true subgroup proportion.
  • Task 5: "Estimating treatment effect"
    Average root mean square error (RMSE).
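The criteria above can be made concrete with a few scoring helpers. This is a minimal sketch of one reading of the rules, not the organizers' evaluation code: the category labels, the function names, and how Task 5 errors are pooled are assumptions, and Task 4 uses the absolute difference because the criterion only says "average difference". For Task 2, "predictive" and "both" are treated as interchangeable only when checking the bonus, as the rules state.

```python
import math
from statistics import fmean

# Task 2 labels assumed here: "neither", "predictive", "prognostic", "both".
PREDICTIVE_LIKE = {"predictive", "both"}


def agreement_rate(estimated, truth):
    """Share of positions where estimate and truth agree (Tasks 1 and 3)."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(1.0 if e == t else 0.0 for e, t in zip(estimated, truth))


def task3_score(per_dataset):
    """Average of per-dataset agreement rates; items are (estimated S, true S)."""
    return fmean(agreement_rate(est, true) for est, true in per_dataset)


def task2_dataset_score(estimated, truth):
    """Proportion of correctly classified variables plus a 0.25 bonus.

    Both arguments map variable name -> Task 2 label.
    """
    variables = sorted(truth)
    correct = agreement_rate([estimated[v] for v in variables],
                             [truth[v] for v in variables])
    # Bonus check: "predictive" and "both" count as the same decision.
    est_pred = {v for v in variables if estimated[v] in PREDICTIVE_LIKE}
    true_pred = {v for v in variables if truth[v] in PREDICTIVE_LIKE}
    return correct + (0.25 if est_pred == true_pred else 0.0)


def task4_score(estimated, truth):
    """Mean absolute difference between estimated and true proportions (assumed)."""
    if len(estimated) != len(truth):
        raise ValueError("one estimate per dataset is required")
    return fmean(abs(e - t) for e, t in zip(estimated, truth))


def rmse(estimated, truth):
    """Root mean square error between effect estimates and true effects (Task 5)."""
    if len(estimated) != len(truth):
        raise ValueError("one estimate per true effect is required")
    return math.sqrt(fmean((e - t) ** 2 for e, t in zip(estimated, truth)))


def meets_coverage(n_analysed, n_total=30):
    """True if at least 80% of the datasets were analysed (integer arithmetic)."""
    return 10 * n_analysed >= 8 * n_total
```

For example, with true labels x1 predictive, x2 prognostic, x3 neither, x4 both, a submission labelling x1 as both and x4 as predictive gets only 2 of 4 labels right (0.5) but still earns the bonus, for a dataset score of 0.75.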

To be included in the rating for a task, a submission must have analysed at least 80% of the provided datasets (i.e. at least 24 of the 30).

Conclusion and Publication

The results of the Biostatistics Competition are expected to be presented and discussed at the ISCB/GMDS 2026 conference in Freiburg. The team(s) with the best solutions will be invited to present their approach, and one representative of the winning team will receive a free conference ticket. Remote participation in the final workshop will also be possible.

In addition, the teams behind the most interesting approaches will be invited to contribute to a joint scientific publication.

Participation Terms

  • If fewer than three valid submissions are received, the organizing committee may decide to cancel the concluding workshop.
  • Even if the minimum number of participants is not reached, a publication is still planned.

Timeline

Date          Time (CET)   Event
2026/02/18    3:00 PM      Announcement Event (registration required)
2026/02/19                 Registration opens
2026/04/15                 Registration ends
2026/06/30                 Submission of results ends
2026/08/14                 Feedback on submission of results
2026/09/15                 Deadline for submission of final results
tba           tba          Final workshop at ISCB/GMDS Conference
                           Joint publication

Organizers

  • Max Westphal (GMDS)
  • Anika Großhenning (GMDS)
  • Heiko Götte (Merck)
  • Aslihan Gerhold-Ay (Merck)
  • Clara Eléonore Pavillet (Merck)

Contact

If you have questions or suggestions, feel free to contact us at competition@gmds.de.