Activities of the Project Group
Aims of the project group:
The application of computer science in the life sciences is playing an increasingly important role. A key challenge is to combine, compare, and integrate the most suitable software and underlying algorithms into specific data analysis workflows. The goal of the current research and of the working group is to address the needs of users without a computational background and to enable them to conduct analyses independently using workflows. To this end, current tools within management platforms such as Bioconductor, Galaxy, KNIME, and Snakemake can be combined with other approaches such as Bioconda, the Common Workflow Language, or Docker.
Workflows help to address two problems: (i) the context-specific use of tools that are no longer maintained or were never developed for the problem at hand, and (ii) the preservation of specific versions of tools that are continuously maintained and whose behavior and parameters may therefore change over time. A further challenge is the comparison, benchmarking, selection, and integration of the most suitable tools, which is time-consuming and requires both expertise and computational power. Depending on the number of samples, the scale of the time series, and the sequencing depth, computations can require large computational resources such as cluster, grid, and cloud computing solutions. Adaptive management of the available computing resources through load balancers and queuing systems is often unavoidable when creating analysis workflows.
A current approach to deploying workflows, including all necessary tools and dependencies, is the use of software channels and containers such as Bioconda, Docker, or rkt. Containers in particular are emerging as a potential solution to many of the problems described above, as they allow workflows to be packaged in an isolated and self-contained system, which simplifies the distribution and execution of tools and makes them easily transferable across a wide range of computing environments.
In summary, workflows, management frameworks, and cloud computing services bridge the gap between tool developers and end users, promoting easy-to-use and scalable data analysis. This in turn enables improved data reproducibility, process documentation and monitoring of data analyses.
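The core idea of a workflow described above, chaining tools so that each step runs only after the steps it depends on, can be illustrated with a minimal Python sketch. The step names and placeholder actions are hypothetical and chosen purely for illustration. They do not represent a workflow of the project group or the interface of any platform named above. Only the Python standard library (graphlib, Python 3.9 or later) is used.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# The step names are illustrative assumptions, not taken from the project group.
WORKFLOW = {
    "quality_control": set(),
    "read_alignment": {"quality_control"},
    "quantification": {"read_alignment"},
    "report": {"quality_control", "quantification"},
}


def execution_order(workflow):
    """Return the steps in an order that respects all dependencies."""
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, actions):
    """Run each step after its dependencies and collect the results."""
    results = {}
    for step in execution_order(workflow):
        # Hand every step the outputs of the steps it depends on.
        inputs = {dep: results[dep] for dep in workflow[step]}
        results[step] = actions[step](inputs)
    return results


# Placeholder actions: each step only records which inputs it received.
ACTIONS = {
    step: (lambda inputs, name=step: f"{name}({', '.join(sorted(inputs))})")
    for step in WORKFLOW
}

if __name__ == "__main__":
    for step, output in run(WORKFLOW, ACTIONS).items():
        print(step, "->", output)
```

Workflow management systems additionally handle aspects such as software environments, scheduling on clusters, and provenance tracking, which this sketch deliberately leaves out.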
Publications in the context of the Workflows Working Group
Lott SC, Wolfien M, Riege K, Bagnacani A, Wolkenhauer O, Hoffmann S, Hess WR
Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments
Journal of Biotechnology
doi.org/10.1016/j.jbiotec.2017.06.1203
GMDS annual report 2023
Project group
Development, Implementation and Documentation of Data Processing Workflows
Markus Wolfien, Dresden (Head)
Activities from 1st January 2023 to 31st December 2023
The creation of workflows is a central aspect of data analysis and data integration, as comparing and selecting suitable analysis tools for a specific problem requires highly complex approaches. Therefore, the project group (PG) is dedicated to reviewing, creating, and implementing workflows and their underlying frameworks. The PG consists of eleven active and passive members, most of whom are employed in academia within and outside the GMDS. Activities include organizing workshops, writing publications, and facilitating a topic-related exchange of information on data analysis processes. The group also holds an annual meeting at the GMDS Annual Conference.
In 2023, the project group's activities included several events as part of the GMDS Annual Conference in Heilbronn. Two workshops were held on the topics of "Best Practices in Machine Learning and OMOP" and "Integration of Gene Expression Data in Disease Maps." In addition, a BarCamp was held on the integration of single-cell data in disease maps and clinical applications. The goal of the BarCamp was to discuss these topics with interdisciplinary experts and to have them complete a prepared questionnaire. The results, comprising more than 50 responses, were collected and are currently being written up as a scientific article.
Term of office of the heads and their deputies
October 2021 to October 2024
ARCHIVE OF ACTIVITY REPORTS
GMDS annual report 2022
Project group
Development, Implementation and Documentation of Data Processing Workflows
Markus Wolfien, Dresden (Head)
Activities from 1st January 2022 to 31st December 2022
The creation of workflows is a central aspect of data analysis and data integration, as comparing and selecting suitable analysis tools for a specific problem requires highly complex approaches. Therefore, the project group (PG) is dedicated to reviewing, creating, and implementing workflows and their underlying frameworks. The PG consists of thirteen active and passive members, most of whom are employed in academia within and outside the GMDS. Activities include organizing workshops, writing publications, and facilitating a topic-related exchange of information on data analysis processes. The group's annual meeting also took place online in 2022.
Despite the long-standing interest in personalized decision support based on patient data, data scarcity and availability remain a significant challenge. A recent paper from the project group discussed the importance of AI-driven synthetic data generation for improving machine learning techniques in medical fields such as systems medicine and medical informatics and integrated it into a workflow [1]. The paper proposes using synthetic data, particularly in the context of palliative care screening, to improve ML-supported decision-making by overcoming data limitations and providing insights into current perspectives and potential impacts.
Planned activities in 2023
A project meeting is planned at the GMDS Annual Conference 2023 in Heilbronn, as well as two workshops and a BarCamp.
References
1. Hahn, W.; Schütte, K.; Schultz, K.; Wolkenhauer, O.; Sedlmayr, M.; Schuler, U.; Eichler, M.; Bej, S.; Wolfien, M. Contribution of Synthetic Data Generation towards an Improved Patient Stratification in Palliative Care. J. Pers. Med. 2022, 12, 1278, doi:10.3390/JPM12081278.
Term of office of the heads and their deputies
October 2021 to September 2024
Activities in 2019
Activities from 1st January 2019 to 31st December 2019
Workflows are a central aspect of data analysis and data integration, as selecting suitable analysis tools for a specific problem requires highly complex approaches. Therefore, the project group (PG) is dedicated to reviewing, creating, and implementing workflows and their underlying frameworks. The PG consists of nine active and passive members, most of whom are employed in academia within and outside the GMDS. Activities include organizing workshops, authoring publications, and facilitating a topic-related exchange of information on data analysis processes. The group also hosts an annual meeting at the GMDS Annual Conference.
Workshops and Activities in 2019
In 2019, the PG hosted three workshops across Germany with a total of approximately 30 participants. The one-day and multi-day workshops provided general insights into data analysis and integration using workflows within the Galaxy data analysis platform (https://usegalaxy.eu/). The workshops were as follows:
- 3-day workshop on "Galaxy for linking bisulfite sequencing with RNA sequencing" in Rostock (March)
- Tutorial at the GMDS Annual Meeting 2019 on "NGS data analysis with Galaxy for clinical applications" in Dortmund (September)
- 3-day workshop on "Galaxy for linking bisulfite sequencing with RNA sequencing" in Freiburg (October)
The PG was also involved in the publication of a book chapter titled "Workflow Development for the Functional Characterization of ncRNAs" (Wolfien et al. 2019, doi.org/10.1007/978-1-4939-8982-9_5), published by Springer in the Methods in Molecular Biology book series (volume: Computational Biology of Non-Coding RNA). The chapter discusses experimental protocols for the identification of non-coding RNAs and presents and explains bioinformatics tools and software that can identify and characterize these transcripts.
Planned activities in 2020
A workshop and a project meeting are already planned at the GMDS Annual Conference.
Activities in 2021
Activities from 1st January 2021 to 31st December 2021
The creation of workflows is a central aspect of data analysis and data integration, as comparing and selecting suitable analysis tools for a specific problem requires highly complex approaches. Therefore, the project group (PG) is dedicated to reviewing, creating, and implementing workflows and their underlying frameworks. The PG consists of thirteen active and passive members, most of whom are employed in academia within and outside the GMDS. Activities include organizing workshops, writing publications, and facilitating a topic-related exchange of information on data analysis processes. The group's annual meeting also took place online in 2021.
In view of the growing interest in single-cell and single-nucleus RNA sequencing, workflows were further developed and evaluated to explore these data types in more detail. In this regard, a book chapter was published that provides an overview of current developments in single-cell analytics [1]. The chapter offers an introduction and practical guidance for selecting the most suitable sequencing method for the experimental requirements of a given biological hypothesis. It highlights basic data analysis approaches and then discusses advanced downstream approaches for enriching the information obtained from single-cell experiments (e.g., trajectory analysis, pseudotime analysis, and network inference). In addition, unsolved challenges are discussed to help the reader avoid the most common pitfalls. In this context, the authors actively participated in the workshop of the PG "Single Cell Data" and presented a tool they developed for single-cell annotation [2].
Due to the ongoing COVID-19 restrictions in 2021, the workshop planned in collaboration with de.NBI (https://www.denbi.de/) was held online on April 12 and 16, 2021. The course covered the topic "Bioinformatics carpentry utilizing Galaxy" and focused on bioinformatics approaches using Galaxy (https://usegalaxy.eu/). Each day began with an interactive lecture, followed by a hands-on session. Participants received an introduction to Galaxy, learned how to use tools for data handling and for preprocessing sequence data, and gained an overview of various Galaxy instances. The use of machine learning algorithms and their evaluation were also discussed.
Planned activities in 2022
A project meeting is planned at the GMDS Annual Conference in Kiel.
References
1. Wolfien, M.; David, R.; Galow, A.-M. Single-Cell RNA Sequencing Procedures and Data Analysis. In Bioinformatics; Exon Publications, 2021; pp. 19–35.
2. Bej, S.; Galow, A.M.; David, R.; Wolfien, M.; Wolkenhauer, O. Automated annotation of rare-cell types from single-cell RNA-sequencing data through synthetic oversampling. BMC Bioinformatics 2021, 22, 1–17, doi:10.1186/S12859-021-04469-X.
Term of office of the heads and their deputies
October 2021 to September 2024