Christian Ohmann (KKS Düsseldorf, ECRIN): Building a metadata repository (MDR) for clinical research
Authors: C. Ohmann, S. Goryanin, S. Canham (ECRIN); J. Kudzia, L. Dutka (ACK Cyfronet AGH); S. Nicotri, A. Italiano, G. Donvito (Istituto Nazionale di Fisica Nucleare – INFN)
It has been recognised that FAIR data play an essential role in achieving the objectives of Open Science. In clinical studies, discoverability, an essential component of FAIR data (findability), remains a major issue despite the implementation of clinical trial registries. To give access to all documents belonging to a clinical trial (e.g. publications, study protocol, statistical analysis plan, individual participant dataset), a central web portal is needed that federates the available data sources (including registries and repositories) and makes that information searchable. Such a portal has been developed in the EU H2020-funded project eXtreme DataCloud (XDC) (http://www.extreme-datacloud.eu/).
Software development is based on a detailed use-case description, formal requirements, and a standardised metadata schema and data structures, and forms part of the XDC infrastructure. Metadata from the individual data sources are imported, mapped to the standardised metadata schema and loaded into OneData. INFN provides the functionality for discovering studies and related data objects, and the GUI (web portal) is developed by OneData.
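To illustrate what mapping source metadata onto a standardised schema can look like, the following minimal Python sketch applies a JSON template to a nested source record. The template format (target field to dotted source path), all field names (e.g. `display_title`, `trial_registry_id`) and the sample record are hypothetical; they are not the actual ECRIN metadata schema or the XDC import code.

```python
import json
from typing import Any, Optional

# Hypothetical JSON template: each key is a target field, each value a dotted
# path into the source record. Field names are illustrative, NOT the ECRIN schema.
TEMPLATE = json.loads("""
{
  "display_title": "title",
  "brief_description": "summary",
  "study_type": "design.type",
  "trial_registry_id": "ids.registry_id"
}
""")


def get_path(record: dict, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def map_record(record: dict, template: dict) -> dict:
    """Build a target record with exactly the template's keys, whatever the source."""
    return {target: get_path(record, path) for target, path in template.items()}


if __name__ == "__main__":
    # Hypothetical source record; the registry ID is a placeholder.
    source = {
        "ids": {"registry_id": "NCT00000000"},
        "title": "Example trial",
        "design": {"type": "Interventional"},
    }
    print(json.dumps(map_record(source, TEMPLATE), indent=2))
```

Keeping the mapping in a data file rather than in code means that a new data source would, under this assumption, only need a new template, and every mapped record ends up with the same set of fields (missing values become `null`).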
The ECRIN metadata schema for clinical studies, which is based on DataCite, was updated (https://zenodo.org/record/4028900#.X36oKdAzbcs). So far, metadata from seven data sources have been imported (WHO-ICTRP, including CT.gov; PubMed; WWARN; Edinburgh DataShare; BioLINCC; ZENODO; Data Dryad) using different modalities (e.g. database download, OAI-PMH, scraping of web pages). In total, 551,003 studies from 18 registries and 820,793 data objects, covering 31 object types and 3 repositories, have been integrated. The acquired metadata have been mapped to the ECRIN metadata schema using standard JSON templates and stored on servers at INFN in Bologna. The MDR portal (crmdr.org) has been implemented using the functionality and metadata management of OneData, while the search and filter functionality (Elasticsearch) has been provided by INFN. A standardised user evaluation showed good usability and user satisfaction. The portal was officially launched in April 2020. The MDR was put into production for the ECRIN task force on COVID-19, has been linked to the European COVID-19 Data Portal, and is included in the recommendations of the RDA COVID-19 guidelines.
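As a rough illustration of combined search and filtering in Elasticsearch, the sketch below builds a Query DSL request body: a full-text `multi_match` clause in `must` (scored) and an exact `term` clause in `filter` (not scored). The field names `display_title`, `brief_description` and `object_type` are hypothetical, and `object_type` is assumed to be mapped as a `keyword` field; this is not the query used by the INFN implementation.

```python
import json
from typing import Optional


def build_search_body(text: str, object_type: Optional[str] = None, size: int = 20) -> dict:
    """Build an Elasticsearch bool query: full-text match plus optional exact filter.

    Field names are hypothetical; 'object_type' is assumed to be a keyword field,
    since a term query does not analyse its input.
    """
    must = [{
        "multi_match": {
            "query": text,
            "fields": ["display_title", "brief_description"],
        }
    }]
    filters = []
    if object_type is not None:
        # Filter clauses restrict the hit set without affecting relevance scores.
        filters.append({"term": {"object_type": object_type}})
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    # The resulting JSON could be sent as the body of a _search request.
    print(json.dumps(build_search_body("COVID-19", object_type="Study protocol"), indent=2))
```

Separating scored text matching from unscored filters is the usual way to implement faceted filtering (e.g. by object type) on top of a free-text search box.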
The MDR was well received by the clinical research community. It was accepted as an early adopter in the EOSC-hub project and as a use case in the EOSC-Life project. Further development will focus on revising the web portal, upgrading the metadata injection process, developing APIs, modifying data extraction to interrogate sources periodically, and extending extraction to other data repositories.
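For sources that support OAI-PMH, periodic interrogation could rely on the protocol's selective harvesting: a `ListRecords` request with a `from` date returns only records changed since the last run, and a `resumptionToken` pages through large result sets. The minimal sketch below is an assumption about how this might be done, not the planned MDR implementation; the base URL is hypothetical and the `oai_dc` metadata prefix is assumed to be offered by the source.

```python
from datetime import date
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, since: date, token: Optional[str] = None) -> str:
    """Build an incremental ListRecords request URL.

    Per OAI-PMH, resumptionToken is an exclusive argument, so follow-up pages
    carry only the verb and the token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": since.isoformat()}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the next resumption token.

    An empty resumptionToken element marks the last page, so None is returned.
    """
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    tok = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = tok.text if tok is not None and tok.text else None
    return ids, next_token


if __name__ == "__main__":
    # Hypothetical endpoint; a scheduler would store the date of the last successful run.
    print(list_records_url("https://example.org/oai", date(2020, 10, 1)))
```

A scheduled job would fetch each URL, call `parse_page`, and repeat with the returned token until it is `None`, then record the run date as the `from` value for the next interrogation.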