The FAIRsharing BY-COVID Collection of data sources

22 June 2023


The COVID-19 Data Portal makes it possible for researchers to access and integrate a broad range of COVID-19 and SARS-CoV-2 data. BY-COVID has developed tools, workflows, documentation and training to support the incorporation of additional resources from many different research disciplines.

A key initial step involves registering data sources and uploading their main characteristics to FAIRsharing. A cross-disciplinary resource that maps and interlinks databases, standards and policies, FAIRsharing enables efficient onboarding of new data sources and provides a means to make them more discoverable in the European Open Science Cloud (EOSC) ecosystem.

The BY-COVID FAIRsharing Collection is a catalogue and knowledge graph of data sources and their characteristics, including access terms, protocols and standards used to represent the data and metadata (Figure 1). The Collection currently contains 20 data sources (Table 1), developed by BY-COVID members, from social science and humanities, health and clinical data, images, genomic and phenotypic data and chemical biology.

Making a range of infectious disease data sources widely discoverable, accessible and interoperable is important for research and innovation, which is increasingly multidisciplinary in nature. For example, pathogen research is accelerated by the availability of data from clinical trials, biobanks, behavioural and socioeconomic studies, particularly if the data is combined with host and pathogen omics information. Many of these data types, for example clinical records or bioactivity data, may contain high-resolution images, the availability of which extends the range of research questions that can be explored.

Multidisciplinary data is also critical for public health decision-making, where policy questions are complex and evidence from biomolecular research, clinical studies and social sciences must be taken into account. One lesson from the COVID-19 pandemic was that data-driven decision-making needs high-quality, real-time data from many research disciplines and geographic areas in an integrated format. The BY-COVID project is building on these lessons and creating solutions for COVID-19 that can be extended to other pathogens. Resources like the COVID-19 Data Portal and FAIRsharing are pivotal to meeting these goals.

The launch of the FAIRsharing BY-COVID Collection, which will grow progressively, marks an important step in the maturation of the BY-COVID project. Datasets from these data sources will now be incorporated into the COVID-19 Data Portal, providing access to heterogeneous, yet interlinked and organised data, across domains. Over the course of the project, emerging national data portals will be registered in FAIRsharing and linked to the COVID-19 Data Platform, building a federated digital space for infectious disease data.

Figure 1: The BY-COVID FAIRsharing Collection is a catalogue and knowledge graph of data sources and their characteristics, including access terms, protocols and standards used to represent the data and metadata.

Table 1: The BY-COVID FAIRsharing resource collection (as of 15 June 2023). View current status.
| Domain | Resource and record in FAIRsharing | Type of data |
| --- | --- | --- |
| Clinical and health | Health Data Research Innovation Gateway | health datasets |
| Clinical and health | European Health Information Portal (HIP) | health information |
| Clinical and health | ECRIN Clinical Research Metadata Repository | clinical studies, trial registrations, results summaries, journal articles, protocols |
| Clinical and health | Dutch National Observational COVID-19 data portal | data from Dutch health care providers |
| Clinical and health | BBMRI-ERIC Directory | information about biobanks across Europe |
| Clinical and health | Dutch National Observational COVID-19 data portal | data portal for the exploration and reuse of clinical data from Dutch university medical centres |
| Clinical and health | COVID-19 Data Portal | datasets and tools including SARS-CoV-2 sequence data |
| Genotypic and phenotypic | The European Genome-phenome Archive (EGA) | identifiable genetic, phenotypic, and clinical data |
| Genotypic and phenotypic | European Mouse Mutant Archive (EMMA) | mouse strains essential for basic biomedical research |
| Social sciences and humanities | EUI COVID-19 social sciences and humanities (SSH) Data Portal | research in the social sciences and humanities |
| Social sciences and humanities | Consortium of European Social Science Data Archives (CESSDA) Data Catalogue | social science data |
| Social sciences and humanities | European Social Survey (ESS) Data Portal | cross-national survey data measuring the attitudes, beliefs and behaviour patterns of diverse populations in more than thirty nations |
| Social sciences and humanities | Survey of Health, Ageing and Retirement in Europe (SHARE) Research Data Center | survey data on the effects of health, social, economic and environmental policies over the life-course of European citizens and beyond |
| Social sciences and humanities | Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI) Portal | most data collections relevant to the social science community in the Netherlands |
| Images | Electron Microscopy Public Image Archive (EMPIAR) | images underpinning 3D cryo-EM maps and tomograms |
| Images | Electron Microscopy Data Bank (EMDB) | microscopy density maps of macromolecular complexes and subcellular structures |
| Images | Image Data Resource (IDR) | data from genetic, RNAi, chemical, localisation and geographic high content screens, super-resolution microscopy and digital pathology |
| Images | BioImage Archive | images that are useful to life-science researchers |
| Chemical biology | ChEMBL | chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs |
| Chemical biology | European Chemical Biology Database (ECBD) | results from biological screening programs |
| Chemical biology | COVID 19-NMR | protein structural data for SARS-CoV-2 as well as other viruses |