Outcomes
Progress
BY-COVID runs from October 2021 to October 2024. The ultimate outcome of the project is that SARS-CoV-2 and other infectious disease data will be easier to access, share and analyse. This will enable the world to respond more quickly to infectious disease outbreaks. During the project there will also be specific outputs such as publications and deliverables. Deliverables will include reports and best practice guidelines. These outputs will appear here as the project progresses.
Publications
Towards increased accuracy and reproducibility in SARS-CoV-2 sequence analysis
We examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative.
Identification of mutations in SARS-CoV-2 PCR primer regions
We propose an analysis pipeline to discover genomic variations overlapping the target regions of commonly used PCR primer sets. The results are provided in a publicly available format and are based on a dataset of more than 1.2 million SARS-CoV-2 samples.
Updating Linked Data practices for FAIR Digital Object principles
We believe that by adopting Linked Data principles, we can accelerate the adoption of FAIR Digital Objects (FDOs) and start building practical ways to assist scientists in efficiently answering topical questions based on knowledge graphs.
Creating lightweight FAIR Digital Objects with RO-Crate
RO-Crate is a lightweight method to package research outputs along with their metadata, based on Linked Data principles. We present how we have followed the FAIR Digital Object (FDO) recommendations and turned research outcomes into FDOs by publishing RO-Crates on the Web using HTTP.
Investigating M. chimaera contamination in heater-cooler units
We found highly similar genetic and phenotypic profiles in M. chimaera isolated from heater-cooler units (HCUs), which are used during surgery to regulate patients' body temperature, and from tap water in the same hospital. This suggests the need for environmental surveillance and associated control measures.
The FAIR Cookbook - the essential resource for and by FAIR doers
We present the FAIR Cookbook: its creation and content, its value, use and adoption, the participatory process behind it, and the collaborative plans for its sustainability.
Comprehensive Fragment Screening of the SARS-CoV-2 Proteome Explores Novel Chemical Space for Drug Development
The international Covid19-NMR consortium have identified binders targeting the RNA genome of SARS-CoV-2. We provide novel structural and chemical space for structure-based drug design against the SARS-CoV-2 proteome.
A lightweight distributed provenance model
We define a lightweight provenance model enabling generation of distributed provenance chains in complex, multi-organizational environments.
COVID-19 vaccine effectiveness assessment - CDM specification
The Common Data Model specification of the BY-COVID project (WP5) on COVID-19 vaccine effectiveness in preventing SARS-CoV-2 infection.
Systemic barriers to pathogen-related data sharing
We report results of a study interviewing data professionals working with COVID-19-relevant data types including social media, mobility, viral genome, testing, infection, hospital admission and deaths.
Clusters of unusual mutational changes in Omicron lineage BA.1
We propose that mutations in three clusters interact to mitigate their individual fitness costs and adaptively alter the function of Spike.
10 Simple Rules for making a software tool workflow-ready
Workflows have become a core part of computational scientific analysis in recent years. This paper presents 10 simple rules for how a software tool can be prepared for workflow use.
Host genomes for SARS-CoV-2 variant leaked into Antarctic soil
We follow up a report of a contaminated metagenomic sample set from Antarctica containing traces of unique SARS-CoV-2 variants. We identify genetic material from mitochondria of Homo sapiens, green monkey and Chinese hamster, the latter two probably originating from cell lines used for studying viruses.
Packaging research artefacts with RO-Crate
The aim of this paper is to introduce RO-Crate (an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine-readable manner) and assess it as a strategy for making multiple types of research artefacts FAIR.
The response of the scholarly communication system to the COVID-19 pandemic
This paper analyses how the scholarly communication system (involving the production, evaluation, and dissemination of research outputs) has responded to the COVID-19 crisis, focusing on the period until mid-2021.
FAIR, ethical, and coordinated data sharing for COVID-19 response
Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. This paper is a review of COVID-19 data sharing platforms and registries.
Ready-to-use public infrastructure for global SARS-CoV-2 monitoring
This paper presents the COVID-19 effort by the Galaxy Project, which pools free worldwide public computational infrastructure, making the analysis of deep sequencing data accessible to anyone while also providing an analytical framework for global pathogen genomic surveillance based on raw sequencing-read data.
Deliverables and Milestones
Outcomes are also published on the BY-COVID Zenodo community.
Deliverables

Code | Due | Description | Responsibility |
---|---|---|---|
D8.1 | 11/21 | Project Handbook initial release and periodic updates | WP8 |
D3.1 | 03/22 | Metadata standards | WP3 |
D7.1 | 03/22 | Dissemination, exploitation and communication Plan | WP7 |
D8.2.1 | 02/22 | Project Data Management Plan initial release and periodic updates | WP8 |
D2.1 | 06/22 | Initial data and metadata harmonisation at domain level to enable fast responses to COVID-19 | WP2 |
D1.1 | 09/22 | Extended workflows | WP1 |
D3.2 | 09/22 | Implementation of cloud-based, high performance, scalable indexing system | WP3 |
D8.2.2 | 12/22 | Project Data Management Plan initial release and periodic updates | WP8 |
D8.1.2 | 03/23 | Project Handbook initial release and periodic updates | WP8 |
D2.2 | 06/23 | Data Access and Transfer across research domains and jurisdictions | WP2 |
D1.2 | 09/23 | Preparedness Data Hub | WP1 |
D3.3.1 | 09/23 | COVID-19 Data Portal | WP3 |
D7.3 | 09/23 | Report on public engagement activities | WP7 |
D5.3 | 11/23 | Hot Spot detection, samples data collection and mechanistic analyses | WP5 |
D1.3 | 03/24 | Tracking and open analytics tools | WP1 |
D2.3 | 03/24 | Enabling data discovery at source using beacon-like mechanisms | WP2 |
D4.3 | 03/24 | Provenance model | WP4 |
D7.2 | 03/24 | Public report showcasing industry value from infectious disease data | WP7 |
D4.2 | 04/24 | Common analysis environment | WP4 |
D5.1 | 05/24 | Enriched report viral variants and health outcomes | WP5 |
D4.1 | 06/24 | Infectious diseases toolkit | WP4 |
D3.3.2 | 07/24 | COVID-19 Data Portal | WP3 |
D6.1 | 07/24 | Stakeholder engagement report | WP6 |
D6.2 | 07/24 | The training efforts report | WP6 |
D8.2.3 | 07/24 | Project Data Management Plan initial release and periodic updates | WP8 |
D8.3 | 07/24 | Report on sustainability plans | WP8 |
D2.4 | 09/24 | Report on data sources discovery and integration for enabling data use and re-use in response to future outbreaks | WP2 |
D5.2 | 09/24 | Secondary use of vaccine trial data and biosamples | WP5 |
D8.1.3 | 09/24 | Project Handbook initial release and periodic updates | WP8 |

Milestones

Code | Due | Description | Responsibility |
---|---|---|---|
M7.1 | 10/21 | Branding and communications guidelines | WP7 |
M7.2 | 11/21 | Launch of project website | WP7 |
M8.1 | 11/21 | Project mobilised. All governing boards and WPs established | WP8 |
M8.2 | 02/22 | DMP approved by the relevant project boards before submission | WP8 |
M1.1 | 03/22 | First support services in operation | WP1 |
M2.1 | 03/22 | Identified data sources have been registered in the BY-COVID reference catalogue | WP2 |
M5.1 | 02/22 | Compiled research questions and requirements Workshop 1 | WP5 |
M6.1 | 03/22 | Stakeholder engagement (initial scoping and draft monitoring approach) | WP6 |
M5.4 | 09/22 | FAIR open-source pipeline | WP5 |
M6.2 | 09/22 | Identified training needs and roadmap | WP6 |
M7.3 | 09/22 | Industry sector mapping report | WP7 |
M4.1 | 09/22 | Common analysis environment | WP4 |
M4.2 | 09/22 | Prototype Infectious diseases toolkit | WP4 |
M2.2 | 01/23 | Identified the preferred mechanisms for data access and use of Real-world data | WP2 |
M1.2 | 03/23 | First globally comprehensive data set | WP1 |
M3.1 | 03/23 | Initial set of resources metadata mapped, indexed, and discoverable in COVID-19 Data Portal | WP3 |
M5.2 | 03/23 | Compiled research questions and requirements Workshop 2 | WP5 |
M5.5 | 03/23 | Viral variant and health outcomes | WP5 |
M5.3 | 03/24 | Compiled research questions and requirements Workshop 3 | WP5 |
M2.3 | 07/24 | Report on upgrade of clinical trial data and metadata | WP2 |