Report on linkages between study-specific data hubs for omics-data types hosted by EMBL and study clinical-epidemiological data and metadata

Summary
Task 8.3: Linking clinical-epidemiological data with the EMBL data hubs for OMICS high-dimensional data for COVID-19 cohorts. Lead: EMBL; other beneficiaries and partners involved: UKHD, UCD, RI-MUHC (Maelstrom).
Task 8.3 focuses on integrating COVID-19 clinical-epidemiological data for EC-funded studies with viral and host high-density omics on the Data Hubs. These will leverage the existing COMPARE and SARS-CoV-2 Data Hub infrastructure for pathogens, host Data Hubs from WP4 Task 4.1, and connected Data Hubs from WP4 Task 4.2. With adaptation for both sensitive human research subject sources (tissues and primary cell lines) and openly sharable sources (most transformed cell lines and non-human hosts), we expect a number of connected Data Hubs across multiple studies.
Our work aims to provide clear linkages between clinical-epidemiological and omics data types. Cross-study or study-to-Open Science Community sharing of omics data and of clinical-epidemiological data are subject to very different PEARL barriers. Permissions and data storage for these data types may differ both within and across studies, depending on study team preferences and subject to national laws and national or local ethics review committee guidance. We will use existing infrastructure and leverage long-term investments in EMBL and Maelstrom to facilitate these linkages.
Prior work by the COMPARE Consortium demonstrates that the decentralised data hub structure proposed here facilitates data sharing by allowing different levels of sharing that depend on country, study type, data type, data recipient, etc., which builds data generators' confidence in the platform. The cloud-based federated platform leverages the significant compute resources needed for standalone analyses of omics data types and for other types of analyses (e.g. group Lasso) that combine clinical-epidemiological and omics data types for more accurate and precise individual-level predictions of the effectiveness of different treatments or the risk factors associated with severe disease outcomes or death.
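To illustrate the kind of linkage described above, the following is a minimal sketch in Python, not the Data Hubs' actual implementation. It assumes that clinical-epidemiological and omics records share a pseudonymised participant identifier within a study, and that each record carries one of three sharing levels. The level names, record fields and the function `link_records` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical sharing levels, ordered from most to least open.
LEVELS = {"open": 0, "consortium": 1, "restricted": 2}


@dataclass(frozen=True)
class ClinicalRecord:
    participant_id: str  # assumed pseudonymised key shared with omics records
    study: str
    outcome: str
    sharing_level: str


@dataclass(frozen=True)
class OmicsRecord:
    participant_id: str
    study: str
    data_type: str
    sharing_level: str


def link_records(clinical, omics, recipient_level):
    """Pair clinical and omics records on (study, participant_id).

    A pair is returned only if the recipient's level permits access
    to both the clinical record and the omics record.
    """
    allowed = LEVELS[recipient_level]
    visible_clinical = {
        (c.study, c.participant_id): c
        for c in clinical
        if LEVELS[c.sharing_level] <= allowed
    }
    linked = []
    for o in omics:
        c = visible_clinical.get((o.study, o.participant_id))
        if c is not None and LEVELS[o.sharing_level] <= allowed:
            linked.append((c, o))
    return linked


# Illustrative records only; these are not real study data.
clinical = [
    ClinicalRecord("P001", "studyA", "severe", "restricted"),
    ClinicalRecord("P002", "studyA", "mild", "consortium"),
]
omics = [
    OmicsRecord("P001", "studyA", "viral genome", "open"),
    OmicsRecord("P002", "studyA", "host transcriptome", "consortium"),
]

consortium_view = link_records(clinical, omics, "consortium")
restricted_view = link_records(clinical, omics, "restricted")
open_view = link_records(clinical, omics, "open")
```

In this sketch a consortium-level recipient sees only the linked pair for P002, because the clinical record for P001 is restricted even though its viral genome is open. This mirrors the point that permissions for the two data types can differ within a single study.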
As stated earlier, analyses and key findings will be considered an essential element of the portal's metadata. Data generators will work together with the EC to develop an agreed-upon approach to applications for data use for restricted data types. The process for applying to access the data, and the criteria and timeline for review, will be clearly documented to facilitate the use of harmonised data.