Summary
Task 3.1: Standards for harmonization or reconciliation of CE cohort data

We will develop standardized guidelines for variable definitions, data dictionaries, and protocols, which are relevant for both prospective and retrospective harmonization of data. The rationale for this task is to go beyond "prospective" harmonization between cohorts, in which researchers agree on standardized methods before the initiation of their cohort projects. We also plan to develop a roadmap for how heterogeneous data sets from existing cohort projects with long-standing data collection can be pooled for specific analysis questions.

2.1.1 Prospective harmonization
To facilitate prospective cross-cohort synthesis, we will further develop standardized procedures for data capture and curation, including clinical and demographic variable standards and data dictionaries. We will extend this to data entry and cleaning, sample collection, and laboratory external quality assessment. The use of standard definitions for data capture and management needs to be documented with the respective meta-data. We will base our work on existing CDISC/CDASH standards and develop additional standards when necessary. The investigators are involved in the ongoing work on harmonized protocols for Zika research and in the Individual Participant Data Meta-Analysis (IPD-MA). The experience from these efforts will be integrated into this task.

2.1.2 Retrospective reconciliation
We will develop a roadmap towards synthesizing data across existing cohorts with heterogeneous protocols, variable dictionaries, and associated meta-data. Although reconciliation will not be possible in all cases, the roadmap will include the definition of rules for minimum data sets that can be pooled for specific questions.

The assessment of the expected uncertainties when combining heterogeneous data and meta-data will include statistical reasoning, but the focus will be on the reconciliation of content: analyzing the clinical, epidemiological, and laboratory definitions used in the individual cohort studies this consortium has access to. One of the strategies will be to create secondary variables that transport the heterogeneity, because their definitions are broader than those of the initial primary variables. We will leverage our experience with Bayesian hierarchical models and missing data models to build a framework for this transportability. Specifically, an appropriate, data-determined amount of "borrowing of strength" will allow narrowly defined variables (i.e., those with cohort-specific definitions) to contribute to knowledge of broadly defined variables.

We will also reflect on the scientific questions most likely to emerge in outbreak situations (where the original focus of the consortium lies) and suggest specific reconciliation scenarios, for instance with respect to the need for seroprevalence and burden estimates. However, the benefit of retrospective reconciliation extends beyond outbreak scenarios, and the work will be extended to a selection of concrete scientific questions to be chosen by the participants. For these questions, we will analyze the influence of the heterogeneities of the primary variables on the outcome of interest.
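As an illustration of what a data-determined amount of "borrowing of strength" can look like, the following minimal Python sketch applies partial pooling to cohort-specific estimates. It is not the consortium's framework: it substitutes a simple empirical-Bayes normal-normal model, with a DerSimonian-Laird moment estimate of the between-cohort variance, for a full Bayesian hierarchical model. The class name, function names, cohort labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CohortEstimate:
    name: str
    estimate: float  # cohort-specific estimate, e.g. a log-odds (hypothetical)
    variance: float  # squared standard error of that estimate, must be > 0


def between_cohort_variance(cohorts):
    """DerSimonian-Laird moment estimate of the between-cohort variance."""
    if len(cohorts) < 2:
        raise ValueError("at least two cohorts are needed")
    weights = [1.0 / c.variance for c in cohorts]
    total_w = sum(weights)
    pooled = sum(w * c.estimate for w, c in zip(weights, cohorts)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (c.estimate - pooled) ** 2 for w, c in zip(weights, cohorts))
    scale = total_w - sum(w * w for w in weights) / total_w
    return max(0.0, (q - (len(cohorts) - 1)) / scale)


def partially_pool(cohorts):
    """Return (overall mean, between-cohort variance, shrunken estimates)."""
    tau2 = between_cohort_variance(cohorts)
    weights = [1.0 / (c.variance + tau2) for c in cohorts]
    overall = sum(w * c.estimate for w, c in zip(weights, cohorts)) / sum(weights)
    shrunk = {}
    for c in cohorts:
        # Share of the cohort's own estimate that is kept; the remainder is
        # borrowed from the overall mean. The data set this share via tau2.
        keep = tau2 / (tau2 + c.variance)
        shrunk[c.name] = keep * c.estimate + (1.0 - keep) * overall
    return overall, tau2, shrunk


# Hypothetical cohorts measuring a narrowly defined variable in different ways.
cohorts = [
    CohortEstimate("cohort_A", 0.20, 0.010),
    CohortEstimate("cohort_B", 0.55, 0.040),
    CohortEstimate("cohort_C", 0.35, 0.020),
]
overall, tau2, shrunk = partially_pool(cohorts)
for name, value in shrunk.items():
    print(f"{name}: {value:.3f}")
```

When the cohorts agree closely, the estimated between-cohort variance approaches zero and every cohort is pulled fully to the common mean; when they disagree strongly, each cohort largely keeps its own estimate. A full hierarchical model would additionally propagate the uncertainty in the between-cohort variance itself.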
More information & hyperlinks