Statistical guidance and approaches for dealing with heterogeneity, missing data and measurement error in pooled cohort data sets

Summary
Task 33: Reconciling measurements of individual cohort participants across heterogeneous data sets

Key problems when combining multiple data sources arise when cohort studies adopt different variable definitions or measurement methods, when data are prone to measurement error, or when studies are affected by missing data. For this reason, this work package will develop a statistical framework to simultaneously account for all of the aforementioned sources of uncertainty and bias. This framework will integrate state-of-the-art methods for dealing with missing data and measurement error, and extend them for application in heterogeneous data sets. Further, new multivariate meta-analysis methods will be developed to reconcile situations where standardization of certain variables is no longer feasible. These methods will adopt advanced penalization schemes to facilitate their applicability in sparse and high-dimensional data sets. Finally, we will integrate input from scientific experts (i.e., immunologists, virologists, statisticians) and teams on the ground to ensure that the underlying data generation processes are properly accounted for. The proposed framework will adopt a Bayesian estimation paradigm to simultaneously propagate all relevant sources of uncertainty and to adapt model complexity as new participant-level data, covariates and/or studies become available. Data curation and statistical methods will work in concert to ensure that both the model complexity and findings are based on the most recent data/evidence.
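
To make the idea of pooling across heterogeneous studies concrete, the sketch below shows a much simpler stand-in than the framework described above: a classical univariate random-effects meta-analysis using the DerSimonian-Laird estimator of between-study variance. It is not the Bayesian, multivariate, penalized approach this work package proposes, and it does not handle missing data or measurement error. The function name `random_effects_pool` and its inputs are hypothetical and chosen only for illustration.

```python
import math


def random_effects_pool(estimates, std_errors):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    Illustrative only: a univariate, frequentist simplification, not the
    Bayesian multivariate framework described in the summary.
    Returns (pooled_estimate, pooled_std_error, tau2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching inputs")

    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures how much studies disagree beyond sampling error.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    mu_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_re = math.sqrt(1.0 / sum_w_re)
    return mu_re, se_re, tau2
```

Calling `random_effects_pool([0.0, 1.0], [0.1, 0.1])` on two studies that disagree strongly yields a pooled estimate midway between them with a much wider standard error than either study alone, which is the basic behaviour a heterogeneity-aware model is expected to show. A full Bayesian treatment would instead place priors on the between-study variance and propagate its uncertainty rather than plugging in a point estimate.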