Development and evaluation of three-stage procedures for modeling exposure patterns in epidemiological studies


The interaction of various lifestyle-related factors contributes substantially to the development of multifactorial diseases such as diabetes, obesity and cancer. This project aims to develop a novel three-stage procedure (3-SP) and to investigate its properties and practical application, in order to improve the statistical analysis of how such factors contribute to disease in epidemiological studies. The 3-SP is specifically designed to address the growing complexity of available data, a challenge that the methods traditionally employed in epidemiological studies fail to meet. This complexity results from recent technical advances that have increased the validity of survey instruments. For instance, web-based 24-hour dietary recalls and wearable accelerometers are now routinely used to assess dietary intake and physical activity through frequent repeated measurements over a given period. To model the relation between exposure and outcome appropriately on the basis of such high-dimensional data with intra-individual variation, the 3-SP proceeds in three stages. At the first stage, the usual exposure is derived using error correction models that account for intra-individual variation. At the second stage, cluster models use these estimates to assign individuals to exposure patterns. At the third stage, regression models relate these patterns to the outcome of interest. Combining different statistical methods in this way causes errors to propagate from one stage to the next, and this propagation must be taken into account when investigating the properties of the final estimators. A further question is which method is most appropriate and feasible at each stage.
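As a rough illustration of how the three stages chain together, the following Python sketch runs a 3-SP on simulated data. It is not the project's method: stage 1 uses a simple shrinkage predictor of usual exposure as a stand-in for a full error correction model, stage 2 uses plain k-means, and stage 3 uses a linear regression on pattern indicators. The function names (`usual_exposure`, `kmeans`, `pattern_effects`), the simulated data and the choice of k = 3 are all hypothetical.

```python
import numpy as np

def usual_exposure(reps):
    """Stage 1 (hypothetical stand-in): shrink person means toward the grand mean.

    reps has shape (n_persons, n_repeats, n_exposures). The factor
    lam = s_b2 / (s_b2 + s_w2 / r) is a best-linear-predictor correction
    for within-person variation, not a full error correction model.
    """
    n, r, p = reps.shape
    person_mean = reps.mean(axis=1)                   # (n, p)
    grand_mean = person_mean.mean(axis=0)             # (p,)
    s_w2 = reps.var(axis=1, ddof=1).mean(axis=0)      # pooled within-person variance
    # Between-person variance of the usual exposure, truncated at zero
    s_b2 = np.maximum(person_mean.var(axis=0, ddof=1) - s_w2 / r, 0.0)
    lam = s_b2 / (s_b2 + s_w2 / r)
    return grand_mean + lam * (person_mean - grand_mean), lam

def kmeans(x, k, n_iter=100, seed=0):
    """Stage 2: Lloyd's k-means on standardized usual exposures."""
    rng = np.random.default_rng(seed)
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster becomes empty
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def pattern_effects(labels, y, k):
    """Stage 3: least-squares regression of y on pattern dummies (pattern 0 = reference)."""
    X = np.column_stack([np.ones(len(y))] + [(labels == j).astype(float) for j in range(1, k)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data: 3 latent patterns, 3 exposures, 4 noisy repeats per person
rng = np.random.default_rng(42)
n, r, p, k = 600, 4, 3, 3
true_group = rng.integers(0, k, size=n)
group_means = np.array([[0.0, 0.0, 0.0], [2.0, -1.0, 0.5], [-1.5, 1.5, 2.0]])
usual = group_means[true_group] + rng.normal(0.0, 0.5, size=(n, p))
reps = usual[:, None, :] + rng.normal(0.0, 1.0, size=(n, r, p))  # within-person noise
y = 1.0 * (true_group == 1) - 0.5 * (true_group == 2) + rng.normal(0.0, 1.0, size=n)

x_hat, lam = usual_exposure(reps)
labels = kmeans(x_hat, k)
beta = pattern_effects(labels, y, k)
print("shrinkage factors:", lam.round(2))
print("intercept and pattern effects:", beta.round(2))
```

Because the pattern labels from stage 2 are themselves estimated, misclassification carries over into the stage-3 coefficients, and the least-squares standard errors ignore the uncertainty introduced at stages 1 and 2; this is the error propagation described above. One option for quantifying it would be to bootstrap the whole pipeline. Note also that k-means labels are arbitrary, so the coefficients refer to whichever cluster happens to be labeled 0.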
The project therefore investigated the theoretical bias and variance of the estimators obtained by combining different methods within the 3-SP, developed recommendations for applying the 3-SP, and examined its practical feasibility. To this end, error correction methods such as simulation-extrapolation (SIMEX) were considered, and both theoretical investigations and a simulation study were conducted. The feasibility of the 3-SP methodology for practical applications was investigated using data from the IDEFICS/I.Family studies. These studies provide cohort data on 16,229 European children aged 2 to 9.9 years. Owing to the detailed phenotyping of the participating children, the data offer a typical use case for assessing the proposed methodology.
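To show the idea behind SIMEX, the following Python sketch applies it to the slope of a simple linear regression. It assumes classical additive measurement error with a known error variance `sigma_u2` and a single continuous exposure, which is far simpler than the setting of the 3-SP; in practice the error variance would have to be estimated, for example from repeated measurements. The simulation step adds extra error of variance `lam * sigma_u2` for several values of `lam`, and the extrapolation step fits a quadratic in `lam` and evaluates it at `lam = -1`, the point corresponding to no measurement error. Function names and the simulated data are hypothetical.

```python
import numpy as np

def naive_slope(w, y):
    """Least-squares slope of y on w, ignoring measurement error in w."""
    wc = w - w.mean()
    return (wc * (y - y.mean())).sum() / (wc ** 2).sum()

def simex_slope(w, y, sigma_u2, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=200, seed=0):
    """SIMEX slope estimate, assuming classical additive error with known variance sigma_u2.

    Returns (simex_estimate, naive_estimate).
    """
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    estimates = [naive_slope(w, y)]
    for lam in lambdas:
        # Simulation step: add extra error with variance lam * sigma_u2, average over n_sim draws
        sims = [naive_slope(w + rng.normal(0.0, np.sqrt(lam * sigma_u2), size=w.shape), y)
                for _ in range(n_sim)]
        lam_grid.append(lam)
        estimates.append(np.mean(sims))
    # Extrapolation step: quadratic extrapolant evaluated at lam = -1
    coef = np.polyfit(lam_grid, estimates, deg=2)
    return np.polyval(coef, -1.0), estimates[0]

rng = np.random.default_rng(1)
n, beta_true, sigma_u2 = 2000, 1.0, 0.5
x = rng.normal(0.0, 1.0, size=n)                      # true (usual) exposure
w = x + rng.normal(0.0, np.sqrt(sigma_u2), size=n)    # error-prone measurement
y = beta_true * x + rng.normal(0.0, 0.5, size=n)

beta_simex, beta_naive = simex_slope(w, y, sigma_u2)
print(f"naive: {beta_naive:.3f}  SIMEX: {beta_simex:.3f}  true: {beta_true}")
```

In this simulated setting the naive slope is attenuated toward zero, and SIMEX moves the estimate back toward the true value; with a quadratic extrapolant it typically removes most, but not all, of the attenuation.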

Funding period

Begin: February 2018
End: January 2021


  • German Research Foundation


Prof. Dr. rer. nat. Iris Pigeot