Causal discovery for cohort data


This project capitalises on methods of causal discovery to supplement standard statistical analyses and hence fully exploit the potential wealth of information provided by cohort data. Cohort studies are a valuable resource for researchers, e.g. in epidemiology or sociology, who study life-course developments in order to understand the relation between early exposures and later outcomes. Causal discovery is a field at the intersection of computer science and statistics. In its idealised form, it takes a dataset as input and outputs a graphical representation of the causal structure among the variables in the dataset, albeit relying on very specific assumptions. While these methods currently attract much attention, especially in the context of big data, neither theory nor software has so far been specifically targeted at, or suitable for, cohort data. This DFG project will fill this gap. In particular, we will
  • (1) formulate and investigate a new class of causal models, cohort causal graphs (CCGs), and develop suitable and efficient model selection algorithms;
  • (2) find new statistical approaches to address the particular challenges to causal discovery posed by typical cohort data, especially that of missing values;
  • (3) develop guidelines, including recommendations and caveats, as well as user-friendly software for practical applications, so as to enable wide dissemination of the new methodology.

This constitutes a promising enterprise because causal discovery takes a radically different approach from traditional statistical analyses; it therefore has the potential to generate genuinely novel insights, including valuable suggestions for follow-up intervention studies which ultimately contribute to informing public health policies and medical decision making.
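
As a rough illustration of the idealised procedure described above (data in, graph out), the following Python sketch runs a simplified constraint-based search for the undirected skeleton of a causal graph. It is not the project's method or software. It assumes linear Gaussian relationships, so that conditional independence can be tested with partial correlations and a Fisher z-test. The function names, the threshold `z_crit` and the toy correlation matrix are all made up for illustration. Unlike the full PC algorithm, it conditions on subsets of all remaining variables rather than only on current neighbours, and it does not orient any edges.

```python
import itertools
import math


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given `cond`, via the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))


def is_independent(corr, n, i, j, cond, z_crit=2.576):
    """Fisher z-test; z_crit=2.576 is the two-sided 1% normal quantile."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep the log below well defined
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return abs(z) < z_crit


def skeleton(corr, n, max_cond=1):
    """Start from the complete graph and delete edges between variables
    that test as (conditionally) independent."""
    p = len(corr)
    edges = {frozenset(pair) for pair in itertools.combinations(range(p), 2)}
    for size in range(max_cond + 1):
        for edge in sorted(edges, key=sorted):  # iterate over a copy
            i, j = sorted(edge)
            others = [k for k in range(p) if k not in edge]
            for cond in itertools.combinations(others, size):
                if is_independent(corr, n, i, j, cond):
                    edges.discard(edge)
                    break
    return edges


# Hypothetical correlation matrix consistent with a chain X0 -> X1 -> X2
# (not taken from any real cohort): corr(X0, X2) = 0.6 * 0.6.
toy_corr = [
    [1.00, 0.60, 0.36],
    [0.60, 1.00, 0.60],
    [0.36, 0.60, 1.00],
]
toy_edges = skeleton(toy_corr, n=1000)
```

In this toy case the edge between X0 and X2 is removed because their partial correlation given X1 vanishes, leaving only the two edges of the chain. Real cohort data raise exactly the issues the project addresses, such as missing values and mixed variable types, which this sketch ignores.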

Funding period

Begin:   January 2018
End:   May 2021


  • German Research Foundation


Prof. Dr. rer. nat. Vanessa Didelez


  • Prof Marloes Maathuis (Department of Mathematics, ETH Zürich, Switzerland)
  • Prof Peter Spirtes (Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA)
  • Prof Stijn Vansteelandt (Department of Applied Mathematics, Computer Science and Statistics, University of Ghent, Belgium)

Selected project-related publications

    Peer-reviewed articles

  • Witte J, Didelez V. Covariate selection strategies for causal inference: Classification and comparison. Biometrical Journal. 2019;61(5):1270-1289.

    Posters at scientific meetings/conferences

  • Witte J, Didelez V. Exploring the causal structure in cohort data using generalized IDA. Rostock Retreat on Causality, 2-4 July 2018, Rostock.
  • Witte J, Didelez V. Exploring the causal structure in cohort data using generalized IDA. Bocconi Summer School in Advanced Statistics and Probability, 9-20 July 2018, Como, Italy.