Emmy Noether Junior Research Group: Beyond Prediction - Statistical Inference with Machine Learning

The junior research group focuses on developing statistical inference methods for machine learning. We place special emphasis on problems common in epidemiology, such as confounding, high-dimensional data, and survival outcomes. The project is methodological in nature but has a strong focus on applications. Our methods are publicly available as software packages, ready for use by practitioners and applied researchers.

The group's major research interests are:

  • Interpretable machine learning
  • Statistical properties of machine learning methods
  • Survival analysis
  • Statistical software
  • Application to high-dimensional data


The group is funded by the Emmy Noether programme of the German Research Foundation (DFG) and headed by Marvin N. Wright.

Selected Publications

    Conference proceedings

  • Blesch K, Wright MN, Watson DS. Unfooling SHAP and SAGE: Knockoff imputation for Shapley values. In: Longo L, editor. Explainable artificial intelligence. xAI 2023. Volume 1901. Cham: Springer. 2023. pp. 131-146.
    https://doi.org/10.1007/978-3-031-44064-9_8
  • Baudeu R, Wright MN, Loecher M. Are SHAP values biased towards high-entropy features? Machine learning and principles and practice of knowledge discovery in databases. ECML PKDD 2022. Volume 1752. Cham: Springer. 2023. pp. 418-433.
    https://doi.org/10.1007/978-3-031-23618-1_28
  • Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. In: Longo L, editor. Explainable artificial intelligence. xAI 2023. Volume 1901. Cham: Springer. 2023. pp. 456-479.
    https://doi.org/10.1007/978-3-031-44064-9_24
  • Hiabu M, Meyer JT, Wright MN. Unifying local and global model explanations by functional decomposition of low dimensional structures. In: Ruiz F, Dy J, van de Meent J-W, editors. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. 2023.
    https://proceedings.mlr.press/v206/hiabu23a/hiabu23a.pdf
  • Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. In: Ruiz F, Dy J, van de Meent J-W, editors. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. 2023.
    https://proceedings.mlr.press/v206/watson23a/watson23a.pdf
  • Koenen N, Wright MN, Maass P, Behrmann J. Generalization of the change of variables formula with applications to residual flows. Thirty-eighth international conference on machine learning (ICML) workshop on invertible neural networks, normalizing flows, and explicit likelihood models. 2021.
    https://openreview.net/forum?id=msCiI5dejr

    Articles with peer review

  • Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. AStA Advances in Statistical Analysis. 2024;108(2):259-278.
    https://doi.org/10.1007/s10182-023-00477-9
  • Mehlig K, Foraita R, Nagrani R, Wright MN, De Henauw S, Molnár D, Moreno LA, Russo P, Tornaritis M, Veidebaum T, Lissner L, Kaprio J, Pigeot I, on behalf of the IDEFICS and I.Family consortia. Genetic associations vary across the spectrum of fasting serum insulin: Results from the European IDEFICS/I.Family children's cohort. Diabetologia. 2023;66(10):1914-1924.
    https://doi.org/10.1007/s00125-023-05957-w
  • Spytek M, Krzyzinski M, Langbein S, Baniecki H, Wright MN, Biecek P. survex: An R package for explaining machine learning survival models. Bioinformatics. 2023;39(12):btad723.
    https://doi.org/10.1093/bioinformatics/btad723
  • Bonannella C, Hengl T, Heisig J, Parente L, Wright MN, Herold M, de Bruin S. Forest tree species distribution for Europe 2000-2020: Mapping potential and realized distributions using spatiotemporal machine learning. PeerJ. 2022;10:e13728.
    https://doi.org/10.7717/peerj.13728
  • Blesch K, Hauser OP, Jachimowicz JM. Measuring inequality beyond the Gini coefficient may clarify conflicting findings. Nature Human Behaviour. 2022;6:1525-1536.
    https://doi.org/10.1038/s41562-022-01430-7
    http://hdl.handle.net/10871/130494
  • Wright MN, Kusumastuti S, Mortensen LH, Westendorp R, Gerds T. Personalised need of care in an ageing society: The making of a prediction tool based on register data. Journal of the Royal Statistical Society. Series A (Statistics in Society). 2021;184(4):1199-1219.
    https://doi.org/10.1111/rssa.12644
  • Askland KD, Strong D, Wright MN, Moore JH. The translational machine: A novel machine-learning approach to illuminate complex genetic architectures. Genetic Epidemiology. 2021;45(5):485-536.
    https://dx.doi.org/10.1002/gepi.22383
  • Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Machine Learning. 2021;110(8):2107-2129.
    https://doi.org/10.1007/s10994-021-06030-6
  • Breau B, Brandes B, Wright MN, Buck C, Vallis LA, Brandes M. Association of individual motor abilities and accelerometer-derived physical activity measures in preschool-aged children. Journal for the Measurement of Physical Behaviour. 2021;4(3):227-235.
    https://doi.org/10.1123/jmpb.2020-0065
    https://repository.publisso.de/resource/frl:6428750
  • Hüls A, Wright MN, Bogl L-H, Kaprio J, Lissner L, Molnár D, Moreno LA, De Henauw S, Siani A, Veidebaum T, Ahrens W, Pigeot I, Foraita R. Polygenic risk for obesity and its interaction with lifestyle and sociodemographic factors in European children and adolescents. International Journal of Obesity. 2021;45(6):1321-1330.
    https://dx.doi.org/10.1038/s41366-021-00795-5
  • Brandes B, Buck C, Wright MN, Pischke CR, Brandes M. Impact of "JolinchenKids - fit and healthy in daycare" on children's objectively measured physical activity: A cluster-controlled study. Journal of Physical Activity & Health. 2020;17(10):1025-1033.
    https://doi.org/10.1123/jpah.2019-0536
    https://repository.publisso.de/resource/frl%3A6422936
  • Schmid M, Welchowski T, Wright MN, Berger M. Discrete-time survival forests with Hellinger distance decision trees. Data Mining and Knowledge Discovery. 2020;34(3):812-832.
    https://doi.org/10.1007/s10618-020-00682-z
  • Boulesteix A-L, Wright MN, Hoffmann S, König IR. Statistical learning approaches in the genetic epidemiology of complex diseases. Human Genetics. 2020;139(1):73-84.
    https://doi.org/10.1007/s00439-019-01996-9
  • Weinhold L, Schmid M, Mitchell R, Maloney KO, Wright MN, Berger M. A random forest approach for bounded outcome variables. Journal of Computational and Graphical Statistics. 2020;29(3):639-658.
    https://doi.org/10.1080/10618600.2019.1705310
  • Wright MN, König IR. Splitting on categorical predictors in random forests. PeerJ. 2019;7:e6339.
    https://doi.org/10.7717/peerj.6339
  • Steenbock B, Wright MN, Wirsik N, Brandes M. Accelerometry-based prediction of energy expenditure in preschoolers. Journal for the Measurement of Physical Behaviour. 2019;2(2):94-102.
    https://doi.org/10.1123/jmpb.2018-0032
    https://repository.publisso.de/resource/frl%3A6426272
  • Probst P, Wright MN, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. WIREs Data Mining and Knowledge Discovery. 2019;9(3):e1301.
    https://doi.org/10.1002/widm.1301
  • Hornung R, Wright MN. Block forests: Random forests for blocks of clinical and omics covariate data. BMC Bioinformatics. 2019;20:358.
    https://doi.org/10.1186/s12859-019-2942-y
  • Foraita R, Dijkstra L, Falkenberg F, Garling M, Linder R, Pflock R, Rizkallah Issak MR, Schwaninger M, Wright MN, Pigeot I. Aufdeckung von Arzneimittelrisiken nach der Zulassung: Methodenentwicklung zur Nutzung von Routinedaten der gesetzlichen Krankenversicherungen. Bundesgesundheitsblatt, Gesundheitsforschung, Gesundheitsschutz. 2018;61(9):1075-1081.
    https://doi.org/10.1007/s00103-018-2786-z
    https://repository.publisso.de/resource/frl%3A6421679
  • Fouodo CJK, König I, Weihs C, Ziegler A, Wright MN. Support vector machines for survival analysis with R. The R Journal. 2018;10(1):412-423.
    https://doi.org/10.32614/RJ-2018-005
  • Hengl T, Nussbaum M, Wright MN, Heuvelink GB, Gräler B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ. 2018;6:e5518.
    https://doi.org/10.7717/peerj.5518
  • Hirose M, Schilf P, Gupta Y, Zarse K, Künstner A, Fähnrich A, Busch H, Yin J, Wright MN, Ziegler A, Vallier M, Belheouane M, Baines JF, Tautz D, Johann K, Oelkrug R, Mittag J, Lehnert H, Othman A, Jöhren O, Schwaninger M, Prehn C, Adamski J, Shima K, Rupp J, Haesler R, Fuellen G, Köhling R, Ristow M, Ibrahim SM. Low-level mitochondrial heteroplasmy modulates DNA replication, glucose metabolism and lifespan in mice. Scientific Reports. 2018;8:5872.
    https://doi.org/10.1038/s41598-018-24290-6
  • Nembrini S, König I, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34(21):3711-3718.
    https://doi.org/10.1093/bioinformatics/bty373

    Editorials

  • Boulesteix A-L, Wright MN. Special issue: Artificial intelligence in genomics. Human Genetics. 2022;141(9):1449-1450.
    https://doi.org/10.1007/s00439-022-02472-7

    Book chapters

  • Wright MN. Feature selection. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 146-160.
    https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html
  • Casalicchio G, Burk L. Evaluation and benchmarking. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 53-82.
    https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html
  • Binder M, Pfisterer F, Becker M, Wright MN. Non-sequential pipelines and tuning. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 174-195.
    https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html
  • Dandl S, Biecek P, Casalicchio G, Wright MN. Model interpretation. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 259-282.
    https://mlr3book.mlr-org.com/chapters/chapter12/model_interpretation.html
  • Pigeot I, Fröhlich H, Intemann T, Prause G, Wright MN. KI und die Nationale Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten (NFDI4Health). In: Dössel O, Schäffter T, Rutert B, editors. Künstliche Intelligenz in der Medizin. Berlin: Berlin-Brandenburgische Akademie der Wissenschaften. 2023. pp. 62-74.
    https://nbn-resolving.org/urn:nbn:de:kobv:b4-opus4-37962

    Presentations at scientific meetings/conferences

  • Golchian P, Kapar J, Blesch K, Watson DS, Wright MN. Adversarial random forests for imputing missing values. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
  • Burk L, Zobolas J, Bischl B, Bender A, Lang M, Wright MN, Sonabend R. A large-scale neutral comparison study of survival models. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
  • Langbein S, Wright MN. Interpretable machine learning for survival analysis. Survival Analysis for Junior Researchers (SAfJR), 13-15 September 2023, Günzburg.
  • Kapar J, Günther K, Watson DS, Wright MN. Generative modeling of epidemiological data using adversarial random forests. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Burk L, Bender A, Wright MN. High-dimensional variable selection for competing risks with cooperative penalized regression. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Koenen N, Wright MN. Interpreting neural networks: A biostatistical perspective. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
  • Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. The R User Conference "UseR!," 20-23 June 2022, online presentation.
  • Wright MN, Blesch K, Watson DS. Testing conditional independence in supervised learning algorithms with the cpi package. The R User Conference "UseR!," 20-23 June 2022, online presentation.
  • Blesch K, Watson DS, Wright MN. Conditional variable importance for mixed data. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
  • Wright MN. Genome-wide conditional independence testing with machine learning. 67th Biometric Colloquium of the German Region of the International Biometric Society (IBS-DR), 14-17 March 2021, online presentation.
  • Hornung R, Wright MN. Block forests: Random forests for blocks of clinical and omics covariate data. 5th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 18-22 March 2019, Munich.
  • Wright MN, Mortensen LH, Kusumastuti S, Westendorp R, Gerds T. Recurrent neural networks for time to event predictions with competing risks. 5th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 18-22 March 2019, Munich.

    Presentations at scientific meetings/conferences (invited)

  • Kapar J, Wright MN, Vallis LA. Generative modelling of Guelph Family Health Study data. Guelph Family Health Study Meeting, 15 November 2023, Guelph, Canada.
  • Langbein S, Wright MN. Interpretable machine learning for survival analysis. 11th Autumn Workshop of the DGEpi (German Society for Epidemiology), GMDS (German Association for Medical Informatics, Biometry and Epidemiology), IBS-DR (German Region of the International Biometric Society) and DGSMP (German Society for Social Medicine and Prevention), 9-10 November 2023, Mainz.
  • Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. Seminar in Econometrics, 2 May 2023, Cologne.
  • Kapar J, Watson DS, Wright MN. Generative modeling of mixed tabular data using adversarial random forests. Machine Learning Research Group Talks and Brainstorms, 12 October 2023, Guelph, Canada.
  • Langbein S. Interpretable machine learning for survival analysis. DFG Sino-German collaboration workshop on statistical methods on lifestyle intervention studies and sharing data, 20 October 2023, Beijing, China.
  • Wright MN. Interpretable machine learning. Begegnungszone: Statistical Physics and Machine Learning, 18-21 September 2023, Leipzig.
  • Blesch K, Wright MN, Watson DS. Unfooling SHAP and SAGE: Knockoff imputation for Shapley values. 2nd TRR 318 Conference "Measuring Understanding," 6-7 November 2023, Paderborn.
  • Wright MN. From explainable AI to generative modeling with tree-based machine learning. Statistics and Econometrics Seminar, Humboldt-Universität zu Berlin, 17 October 2023, Berlin.
  • Wright MN. Interpretable machine learning. Interpretable Machine Learning Workshop with the School of Statistics and Actuarial Science, University of the Witwatersrand, 19-20 September 2022, Johannesburg, South Africa.
  • Wright MN. Random forests: Myths and facts. 52nd Workshop Statistical Computing, 24-27 July 2022, Günzburg.
  • Wright MN. Random forests on high-dimensional data: From classification and survival analysis to generative modelling. Seminar of Research Training Group 2624 at TU Dortmund University, 4 July 2022, Dortmund.
  • Wright MN. Model-agnostic interpretable machine learning. MOOD (MOnitoring Outbreaks for Disease surveillance in a data science context) Webinar, 30 June 2021, online presentation.
  • Wright MN. Machine learning for time to event data. Expert talk at the workshop of the project "ARTEMIS - Künstliche Intelligenz bei muskuloskelettalen Erkrankungen," 24 September 2021, online presentation.
  • Wright MN. Random forests: Myths and facts. Colloquium "Statistische Methoden in der empirischen Forschung," 23 November 2021, online presentation.
  • Wright MN. Genome-wide interpretable machine learning. Seminar Series of the Charles Bronfman Institute for Personalized Medicine at the Icahn School of Medicine at Mount Sinai, 8 December 2021, online presentation.
  • Wright MN. Machine learning for survival data. Seminar of the Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, 10 June 2021, Mainz.
  • Wright MN. Interpretable machine learning in genetics. XXXIInd Conference of the Austro-Swiss Region (ROeS) of the International Biometric Society, 9 September 2021, Salzburg, Austria.
  • Wright MN. Interpretable machine learning for time to event data. Seminar of the Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin, 20 January 2020, Berlin.
  • Wright MN. Machine learning for survival data. MeVis Online Academy, 21 February 2020, Bremen.
  • Wright MN, Kusumastuti S. Predicting the personalized need of care in an ageing society. Epidemiological Seminar Series, 28 November 2019, Copenhagen, Denmark.
  • Wright MN. Random forests: The first-choice method for every data analysis? Why R? 2019 Conference, 26-29 September 2019, Warsaw, Poland.
  • Wright MN. Benchmarking machine learning algorithms. "Fraunhofer Academy" of the Fraunhofer Institute for Digital Medicine MEVIS, 15 November 2019, Bremen.

    Posters at scientific meetings/conferences

  • Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 25-27 April 2023, Valencia, Spain.
  • Koenen N, Wright MN, Maass P, Behrmann J. Generalization of the change of variables formula with applications to residual flows. 38th International Conference on Machine Learning (ICML) Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+), 23 July 2021, online poster.
  • Wright MN. Splitting on categorical predictors in random forests. 64th Biometric Colloquium, 25-28 March 2018, Frankfurt.

    Software

  • Wright MN, Watson DS. arf: Adversarial random forests. (Version 0.1.3); 2023.
    https://cran.r-project.org/package=arf
  • Koenen N, Baudeu R. innsight: Get the insights of your neural network. (Version 0.2.0); 2023.
    https://github.com/bips-hb/innsight
  • Wright MN, Wager S, Probst P. ranger: A fast implementation of random forests. (Version 0.16.0); 2023.
    https://cran.r-project.org/package=ranger
  • Blesch K, Wright MN. arfpy. (Version 0.1.1); 2023.
    https://github.com/bips-hb/arfpy
  • Krzyzinski M, Spytek M, Baniecki H, Biecek P, Gosiewska A, Langbein S. survex: Explainable machine learning in survival analysis. (Version 2.0); 2023.
    https://github.com/ModelOriented/survex
  • Hornung R, Wright MN. blockForest: Block forests: Random forests for blocks of clinical and omics covariate data. (Version 0.2.5); 2022.
    https://cran.r-project.org/package=blockForest
  • Wright MN, Watson DS. cpi: Conditional predictive impact. (Version 0.1.4); 2022.
    https://cran.r-project.org/package=cpi
  • Fritsch S, Günther F, Wright MN, Suling M, Mueller SM. neuralnet: Training of neural networks. (Version 1.44.2); 2019.
    https://CRAN.R-project.org/package=neuralnet
  • Wright MN. survnet: Artificial neural networks for survival analysis. (Version 0.0.5); 2019.
    https://github.com/bips-hb/survnet

Current Projects

Only projects that are currently running, have publications in preparation, or ended less than a year ago are shown. Entries are sorted alphabetically.

Staff

Burk, Lukas
Tel.: +49 (0)421 218-56955
Fax: +49 (0)421 218-56941
burk(at)leibniz-bips.de

Golchian, Pegah
Tel.: +49 (0)421 218-56790
golchian(at)leibniz-bips.de

Kapar, Jan
Tel.: +49 (0)421 218-56929
Fax: +49 (0)421 218-56941
kapar(at)leibniz-bips.de

Koenen, Niklas
Tel.: +49 (0)421 218-56933
Fax: +49 (0)421 218-56941
koenen(at)leibniz-bips.de

Langbein, Sophie
Tel.: +49 (0)421 218-56886
langbein(at)leibniz-bips.de

Wright, Marvin N., Prof. Dr.
Tel.: +49 (0)421 218-56945
Fax: +49 (0)421 218-56941
wright(at)leibniz-bips.de