Emmy Noether Junior Research Group: Beyond Prediction - Statistical Inference with Machine Learning - Leibniz Institute for Prevention Research and Epidemiology

The junior research group focusses on developing statistical inference methods for machine learning. We put special emphasis on problems faced in epidemiology, such as confounding, high-dimensional data and survival outcomes. The project is methodological in nature but has a strong focus on applications. Our methods are publicly available as software packages, ready to be used by practitioners and applied researchers.

Major research interests of the research group are:

Interpretable machine learning
Statistical properties of machine learning methods
Survival analysis
Statistical software
Application to high-dimensional data

The group is funded by the Emmy Noether programme of the German Research Foundation (DFG) and headed by Marvin N. Wright.

Selected Publications

Conference proceedings

Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. In: Ruiz F, Dy J, van de Meent J-W, editors. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. Volume 206. 2023.
https://proceedings.mlr.press/v206/watson23a/watson23a.pdf
Blesch K, Wright MN, Watson DS. Unfooling SHAP and SAGE: Knockoff imputation for Shapley values. In: Longo L, editor. Explainable artificial intelligence. xAI 2023. Volume 1901. Cham: Springer. 2023. p. 131-146.
https://doi.org/10.1007/978-3-031-44064-9_8
Baudeu R, Wright MN, Loecher M. Are SHAP values biased towards high-entropy features? Machine learning and principles and practice of knowledge discovery in databases. ECML PKDD 2022. Volume 1752. Cham: Springer. 2023. p. 418-433.
https://doi.org/10.1007/978-3-031-23618-1_28
Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. In: Longo L, editor. Explainable artificial intelligence. xAI 2023. Volume 1901. Cham: Springer. 2023. p. 456-479.
https://doi.org/10.1007/978-3-031-44064-9_24
Hiabu M, Meyer JT, Wright MN. Unifying local and global model explanations by functional decomposition of low dimensional structures. In: Ruiz F, Dy J, van de Meent J-W, editors. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. Volume 206. 2023.
https://proceedings.mlr.press/v206/hiabu23a/hiabu23a.pdf
Koenen N, Wright MN, Maass P, Behrmann J. Generalization of the change of variables formula with applications to residual flows. Thirty-eighth international conference on machine learning (ICML) workshop on invertible neural networks, normalizing flows, and explicit likelihood models. 2021.
https://openreview.net/forum?id=msCiI5dejr

Articles with peer review

Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. AStA Advances in Statistical Analysis. 2024;108(2):259-278.
https://doi.org/10.1007/s10182-023-00477-9
Spytek M, Krzyzinski M, Langbein S, Baniecki H, Wright MN, Biecek P. survex: An R package for explaining machine learning survival models. Bioinformatics. 2023;39(12):btad723.
https://doi.org/10.1093/bioinformatics/btad723
Mehlig K, Foraita R, Nagrani R, Wright MN, De Henauw S, Molnár D, Moreno LA, Russo P, Tornaritis M, Veidebaum T, Lissner L, Kaprio J, Pigeot I, on behalf of the IDEFICS and I.Family consortia. Genetic associations vary across the spectrum of fasting serum insulin: Results from the European IDEFICS/I.Family children's cohort. Diabetologia. 2023;66(10):1914-1924.
https://doi.org/10.1007/s00125-023-05957-w
Bonannella C, Hengl T, Heisig J, Parente L, Wright MN, Herold M, de Bruin S. Forest tree species distribution for Europe 2000-2020: Mapping potential and realized distributions using spatiotemporal machine learning. PeerJ. 2022;10:e13728.
https://doi.org/10.7717/peerj.13728
Blesch K, Hauser OP, Jachimowicz JM. Measuring inequality beyond the Gini coefficient may clarify conflicting findings. Nature Human Behaviour. 2022;6:1525-1536.
https://doi.org/10.1038/s41562-022-01430-7
http://hdl.handle.net/10871/130494
Hüls A, Wright MN, Bogl L-H, Kaprio J, Lissner L, Molnár D, Moreno LA, De Henauw S, Siani A, Veidebaum T, Ahrens W, Pigeot I, Foraita R. Polygenic risk for obesity and its interaction with lifestyle and sociodemographic factors in European children and adolescents. International Journal of Obesity. 2021;45(6):1321-1330.
https://dx.doi.org/10.1038/s41366-021-00795-5
Wright MN, Kusumastuti S, Mortensen LH, Westendorp R, Gerds T. Personalised need of care in an ageing society: The making of a prediction tool based on register data. Journal of the Royal Statistical Society. Series A (Statistics in Society). 2021;184(4):1199-1219.
https://doi.org/10.1111/rssa.12644
Askland KD, Strong D, Wright MN, Moore JH. The translational machine: A novel machine-learning approach to illuminate complex genetic architectures. Genetic Epidemiology. 2021;45(5):485-536.
https://dx.doi.org/10.1002/gepi.22383
Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Machine Learning. 2021;110(8):2107-2129.
https://doi.org/10.1007/s10994-021-06030-6
Breau B, Brandes B, Wright MN, Buck C, Vallis LA, Brandes M. Association of individual motor abilities and accelerometer-derived physical activity measures in preschool-aged children. Journal for the Measurement of Physical Behaviour. 2021;4(3):227-235.
https://doi.org/10.1123/jmpb.2020-0065
https://repository.publisso.de/resource/frl:6428750
Weinhold L, Schmid M, Mitchell R, Maloney KO, Wright MN, Berger M. A random forest approach for bounded outcome variables. Journal of Computational and Graphical Statistics. 2020;29(3):639-658.
https://doi.org/10.1080/10618600.2019.1705310
Brandes B, Buck C, Wright MN, Pischke CR, Brandes M. Impact of "JolinchenKids - fit and healthy in daycare" on children's objectively measured physical activity: A cluster-controlled study. Journal of Physical Activity & Health. 2020;17(10):1025-1033.
https://doi.org/10.1123/jpah.2019-0536
https://repository.publisso.de/resource/frl%3A6422936
Schmid M, Welchowski T, Wright MN, Berger M. Discrete-time survival forests with Hellinger distance decision trees. Data Mining and Knowledge Discovery. 2020;34(3):812-832.
https://doi.org/10.1007/s10618-020-00682-z
Boulesteix A-L, Wright MN, Hoffmann S, König IR. Statistical learning approaches in the genetic epidemiology of complex diseases. Human Genetics. 2020;139(1):73-84.
https://doi.org/10.1007/s00439-019-01996-9
Hornung R, Wright MN. Block forests: Random forests for blocks of clinical and omics covariate data. BMC Bioinformatics. 2019;20:358.
https://doi.org/10.1186/s12859-019-2942-y
Wright MN, König IR. Splitting on categorical predictors in random forests. PeerJ. 2019;7:e6339.
https://doi.org/10.7717/peerj.6339
Steenbock B, Wright MN, Wirsik N, Brandes M. Accelerometry-based prediction of energy expenditure in preschoolers. Journal for the Measurement of Physical Behaviour. 2019;2(2):94-102.
https://doi.org/10.1123/jmpb.2018-0032
https://repository.publisso.de/resource/frl%3A6426272
Probst P, Wright MN, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. WIREs Data Mining and Knowledge Discovery. 2019;9(3):e1301.
https://doi.org/10.1002/widm.1301
Hirose M, Schilf P, Gupta Y, Zarse K, Künstner A, Fähnrich A, Busch H, Yin J, Wright MN, Ziegler A, Vallier M, Belheouane M, Baines JF, Tautz D, Johann K, Oelkrug R, Mittag J, Lehnert H, Othman A, Jöhren O, Schwaninger M, Prehn C, Adamski J, Shima K, Rupp J, Haesler R, Fuellen G, Köhling R, Ristow M, Ibrahim SM. Low-level mitochondrial heteroplasmy modulates DNA replication, glucose metabolism and lifespan in mice. Scientific Reports. 2018;8:5872.
https://doi.org/10.1038/s41598-018-24290-6
Nembrini S, König I, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34(21):3711-3718.
https://doi.org/10.1093/bioinformatics/bty373
Foraita R, Dijkstra L, Falkenberg F, Garling M, Linder R, Pflock R, Rizkallah Issak MR, Schwaninger M, Wright MN, Pigeot I. Aufdeckung von Arzneimittelrisiken nach der Zulassung: Methodenentwicklung zur Nutzung von Routinedaten der gesetzlichen Krankenversicherungen. Bundesgesundheitsblatt, Gesundheitsforschung, Gesundheitsschutz. 2018;61(9):1075-1081.
https://doi.org/10.1007/s00103-018-2786-z
https://repository.publisso.de/resource/frl%3A6421679
Fouodo CJK, König I, Weihs C, Ziegler A, Wright MN. Support vector machines for survival analysis with R. The R Journal. 2018;10(1):412-423.
https://doi.org/10.32614/RJ-2018-005
Hengl T, Nussbaum M, Wright MN, Heuvelink GB, Gräler B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ. 2018;6:e5518.
https://doi.org/10.7717/peerj.5518

Editorials

Boulesteix A-L, Wright MN. Special issue: Artificial intelligence in genomics. Human Genetics. 2022;141(9):1449-1450.
https://doi.org/10.1007/s00439-022-02472-7

Book chapters

Dandl S, Biecek P, Casalicchio G, Wright MN. Model interpretation. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. p. 259-282.
https://mlr3book.mlr-org.com/chapters/chapter12/model_interpretation.html
Wright MN. Feature selection. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. p. 146-160.
https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html
Casalicchio G, Burk L. Evaluation and benchmarking. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. p. 53-82.
https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html
Binder M, Pfisterer F, Becker M, Wright MN. Non-sequential pipelines and tuning. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. p. 174-195.
https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html
Pigeot I, Fröhlich H, Intemann T, Prause G, Wright MN. KI und die Nationale Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten (NFDI4Health). In: Dössel O, Schäffter T, Rutert B, editors. Künstliche Intelligenz in der Medizin. Berlin: Berlin-Brandenburgische Akademie der Wissenschaften. 2023. p. 62-74.
https://nbn-resolving.org/urn:nbn:de:kobv:b4-opus4-37962

Presentations at scientific meetings/conferences

Golchian P, Kapar J, Blesch K, Watson DS, Wright MN. Adversarial random forests for imputing missing values. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
Burk L, Zobolas J, Bischl B, Bender A, Lang M, Wright MN, Sonabend R. A large-scale neutral comparison study of survival models. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
Koenen N, Wright MN. Interpreting neural networks: A biostatistical perspective. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
Langbein S, Wright MN. Interpretable machine learning for survival analysis. Survival Analysis for Junior Researchers (SAfJR), 13-15 September 2023, Günzburg.
Kapar J, Günther K, Watson DS, Wright MN. Generative modeling of epidemiological data using adversarial random forests. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
Burk L, Bender A, Wright MN. High-dimensional variable selection for competing risks with cooperative penalized regression. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
Wright MN, Blesch K, Watson DS. Testing conditional independence in supervised learning algorithms with the cpi package. The R User Conference "UseR!," 20-23 June 2022, online presentation.
Blesch K, Watson DS, Wright MN. Conditional variable importance for mixed data. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. The R User Conference "UseR!," 20-23 June 2022, online presentation.
Wright MN. Genome-wide conditional independence testing with machine learning. 67th Biometric Colloquium of the German Region of the International Biometric Society (IBS-DR), 14-17 March 2021, online presentation.
Wright MN, Mortensen LH, Kusumastuti S, Westendorp R, Gerds T. Recurrent neural networks for time to event predictions with competing risks. 5th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 18-22 March 2019, Munich.
Hornung R, Wright MN. Block forests: Random forests for blocks of clinical and omics covariate data. 5th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 18-22 March 2019, Munich.

Presentations at scientific meetings/conferences (invited)

Wright MN. Interpretable machine learning. Begegnungszone: Statistical Physics and Machine Learning, 18-21 September 2023, Leipzig.
Blesch K, Wright MN, Watson DS. Unfooling SHAP and SAGE: Knockoff imputation for Shapley values. 2nd TRR 318 Conference "Measuring Understanding," 6-7 November 2023, Paderborn.
Wright MN. From explainable AI to generative modeling with tree-based machine learning. Statistics and Econometrics Seminar, Humboldt-Universität zu Berlin, 17 October 2023, Berlin.
Kapar J, Wright MN, Vallis LA. Generative modelling of Guelph Family Health Study data. Guelph Family Health Study Meeting, 15 November 2023, Guelph, Canada.
Langbein S, Wright MN. Interpretable machine learning for survival analysis. 11th Autumn Workshop of the DGEpi (German Society for Epidemiology), GMDS (German Association for Medical Informatics, Biometry and Epidemiology), IBS-DR (German Region of the International Biometric Society) and DGSMP (German Society for Social Medicine and Prevention), 9-10 November 2023, Mainz.
Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. Seminar in Econometrics, 2 May 2023, Cologne.
Kapar J, Watson DS, Wright MN. Generative modeling of mixed tabular data using adversarial random forests. Machine Learning Research Group Talks and Brainstorms, 12 October 2023, Guelph, Canada.
Langbein S. Interpretable machine learning for survival analysis. DFG Sino-German collaboration workshop on statistical methods on lifestyle intervention studies and sharing data, 20 October 2023, Beijing, China.
Wright MN. Random forests on high-dimensional data: From classification and survival analysis to generative modelling. Seminar of the Research Training Group 2624 at TU Dortmund University, 4 July 2022, Dortmund.
Wright MN. Interpretable machine learning. Interpretable Machine Learning Workshop with the School of Statistics and Actuarial Science, University of the Witwatersrand, 19-20 September 2022, Johannesburg, South Africa.
Wright MN. Random forests: Myths and facts. 52nd Workshop Statistical Computing, 24-27 July 2022, Günzburg.
Wright MN. Genome-wide interpretable machine learning. Seminar Series of the Charles Bronfman Institute for Personalized Medicine at the Icahn School of Medicine at Mount Sinai, 8 December 2021, online presentation.
Wright MN. Machine learning for survival data. Seminar of the Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, 10 June 2021, Mainz.
Wright MN. Interpretable machine learning in genetics. XXXIInd Conference of the Austro-Swiss Region (ROeS) of the International Biometric Society, 9 September 2021, Salzburg, Austria.
Wright MN. Model-agnostic interpretable machine learning. MOOD (MOnitoring Outbreaks for Disease surveillance in a data science context) Webinar, 30 June 2021, online presentation.
Wright MN. Machine learning for time to event data. Expert talk at the workshop of the project "ARTEMIS - Künstliche Intelligenz bei muskuloskelettalen Erkrankungen", 24 September 2021, online presentation.
Wright MN. Random forests: Myths and facts. Colloquium "Statistische Methoden in der empirischen Forschung," 23 November 2021, online presentation.
Wright MN. Machine learning for survival data. MeVis Online Academy, 21 February 2020, Bremen.
Wright MN. Interpretable machine learning for time to event data. Seminar of the Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin, 20 January 2020, Berlin.
Wright MN. Benchmarking machine learning algorithms. "Fraunhofer Academy" of the Fraunhofer Institute for Digital Medicine MEVIS, 15 November 2019, Bremen.
Wright MN, Kusumastuti S. Predicting the personalized need of care in an ageing society. Epidemiological Seminar Series, 28 November 2019, Copenhagen, Denmark.
Wright MN. Random forests: The first-choice method for every data analysis? Why R? 2019 Conference, 26-29 September 2019, Warsaw, Poland.

Posters at scientific meetings/conferences

Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 25-27 April 2023, Valencia, Spain.
Koenen N, Wright MN, Maass P, Behrmann J. Generalization of the change of variables formula with applications to residual flows. 38th International Conference on Machine Learning (ICML) Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+), 23 July 2021, online poster.
Wright MN. Splitting on categorical predictors in random forests. 64th Biometric Colloquium, 25-28 March 2018, Frankfurt.

Software

Koenen N, Baudeu R. innsight: Get the insights of your neural network. (Version 0.2.0); 2023.
https://github.com/bips-hb/innsight
Wright MN, Wager S, Probst P. ranger: A fast implementation of random forests. (Version 0.16.0); 2023.
https://cran.r-project.org/package=ranger
Blesch K, Wright MN. arfpy. (Version 0.1.1); 2023.
https://github.com/bips-hb/arfpy
Krzyzinski M, Spytek M, Baniecki H, Biecek P, Gosiewska A, Langbein S. survex: Explainable machine learning in survival analysis. (Version 2.0); 2023.
https://github.com/ModelOriented/survex
Wright MN, Watson DS. arf: Adversarial random forests. (Version 0.1.3); 2023.
https://cran.r-project.org/package=arf
Wright MN, Watson DS. cpi: Conditional predictive impact. (Version 0.1.4); 2022.
https://cran.r-project.org/package=cpi
Hornung R, Wright MN. blockForest: Block forests: Random forests for blocks of clinical and omics covariate data. (Version 0.2.5); 2022.
https://cran.r-project.org/package=blockForest
Wright MN. survnet: Artificial neural networks for survival analysis. (Version 0.0.5); 2019.
https://github.com/bips-hb/survnet
Fritsch S, Günther F, Wright MN, Suling M, Mueller SM. neuralnet: Training of neural networks. (Version 1.44.2); 2019.
https://CRAN.R-project.org/package=neuralnet

Staff

Burk, Lukas
Tel.: +49 (0)421 218-56955
Fax: +49 (0)421 218-56941
burk(at)leibniz-bips.de

Golchian, Pegah
Tel.: +49 (0)421 218-56790
golchian(at)leibniz-bips.de

Herbinger, Julia, Dr.
herbinger(at)leibniz-bips.de

Kapar, Jan
Tel.: +49 (0)421 218-56929
Fax: +49 (0)421 218-56941
kapar(at)leibniz-bips.de

Koenen, Niklas
Tel.: +49 (0)421 218-56933
Fax: +49 (0)421 218-5641
koenen(at)leibniz-bips.de

Langbein, Sophie
Tel.: +49 (0)421 218-56886
langbein(at)leibniz-bips.de

Wright, Marvin N., Prof. Dr.
Tel.: +49 (0)421 218-56945
Fax: +49 (0)421 218-56941
wright(at)leibniz-bips.de
