Beyond prediction: Statistical inference with machine learning

Description

Emmy Noether Junior Research Group: Beyond Prediction - Statistical Inference with Machine Learning

The junior research group focusses on developing statistical inference methods for machine learning. We put special emphasis on problems faced in epidemiology, such as confounding, high-dimensional data, and survival outcomes. The project is methodological in nature but has a strong focus on applications. Our methods are publicly available as software packages, ready to be used by practitioners and applied researchers.

Major research interests of the research group are:
  • Interpretable machine learning
  • Statistical properties of machine learning methods
  • Survival analysis
  • Statistical software
  • Application to high-dimensional data

The group is funded by the Emmy Noether programme of the German Research Foundation (DFG) and headed by Marvin N. Wright.

Funding period

Begin:   May 2020
End:   December 2026

Sponsor

  • German Research Foundation

Contact

Prof. Dr. Marvin N. Wright

Selected project-related publications

    Articles with peer-review

  • Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. AStA Advances in Statistical Analysis. 2024;108(2):259-278.
    https://doi.org/10.1007/s10182-023-00477-9
  • Askland KD, Strong D, Wright MN, Moore JH. The translational machine: A novel machine-learning approach to illuminate complex genetic architectures. Genetic Epidemiology. 2021;45(5):485-536.
    https://dx.doi.org/10.1002/gepi.22383
  • Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Machine Learning. 2021;110(8):2107-2129.
    https://doi.org/10.1007/s10994-021-06030-6
    Editorials

  • Boulesteix A-L, Wright MN. Special issue: Artificial intelligence in genomics. Human Genetics. 2022;141(9):1449-1450.
    https://doi.org/10.1007/s00439-022-02472-7
    Contributions to books and proceedings

  • Binder M, Pfisterer F, Becker M, Wright MN. Non-sequential pipelines and tuning. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 174-195.
    https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html
  • Casalicchio G, Burk L. Evaluation and benchmarking. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 53-82.
    https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html
  • Dandl S, Biecek P, Casalicchio G, Wright MN. Model interpretation. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 259-282.
    https://mlr3book.mlr-org.com/chapters/chapter12/model_interpretation.html
  • Wright MN. Feature selection. In: Bischl B, Sonabend R, Kotthoff L, Lang M, editors. Applied machine learning using mlr3 in R. Boca Raton: CRC Press. 2024. pp. 146-160.
    https://mlr3book.mlr-org.com/chapters/chapter6/feature_selection.html
    Presentations at scientific meetings/conferences (invited)

  • Blesch K, Watson DS, Wright MN. Conditional feature importance for mixed data. Seminar in Econometrics, 2 May 2023, Cologne.
  • Blesch K, Wright MN, Watson DS. Unfooling SHAP and SAGE: Knockoff imputation for Shapley values. 2nd TRR 318 Conference "Measuring Understanding," 6-7 November 2023, Paderborn.
  • Swenne A, Wright MN. Confounder adjustment with random forests based on local residuals in genetic association studies. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Wright MN. From explainable AI to generative modeling with tree-based machine learning. Statistics and Econometrics Seminar, Humboldt-Universität zu Berlin, 17 October 2023, Berlin.
  • Wright MN. Interpretable machine learning. Begegnungszone: Statistical Physics and Machine Learning, 18-21 September 2023, Leipzig.
  • Wright MN. Random forests on high-dimensional data: From classification and survival analysis to generative modelling. Seminar of the Research Training Group 2624 at Technische Universität Dortmund, 4 July 2022, Dortmund.
  • Wright MN. Random forests: Myths and facts. 52nd Workshop Statistical Computing, 24-27 July 2022, Günzburg.
  • Wright MN. Interpretable machine learning. Interpretable Machine Learning Workshop with the School of Statistics and Actuarial Science, University of the Witwatersrand, 19-20 September 2022, Johannesburg, South Africa.
  • Wright MN. Machine learning for survival data. Seminar of the Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Universitätsmedizin der Johannes Gutenberg-Universität Mainz, 10 June 2021, Mainz.
  • Wright MN. Model-agnostic interpretable machine learning. MOOD (MOnitoring Outbreaks for Disease surveillance in a data science context) Webinar, 30 June 2021, online presentation.
  • Wright MN. Interpretable machine learning in genetics. XXXIInd Conference of the Austro-Swiss Region (ROeS) of the International Biometric Society, 9 September 2021, Salzburg, Austria.
  • Wright MN. Random forests: Myths and facts. Colloquium "Statistische Methoden in der empirischen Forschung," 23 November 2021, online presentation.
  • Wright MN. Machine learning for time to event data. Expert talk at the workshop of the project "ARTEMIS - Künstliche Intelligenz bei muskuloskelettalen Erkrankungen," 24 September 2021, online presentation.
  • Wright MN. Genome-wide interpretable machine learning. Seminar Series of the Charles Bronfman Institute for Personalized Medicine at the Icahn School of Medicine at Mount Sinai, 8 December 2021, online presentation.
    Presentations at scientific meetings/conferences

  • Burk L, Zobolas J, Bischl B, Bender A, Lang M, Wright MN, Sonabend R. A large-scale neutral comparison study of survival models. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
  • Golchian P, Kapar J, Blesch K, Watson DS, Wright MN. Adversarial random forests for imputing missing values. 70th Biometric Colloquium, 28 February-1 March 2024, Lübeck.
  • Burk L, Bender A, Wright MN. High-dimensional variable selection for competing risks with cooperative penalized regression. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Koenen N, Wright MN. Interpreting neural networks: A biostatistical perspective. 5th Conference of the Central European Network (CEN), 3-7 September 2023, Basel, Switzerland.
  • Blesch K, Watson DS, Wright MN. Conditional variable importance for mixed data. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
  • Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. 6th Conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), 28 March-1 April 2022, Hamburg.
  • Koenen N, Wright MN. Interpreting deep neural networks with the R package innsight. The R User Conference "UseR!," 20-23 June 2022, online presentation.
  • Wright MN, Blesch K, Watson DS. Testing conditional independence in supervised learning algorithms with the cpi package. The R User Conference "UseR!," 20-23 June 2022, online presentation.
  • Wright MN. Genome-wide conditional independence testing with machine learning. 67th Biometric Colloquium of the German Region of the International Biometric Society (IBS-DR), 14-17 March 2021, online presentation.
    Posters at scientific meetings/conferences

  • Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 25-27 April 2023, Valencia, Spain.
    Software

  • Blesch K, Wright MN. arfpy. (Version 0.1.1); 2023.
    https://github.com/bips-hb/arfpy
  • Koenen N, Baudeu R. innsight: Get the insights of your neural network. (Version 0.2.0); 2023.
    https://github.com/bips-hb/innsight
  • Wright MN, Wager S, Probst P. ranger: A fast implementation of random forests. (Version 0.16.0); 2023.
    https://cran.r-project.org/package=ranger
  • Wright MN, Watson DS. arf: Adversarial random forests. (Version 0.1.3); 2023.
    https://cran.r-project.org/package=arf
  • Koenen N, Baudeu R. innsight: Get the insights of your neural network. (Version 0.1.1); 2022.
    https://cran.r-project.org/package=innsight
  • Wright MN, Watson DS. cpi: Conditional predictive impact. (Version 0.1.4); 2022.
    https://cran.r-project.org/package=cpi
  • Wright MN, Watson DS. arf: Adversarial random forests. (Version 0.1.2); 2022.
    https://cran.r-project.org/package=arf
  • Koenen N, Baudeu R. innsight: Get the insights of your neural network. (Version 0.1.0); 2021.
    https://cran.r-project.org/package=innsight