Course catalogue doctoral education - VT21

  • Applications can be submitted between 2020-10-15 and 2020-11-16
Application closed
Title Multivariate prediction modelling with applications in precision medicine
Course number 2990
Programme Epidemiology
Language English
Credits 1.5
Date 2020-11-23 -- 2020-11-27
Responsible KI department Institutionen för medicinsk epidemiologi och biostatistik
Specific entry requirements Epidemiology I, Introduction to epidemiology; Epidemiology II, Design of epidemiological studies; Biostatistics I, Introduction for epidemiologists; Biostatistics II, Logistic regression for epidemiologists; and Biostatistics III: Survival analysis for epidemiologists, or equivalent courses
Purpose of the course This course aims to provide an introduction to both supervised and unsupervised methodologies for prediction modelling with a focus on biomedical applications, molecular epidemiology and personalised medicine.
Intended learning outcomes After successfully completing this course you as a student are expected to be able to:
- Perform and assess basic quality control and outlier detection
- Apply unsupervised and supervised statistical learning methods to detect patterns in data
- Devise cross-validation strategies for parameter estimation, model selection and prediction performance evaluation
- Make informed judgements about how to apply basic principles for variable selection
- Critically evaluate prediction models in real-world applications
Contents of the course Personalised medicine is a cornerstone of tomorrow's health care. It is based on the idea of stratifying patients into groups according to, for example, disease risk, prognosis or probability of treatment response, and administering the most suitable therapy to each individual. The capability to generate vast amounts of quantitative molecular data from DNA and RNA sequencing and other molecular profiling methods provides an unprecedented opportunity to implement personalised precision medicine approaches in the health care system. Molecular profiling typically generates data with tens of thousands of variables, of which only a subset is relevant for treatment decisions. The promise of personalised medicine relies on our ability to turn these vast molecular datasets into clinically actionable predictive models of individualised therapy response. Statistical learning methods and prediction modelling are central to developing these models and the biomarker panels that can be used for molecular subtyping, risk stratification and prediction of treatment response. This course provides an introduction to statistical learning methods and prediction models that are relevant for personalised medicine, with a focus on real-world applications.

This course aims to provide an introduction to methodologies for prediction modelling with a focus on biomedical applications, molecular epidemiology and personalised medicine. The course covers basic theory and an introduction to modern statistical and machine learning methods for prediction modelling in high-dimensional data, together with applied data analysis through computer-based exercises. Lectures and exercises will cover the full process, from the initial data set through data normalisation, quality control, outlier detection, application of unsupervised learning methods, application of supervised learning methods, variable selection, cross-validation and model evaluation. The main objective of the course is to provide basic theory and practical knowledge that will enable course participants to apply the covered methodologies in their own research.

Topics covered include: data import and basic visualisation, data pre-processing, quality control and outlier detection, unsupervised learning, supervised learning, cross-validation for parameter estimation and estimation of prediction performance, variable selection, recently developed methods (e.g. deep learning, conformal prediction).
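
As an illustration of the cross-validation topic listed above, the following is a minimal sketch of k-fold cross-validation for estimating prediction performance. It is written in Python for brevity, whereas the course exercises themselves use R. The nearest-centroid classifier, the synthetic two-class data and all function names are assumptions made for this example and are not taken from the course material.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Average held-out accuracy over k folds."""
    fold_accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # The model is fitted on the training folds only, so the held-out
        # fold plays no part in estimating the centroids.
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return mean(fold_accuracies)


# Synthetic data (an assumption for illustration): two well-separated
# Gaussian classes in two dimensions, 50 samples each.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

accuracy = cross_validated_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

The same principle applies to any step that uses the outcome, such as variable selection or parameter tuning: it should be repeated inside each training fold, otherwise the performance estimate tends to be optimistic.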

Teaching and learning activities The course is based on a combination of lectures, which cover methods and theory, and computer-based exercises in R, in which real-world data are analysed and interpreted. Previous practical experience of applying statistical models in a computer-based environment (e.g. R, SAS, Stata, Matlab, Python) is strongly recommended.
Compulsory elements The individually written examination.
Examination The individual examination is a take-home examination. It consists of an individually written lab report in which the results of an applied data analysis mini-project are summarised and critically evaluated. Students who do not obtain a passing grade in the first examination will be offered a second examination within two months of the final day of the course.
Literature and other teaching material Suggested course literature:
The Elements of Statistical Learning, Hastie, Tibshirani and Friedman (2009). Springer-Verlag.
Freely available at
Number of students 8 - 25
Selection of students Eligible doctoral students will be prioritised according to 1) the relevance of the course syllabus for the applicant’s doctoral project (according to written information), 2) date of registration as a doctoral student (priority given to earlier registration date). To be considered, submit a completed application form. Give all information requested, including a short description of current research training and motivation for attending, as well as an account of previous courses taken.
More information It is recommended to have taken an introductory course in R or to have equivalent experience prior to taking this course.
Additional course leader
Latest course evaluation Course evaluation report
Course responsible Mattias Rantalainen
Institutionen för medicinsk epidemiologi och biostatistik
Contact person Gunilla Nilsson Roos
Institutionen för medicinsk epidemiologi och biostatistik
08-524 822 93