Target group/prerequisites

Participants are expected to have knowledge of basic statistics, e.g. hypothesis testing, correlation and linear regression, and experience using R and RStudio.

Course design

Each day consists of lectures and practicals using R. Where possible, datasets submitted in advance by course participants will be used as examples in the lectures and practicals.

Program topics

Day 1: Data pre-treatment, PCA, short review of multiple linear regression, and PCR.
Discussion of data pre-treatment methods (e.g. centering, autoscaling, Pareto scaling, range scaling, log transformation), data exploration using Principal Component Analysis (PCA), and regression on the principal components from PCA in Principal Component Regression (PCR).
Day 2: Advanced regression techniques and model validation.
Discussion of Partial Least Squares (PLS), a technique similar to PCR but with improvements, and of regularized regression (e.g. ridge/lasso), together with ways of assessing model accuracy.
Day 3: Clustering and classification: k-means, hierarchical clustering.
Discussion of cluster analysis: choice of similarity measure, agglomerative methods, divisive methods, k-means and hierarchical clustering.

Date & duration:

The course will be held on Wednesday 6, Thursday 7 and Friday 8 June 2018.

Study load:

The study load of this course is 1.5 ECTS credits.


Location:

Forum building; details will be announced later.


Course fee:

€ 300, including handouts, coffee/tea during the breaks, and lunches.