Course
Big Data Analysis in the Life Sciences
Background
Technological and scientific advances have pushed the life sciences into the big data era. The increased volume and heterogeneity of available data challenge scientists to master techniques for comprehensive data analysis. Extracting meaningful (biological) information from large datasets is increasingly becoming a challenge in all fields of the life sciences. Thus, the ability to select and deploy analysis tools and algorithms has become an indispensable skill for all researchers. In this course, we aim to introduce participants to techniques for comprehensive analysis of large, heterogeneous datasets and for extracting relevant information to elucidate biological design principles. The course is modular and focuses on data generation, mining, analysis, data integration, and visualization.
Target group
This course is aimed at anyone working with big datasets in the life sciences who has an interest in learning more about tools and possibilities for big data analysis. The course is introductory.
Required knowledge: Basic knowledge of statistics.
Prerequisite: Experience in computer programming would be useful but is not mandatory. Computer practicals will require using R. An R tutorial will be sent in advance to ensure that all participants have the required starting level.
Course aim
The aim of this course is to introduce participants to techniques for comprehensive data analysis of big data and to integrate heterogeneous data sets in order to extract relevant information for elucidating the living system. The course is modular and focuses on data generation, mining, analysis, data integration, and visualization.
Preliminary Programme topics
Day 1 - Monday 20 October:
- Linked data retrieval: querying biological resources using RDF and SPARQL.
- Brief introduction to big data theoretical concepts: volume, variability, veracity, velocity.
- Introduction to the Resource Description Framework (RDF) data format: triples, data types and objects.
- Federated queries: advanced mining of (multiple) biological and biochemical databases (such as UniProt, Reactome or EBI) using SPARQL.
- Data reduction using multivariate statistics: PCA, theory and practical aspects.
- Variance-Covariance matrix and correlations.
- Data decomposition and PCA solution.
- Methods for dimensionality assessment: statistical tests and computational approaches (permutation and cross-validation).
- Interpretation of PCA model: loadings, biplots and their limitations.
- Brief introduction to dimensionality reduction via t-SNE and UMAP.
Day 2 - Tuesday 21 October:
- Sparse approaches to PCA: sparse PCA, groupwise PCA.
- Introduction to PLS regression.
- Collinearity.
- Prediction and classification with PLS: discriminant analysis, overfitting, cross-validation, and model optimization.
- Sparse approaches to PLS.
- Multilevel data integration using PLS and sparse PLS for different data types.
- Network reconstruction: association networks based on information-theoretic methods and correlation analysis.
Day 3 - Wednesday 22 October:
- Data visualization.
- Introduction to Cytoscape and R for visualizing omics data, such as expression data.
- Speaker seminar (TBA).
- Bring-your-own-data workshop: apply the knowledge you have gained to your own case study.
Day 4 - Thursday 23 October:
- Brief introduction to Bayesian networks.
- R tools and practical on network reconstruction and exploration.
- Concluding remarks.
Lecturers
The lecturers are from the Laboratory of Systems & Synthetic Biology (SSB) and the Division of Human Nutrition and Health (HNH) at Wageningen University & Research:
- Jasper Koehorst, PhD (SSB)
- Edoardo Saccenti, PhD (SSB)
- Cristina Furlan, PhD (SSB)
- Maria Suarez Diez, PhD (SSB)
- Guido Hooiveld, PhD (HNH)
Dr Jasper Koehorst has ample experience in infrastructure development for big data applications in bacterial genomics. He is an expert in semantic data integration and in the use of semantic resources in the life sciences.
Dr Edoardo Saccenti is an expert on multivariate data analysis. His research focuses on reduction and modelling techniques for large biological data sets using random matrix theory and (sparse) component approaches.
Dr Cristina Furlan has extensive experience in the use, analysis, and integration of omics data, with a particular interest in proteomics. She is a specialist in the biological interpretation of data sets related to biomedical topics.
Prof. Maria Suarez-Diez has extensive experience in reverse engineering of regulatory networks, metabolic modelling, and the combination thereof to gain a systems-level understanding of the living system. She also has extensive experience in the integration of heterogeneous data sets and in the use of semantic web technologies for data integration and sharing in the life sciences.
Dr Guido Hooiveld has extensive experience in the use of high-throughput, information-dense omics technologies in combination with experiments in suitable models (e.g. cell, animal, and human models). He is an expert in the biological interpretation of complex nutrigenomics datasets and in the integration of multi-omics data sets with functional measurements to extract novel biological insights into the molecular effects of nutrients.
Date & duration
The course runs for 3.5 days, from 20 to 23 October 2025.
Study load
The study load of this course is 1 ECTS credit.
Language
The course language will be English.
Contact information
For more information on the course content, please contact Dr Cristina Furlan (cristina.furlan@wur.nl) or Dr Edoardo Saccenti (edoardo.saccenti@wur.nl).
For organisational matters please contact Cornelia van Bree-Evers (cornelia.vanbree-evers@wur.nl).
Location & accommodation
The course venue is one of the Wageningen University buildings on Wageningen Campus, Wageningen, the Netherlands.
The town of Wageningen is 5 km from Ede-Wageningen railway station, with transport options being taxi or bus. Ede-Wageningen railway station is about one and a half hours from Amsterdam Schiphol Airport. For train schedules visit: www.ns.nl.
Hotel accommodation is not included in the course fee. If you need accommodation in Wageningen, you can find several options below.
Hotels:
Wageningen International Congress Centre
Other accommodation
Registration & course fee
The number of participants is limited to 25. Once this maximum is reached, new registrations will be placed on a waiting list. The early bird registration deadline is 22 August 2025, and the final registration deadline is 19 September 2025.
The course fee (which includes materials, coffee/tea during breaks, lunches and one dinner but does not cover accommodation) depends on the participant's affiliation:
Affiliation | Early Bird Fee | Regular Fee |
WUR PhD candidates* | € 250 | € 300 |
All other PhD candidates / Postdocs and staff from VLAG | € 475 | € 525 |
Other University staff / Non-Profit | € 675 | € 725 |
Industry / For-Profit | € 1400 | € 1450 |
* WUR PhD candidates affiliated with VLAG, EPS, WIMEK, WASS, WIAS, PE&RC
Applicants will be informed whether their registration has been accepted, if possible within two weeks of registering and at the latest one day after the final registration date. They will then receive a notice of acceptance, instructions for payment, and further course details.
Cancellation policy
After acceptance of your registration, the VLAG Cancellation Conditions for course participants will apply.