Course

Big Data Analysis in the Life Sciences

Technological and scientific advances have pushed the life sciences into the big data era. The increased volume and heterogeneity of available data challenge scientists to master techniques for comprehensive data analysis. Extracting meaningful (biological) information from large datasets is increasingly becoming a challenge in all fields of the life sciences. Thus, the ability to select and deploy analysis tools and algorithms has become an indispensable skill for all researchers. In this course, we aim to introduce participants to techniques for the comprehensive analysis of large, heterogeneous datasets and for extracting relevant information to elucidate biological design principles. The course is modular and focuses on data generation, mining, analysis, data integration, and visualization.

Organised by	the Graduate School VLAG, in co-operation with Systems and Synthetic Biology (Wageningen University) and Human Nutrition and Health (Wageningen University)
Venue	Wageningen University, The Netherlands

Background

Target group

This course is aimed at anyone working with big datasets in the life sciences who has an interest in learning more about tools and possibilities for big data analysis. The course is introductory.

Required knowledge: Basic knowledge of statistics.

Prerequisite: Experience with computer programming is useful but not mandatory. The computer practicals require the use of R. An R tutorial will be sent in advance to ensure that all participants have the required level.

Course aim

The aim of this course is to introduce participants to techniques for comprehensive data analysis of big data and to integrate heterogeneous data sets in order to extract relevant information for elucidating the living system. The course is modular and focuses on data generation, mining, analysis, data integration, and visualization.

Programme topics

Day 1 - Monday 21 October:

Linked data retrieval: querying biological resources using RDF and SPARQL.
Brief introduction to Big data theoretical concepts: volume, variability, veracity, velocity
Introduction to the Resource Description Framework (RDF) data format: triples, data types and objects
Practical: Federated queries: advanced mining of (multiple) biological and biochemical databases (such as UniProt, Reactome or EBI) using SPARQL
Data reduction using multivariate statistics: PCA, theory and practical aspects
Variance-Covariance matrix and correlations
Data decomposition and PCA solution
Methods for dimensionality assessment: statistical tests and computational approaches (permutation and cross-validation)
Interpretation of the PCA model: loadings, biplots, and limitations
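
As a rough illustration of the PCA topics listed above (variance-covariance matrix, decomposition, explained variance), the following minimal sketch computes the first principal component of a small made-up dataset. It is written in plain Python for brevity, whereas the course practicals use R. The function names and the toy data are hypothetical and are not taken from the course material.

```python
import math


def covariance_matrix(rows):
    """Sample variance-covariance matrix; rows are samples, columns are variables."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Power iteration: leading eigenvector (loadings) and eigenvalue of cov."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigenvalue


# Toy data: 5 samples, 3 variables; the first two are strongly correlated.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 0.5],
    [4.0, 7.9, 1.2],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.1],
]

cov = covariance_matrix(data)
loadings, variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance / total_variance

print("loadings of PC1:", [round(x, 3) for x in loadings])
print("fraction of variance explained by PC1:", round(explained, 3))
```

Because two of the three toy variables are nearly collinear, a single component captures most of the total variance, which is the kind of data reduction that PCA is used for.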

Day 2 - Tuesday 22 October:

Sparse approaches to PCA: sparse PCA, groupwise PCA
Introduction to PLS regression
Collinearity
Prediction and classification with PLS: discriminant analysis, overfitting, cross-validation and model optimization
Sparse approaches to PLS
Multilevel data integration using PLS and sparse PLS for different data types

Day 3 - Wednesday 23 October:

Network reconstruction: association networks based on information-theoretic methods and correlation analysis
Brief introduction to Bayesian networks
R tools and practical on network reconstruction and exploration
Data visualization
Introduction to Cytoscape and R to visualize omics data such as expression data

Lecturers

Lecturers are from the Laboratory of Systems & Synthetic Biology (SSB) and the Division of Human Nutrition and Health (HNH) at Wageningen University & Research:

Edoardo Saccenti, PhD (SSB)

Maria Suarez Diez, PhD (SSB)

Jasper Koehorst, PhD (SSB)

Guido Hooiveld, PhD (HNH)

Dr Edoardo Saccenti is an expert on multivariate data analysis. His research focuses on reduction and modelling techniques for large biological data sets using random matrix theory and (sparse) component approaches.

Dr Maria Suarez-Diez has extensive experience in reverse engineering of regulatory networks, metabolic modelling and the combination thereof to gain systems-level understanding of the living system. She also has extensive experience in the integration of heterogeneous data sets and in the use of semantic web technologies for data integration and sharing in the life sciences.

Dr Jasper Koehorst has ample experience in infrastructure development for big data applications in bacterial genomics. He is an expert in semantic data integration and in the use of semantic resources in the life sciences.

Dr Guido Hooiveld has extensive experience in the use of high-throughput, information-dense omics technologies in combination with experiments in suitable models (e.g. cell, animal, and human models). He is an expert in the biological interpretation of complex nutrigenomics datasets and in the integration of multi-omics data sets with functional measurements to extract novel biological insights into the molecular effects of nutrients.

Date & duration

The course lasts 3 days, from 21 to 23 October 2019.

Study load

The study load of this course is 0.9 ECTS credits.

Language

The course language will be English.

Contact information

For more information on the course content, please contact Dr Maria Suarez Diez (maria.suarezdiez@wur.nl) or Dr Edoardo Saccenti (edoardo.saccenti@wur.nl).

For organisational matters, please contact Cornelia van Bree-Evers (cornelia.vanbree-evers@wur.nl).

Location & accommodation

The course venue is Wageningen University, Wageningen, the Netherlands.

The town of Wageningen is 5 km from Ede-Wageningen railway station, with transport options being taxi or bus. Ede-Wageningen railway station is about one and a half hours from Amsterdam Schiphol Airport. For train schedules visit: www.ns.nl.

Hotel accommodation is not included in the course fee. However, a number of hotel rooms have been blocked at the WICC. Accommodation costs per night are €69,- (single room, incl. breakfast) or €74,50 (twin room for single use, incl. breakfast). Participants have to book their own room by sending an e-mail to info@wicc.nl. Please book your room before 1 September 2019 and mention booking code BDA19.

Registration & course fee

THIS COURSE IS FULLY BOOKED!

Registrations will be accepted in the order in which the registration forms are received. You will be notified of acceptance by email, after which you will receive instructions for payment and further course details.

The course fee (which includes materials, coffee/tea during breaks, lunches and one dinner but does not cover accommodation) depends on the participant's affiliation:

Course fee
PhD candidates affiliated with VLAG, PE&RC, EPS, WIAS, WASS, SENSE/WIMEK	€ 225
PhD candidates	€ 450
University staff / Non-Profit	€ 625
Industry / For-Profit	€ 1200

Cancellation policy

No charge for cancellations until 1 October 2019
25% of the course fee (paid or due) is charged for cancellations until 14 October 2019
No refund for cancellations after 14 October 2019

Substitutions for participants may be made at any time until the start of the course.
