Technological and scientific advances have pushed the life sciences into the big data era. The increased volume and heterogeneity of available data challenge scientists to master techniques for comprehensive data analysis. Extracting meaningful (biological) information from large datasets is becoming increasingly difficult in all fields of the life sciences. Thus, the ability to select and deploy analysis tools and algorithms has become an indispensable skill for all researchers. In this course, we aim to introduce participants to techniques for the comprehensive analysis of large, heterogeneous datasets and for extracting relevant information to elucidate biological design principles. The course is modular and focuses on data generation, mining, analysis, data integration, and visualization.
This course is aimed at anyone working with big datasets in the life sciences who is interested in learning more about the tools and possibilities for big data analysis. The course is introductory, and no specific prerequisites are required.
Required knowledge: Basic knowledge of statistics.
Prerequisite: Experience with computer programming is useful but not mandatory. The computer practicals will use R. An R tutorial will be sent in advance to ensure that all participants have the required level.
Max. 25 participants. Registrants will be placed on a waiting list once the maximum is reached.
The aim of this course is to introduce participants to techniques for comprehensive data analysis of big data and to integrate heterogeneous data sets in order to extract relevant information for elucidating the living system.
The course will be held from Monday 15 to Friday 19 October 2018, mornings and afternoons (tentative schedule: 9:00-17:00).
Monday 15-10: Linked data retrieval: querying biological resources using RDF and SPARQL
Brief introduction to theoretical big data concepts: volume, variability, veracity, velocity.
Introduction to the Resource Description Framework (RDF) data format: triples, data types and objects.
Practical: Federated queries: advanced mining of (multiple) biological and biochemical databases (such as UniProt, Reactome or EBI) using SPARQL.
Tuesday 16-10: Data reduction using multivariate statistics: PCA, sparse methods and PLS regression
Introduction to multivariate statistics applied to big datasets.
Practical (in R): PCA, sparse methods and PLS regression to combine multiple data types.
Wednesday 17-10: Multilevel data integration, network reconstruction and network analysis
Network reconstruction: association networks based on information-theoretic methods and correlation analysis.
Brief introduction to Bayesian networks.
Practical in R. Network mining and analysis.
R tools and practical on network exploration.
Thursday 18-10: Data visualization
Using Cytoscape and R to visualize omics data such as expression data.
Friday 19-10: Hands-on workshop on data analysis
Bring your own data and questions.
- Dr Maria Suarez Diez (Systems & Synthetic Biology, Wageningen University and Research)
- Dr Edoardo Saccenti (Systems & Synthetic Biology, Wageningen University and Research)
- Jasper Koehorst MSc (Systems & Synthetic Biology, Wageningen University and Research)
- Dr Guido Hooiveld (Division of Human Nutrition, Wageningen University and Research)
Dr. Edoardo Saccenti is an expert on multivariate data analysis. His research focuses on reduction and modelling techniques for large biological data sets using random matrix theory and (sparse) component approaches.
Dr. Maria Suarez-Diez has extensive experience in reverse engineering of regulatory networks, metabolic modelling and the combination thereof to gain a systems-level understanding of the living system. She also has extensive experience in the integration of heterogeneous data sets and in the use of semantic web technologies for data integration and sharing in the life sciences.
Jasper Koehorst has ample experience in infrastructure development for big data applications in bacterial genomics. He is an expert in semantic data integration and in the use of semantic resources in the life sciences.
Dr. Guido Hooiveld has extensive experience in the use of high-throughput, information-dense omics technologies in combination with experiments in suitable models (e.g. cell, animal, and human models). He is an expert in the biological interpretation of complex nutrigenomics datasets and in the integration of multi-omics data sets with functional measurements to extract novel biological insights into the molecular effects of nutrients.
Date & duration
The course will be held from Monday 15 to Friday 19 October 2018.
The study load of this course is 1.5 ECTS credits.
The course language will be English.
For information about the course contents please contact Dr. Maria Suarez Diez (firstname.lastname@example.org) or Dr. Edoardo Saccenti (email@example.com).
For organisational matters please contact Mrs. Cornelia van Bree-Evers:
Location & accommodation
The course venue is in the Forum Building (102) located at Wageningen Campus, Wageningen University and Research.
The town of Wageningen is 5 km from Ede-Wageningen railway station, with transport options being taxi or bus. Ede-Wageningen railway station is about one and a half hours from Amsterdam Schiphol Airport. For train schedules visit: www.ns.nl.
The organisation has reserved a number of hotel rooms at Wageningen International Congress Center (WICC) for course participants, but only until 15 September 2018. Visit www.wicc.nl for more information.
Participants have to book their own hotel room by sending an email to firstname.lastname@example.org and mentioning booking code BDLS18 (do not book your room via the WICC website!).
Accommodation is not included in the course fee, but there are several possibilities in Wageningen. For information on B&Bs and hotels in Wageningen please visit proefwageningen.nl. Furthermore, Airbnb offers several rooms in the area.
Registration & course fee
To register please complete the electronic reply form.
Applicants will be informed of acceptance of their registration as soon as possible. They will receive instructions for payment, and further course details.
The course fee includes printed materials, coffee/tea during breaks, lunches and one course dinner but does not cover accommodation. The course fee depends on the participant's affiliation:
| Affiliation | Course fee |
|---|---|
| PhD candidates registered with one of the affiliated Graduate Schools | € 325 |
| | € 550 |
| University staff / Non-Profit | € 800 |
| Industry / For-Profit | € 2000 |
- No charge until 15 September 2018
- 25% of the course fee, whether already paid or still due, is charged until 8 October 2018
- No refund after 8 October 2018
Substitutions for participants may be made at any time until the start of the course.