The Data Science for Econometrics specialisation focuses on the creative and technical side of data science. You will learn how to develop and apply advanced statistical, econometric, and machine learning techniques to support decision-making. The programme prepares you to turn complex data into actionable insights and to contribute to the development of new analytical methods.
Programme structure
The programme consists of seven core courses, a seminar, and a master’s thesis, spread across five blocks of eight weeks.
- Core courses introduce key methodologies from statistics, econometrics, machine learning, and computer science, each focusing on a specific set of techniques.
- The seminar is a team-based project carried out in collaboration with companies or other organisations, in which you solve a real-world problem from start to finish.
- The master thesis is written individually in the final blocks, based on your own research and under close supervision.
Curriculum overview
- 20% Statistics
- 30% Econometrics
- 20% Machine Learning and Computer Science
- 30% Seminar
The curriculum has a strong technical focus, with applications in business and broader data science contexts.
In class
You will work on real-world problems provided by participating companies or other organisations. For example:
How can we predict consumer behaviour or improve digital services?
Past seminar projects have included predicting TV viewing patterns, assessing vulnerability to contagious diseases, analysing chatbot conversations, detecting survey engagement, and modelling the impact of pricing on online shopping. You will develop models, implement them in software, and present practical recommendations to the organisation.
Study schedule
The Take-Off is the introduction programme for all new students at Erasmus School of Economics. During the Take-Off you will meet your fellow students, get acquainted with our study associations, and learn all the ins and outs of your new study programme, its supporting information systems, and life on campus and in the city.
The use of statistical methods, as well as the methods themselves, has evolved greatly with the abundance of available data. Somewhat paradoxically, issues with incomplete (missing) data have increased along with this general availability of data. The first part of the course therefore addresses learning (imputing) missing data from the observed parts. We will discuss approaches such as multiple imputation and bootstrap inference with missing data, as well as a predictive view of matrix completion. The second part focuses on statistical learning approaches based on regularized estimation. We will discuss relevant techniques in contexts such as regression, graphical models, and post-selection inference.
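To give a flavour of these techniques, here is a minimal sketch, assuming scikit-learn, that draws several imputations of a dataset with missing values and pools lasso estimates across them; the data and settings are illustrative, not course material.

```python
# Minimal sketch: multiple imputation followed by regularized (lasso)
# regression with scikit-learn. Data and settings are illustrative.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=200)
X[rng.random(X.shape) < 0.1] = np.nan  # introduce 10% missing values

# Draw several imputations and pool the lasso estimates over them,
# in the spirit of multiple imputation.
coefs = []
for m in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    coefs.append(Lasso(alpha=0.1).fit(imputer.fit_transform(X), y).coef_)
print(np.mean(coefs, axis=0))  # pooled coefficient estimates
```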
Bayesian Econometrics plays an important role in quantitative economics, marketing research, and finance. This course discusses the basic tools needed to perform Bayesian analyses. It starts with a discussion of the difference between the Bayesian and frequentist statistical approaches. Next, Bayesian parameter inference, forecasting, and Bayesian testing are considered, where we deal with univariate models, multivariate models, and panel data models (Hierarchical Bayes techniques). To perform a Bayesian analysis, knowledge of advanced simulation methods is necessary. Part of the course is therefore devoted to Markov chain Monte Carlo sampling methods, including Gibbs sampling, data augmentation, and Monte Carlo integration. The topics are illustrated using simple computer examples that are demonstrated during the lectures.
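As a small illustration of the Gibbs sampling idea, the sketch below alternates between the conjugate conditional distributions of the mean and variance of normal data; the model, priors, and data are illustrative and use only NumPy.

```python
# Minimal Gibbs sampler for normal data y_i ~ N(mu, sigma^2) with
# conjugate priors mu ~ N(mu0, tau0^2) and sigma^2 ~ Inv-Gamma(a0, b0).
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)  # simulated data
n, ybar = len(y), y.mean()
mu0, tau0_sq, a0, b0 = 0.0, 100.0, 2.0, 1.0   # illustrative priors

mu, sigma_sq = 0.0, 1.0                       # starting values
draws = []
for it in range(5000):
    # Draw mu | sigma^2, y from its normal conditional.
    prec = n / sigma_sq + 1.0 / tau0_sq
    mean = (n * ybar / sigma_sq + mu0 / tau0_sq) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # Draw sigma^2 | mu, y from its inverse-gamma conditional.
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma_sq = 1.0 / rng.gamma(a_n, 1.0 / b_n)
    draws.append((mu, sigma_sq))

draws = np.array(draws)[1000:]  # discard burn-in
print(draws.mean(axis=0))       # posterior means of (mu, sigma^2)
```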
This course discusses modern machine learning methods and their applications in econometrics. We begin with key foundations such as optimization and information theory, which underpin many learning algorithms. The course then explores neural networks and decision-tree-based methods for prediction and forecasting. In this context, we put an emphasis on quantifying uncertainty in economic data. Beyond the methods themselves, we discuss advanced hyperparameter tuning and approaches to interpretability and fairness. Throughout, the focus is on practical understanding through applications.
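As one illustrative way to quantify prediction uncertainty with tree-based methods, the sketch below fits gradient boosting under the quantile loss to obtain an approximate 90% prediction interval; it assumes scikit-learn and simulated data, and is not taken from the course material.

```python
# Minimal sketch: gradient boosting with the quantile loss gives an
# approximate 90% prediction interval. Data are simulated.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# One model per quantile: lower bound, median, upper bound.
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in (0.05, 0.5, 0.95)}
X_new = np.array([[2.0], [5.0]])
for q, m in models.items():
    print(q, m.predict(X_new))
```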
The course introduces robust statistical and machine learning methods and their use in data analysis and modeling. The models used in econometrics, statistics, and machine learning are approximations to reality, and often rather crude ones. Models can be misspecified, making analyses and conclusions doubtful. Moreover, modern datasets often contain outliers, which can have disastrous effects on conclusions and interpretations. Robust techniques are therefore needed. We start the course with an introduction to the fundamentals of robust statistics, such as the influence function and the breakdown point, which pave the way for the construction of robust techniques. We then continue with fundamental topics such as robust covariance matrix estimation, robust estimation of regression and generalized linear models, regression quantiles, and robustness aspects of statistical inference. Finally, we discuss modern topics in machine learning, including median-of-means estimation and differential privacy.
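As a taste of these topics, here is a minimal NumPy sketch of the median-of-means estimator, which splits the sample into blocks, averages within each block, and takes the median of the block means; the heavy-tailed data are illustrative.

```python
# Minimal median-of-means estimator: robust to heavy tails because a
# few extreme observations can corrupt only a few block means.
import numpy as np

def median_of_means(x, n_blocks=10, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.permutation(x)  # random block assignment
    return np.median([b.mean() for b in np.array_split(x, n_blocks)])

rng = np.random.default_rng(0)
x = rng.standard_t(df=1.5, size=1000) + 3.0    # heavy tails around 3
print(np.mean(x), median_of_means(x, seed=0))  # sample mean vs. MoM
```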
We consider models and techniques that are grounded in probabilistic reasoning. Put differently, this course deals with modeling and reasoning with unobserved, latent variables. Such latent variables appear in models of choice behavior and in models that account for heterogeneity, and they are also useful when complex patterns need to be captured. We will, for example, use recent developments in language modeling to model choices among a very large set of alternatives, and study how to use unobserved variables that change over time. In general, probabilistic modeling is useful when the hidden structure of data needs to be revealed: variational autoencoders allow us to summarize high-dimensional data using factors that follow a pre-defined distribution, and Gaussian processes allow us to fit non-linear functions with proper uncertainty quantification. In the course we will study how and why these methods work and discuss how they can be applied in practice.
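As an illustration of the Gaussian process part, the sketch below, assuming scikit-learn, fits a GP to noisy observations of a non-linear function and returns both a posterior mean and a standard deviation for new points; the kernel and data are illustrative.

```python
# Minimal Gaussian process regression with uncertainty quantification.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=40)

# RBF kernel for the smooth signal, WhiteKernel for observation noise.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
print(mean)            # posterior mean of the fitted function
print(mean - 2 * std)  # approximate lower uncertainty band
```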
The course starts by introducing various text mining topics relevant to data science and econometrics. After that, the SQL query language is studied as the means to query relational databases. To process large amounts of data, we will look at parallel computing models, focusing on the MapReduce style of parallel computing. To reduce the number of computations needed to find similar items (a fundamental text mining problem), we will describe locality-sensitive hashing and the related techniques of shingling and minhashing. Next, we will explore content-based and collaborative filtering recommender systems, with a special focus on latent-factor models.
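As a small illustration of shingling and minhashing, the sketch below estimates the Jaccard similarity of two short texts using only the Python standard library; the hash construction and texts are illustrative, and a full LSH pipeline would additionally band the signatures.

```python
# Minimal shingling + minhash sketch. Seeded CRC32 values stand in for
# the random permutations used in textbook minhashing.
import zlib

def shingles(text, k=5):
    # All overlapping character k-grams of the text.
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, n_hashes=100):
    # One minimum per seeded hash function approximates one permutation.
    return [min(zlib.crc32(s.encode(), seed) for s in shingle_set)
            for seed in range(n_hashes)]

a = shingles("data science for econometrics at erasmus")
b = shingles("data science for econometrics in rotterdam")
sig_a, sig_b = minhash_signature(a), minhash_signature(b)

est = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
true = len(a & b) / len(a | b)
print(est, true)  # signature agreement approximates Jaccard similarity
```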
When we use data as the foundation for decision-making, it is vital to understand exactly what our estimates mean. In particular, we must be able to tell whether an estimated effect arises because one variable causes the other to change, or for other reasons, such as confounding. This course introduces the concepts of causality and identification, emphasizing challenges such as endogeneity and selection bias that commonly affect the estimators covered in previous courses. We then cover modern methods designed to address these issues. Topics include matching, difference-in-differences, instrumental variables, regression discontinuity designs, and causal machine learning.
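As a minimal illustration of one such method, the sketch below runs a two-group, two-period difference-in-differences regression; it assumes statsmodels and pandas, and the data are simulated with a known treatment effect of 2.

```python
# Minimal difference-in-differences: the coefficient on the
# treated x post interaction recovers the (simulated) effect of 2.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
treated = rng.integers(0, 2, n)  # treatment group indicator
post = rng.integers(0, 2, n)     # post-period indicator
y = (1.0 + 0.5 * treated + 0.8 * post
     + 2.0 * treated * post      # the causal effect of interest
     + rng.normal(size=n))

df = pd.DataFrame({"y": y, "treated": treated, "post": post})
fit = smf.ols("y ~ treated * post", data=df).fit()
print(fit.params["treated:post"])  # DiD estimate, close to 2
```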
The students are divided into small groups, and each group works on a research question, usually put forward by a company. First, the relevant literature is studied. Next, the research question is translated into one or more models, whose parameters are estimated using (company) data. Much attention is paid to selecting the best possible model, given the research question. This can be any model dealt with in the various courses, but it can also be a model that the students develop themselves. The estimated model and its results are then interpreted in the light of the research question. The final results are presented in a scientific report and a presentation.
Students write a proposal for the Master thesis Econometrics and Management Science. This proposal can be used as part of the Master thesis itself; it is not graded.
The thesis is an individual assignment about a subject from your Master's specialisation. More information about thesis subjects, thesis supervisors and the writing process can be found on the Master thesis website.
Disclaimer
This overview provides a general impression of the 2026-2027 curriculum. It is not the current study schedule. Enrolled students can find the most up-to-date version on MyEUR. Please note that minor changes may occur in future academic years.
