Marketing modelling for large-scale assortments


Consumers who shop (online) are often overloaded with information, choice options, websites, links to other web pages, etc. Managers of (online) stores need to make many decisions: which products to sell, what prices to charge, which promotions to organize, which customers to target, etc. In principle, a lot of data is available to optimize such decisions. However, both the dimension and the level of detail of the data make it challenging to actually use the data. In this project you will work on the development and application of econometric marketing models that (i) can help guide practical decision making; (ii) are scalable (that is, can be estimated in a reasonable time frame); (iii) combine different data sources; and (iv) work at the individual product and/or customer level, such that customization of prices and promotions is feasible.

In this project we will seek active collaboration with Dutch or international (online) retailers. These contacts will help us target practically useful research questions and provide access to detailed data. In the ideal case we will be able to test the developed methodology in real life.


Marketing, econometrics, machine learning, prediction, recommendation, online retailing, dynamic pricing


Research questions

Various concrete research questions fall in this domain. We discuss some examples below. Note that this is merely a first inventory of the possibilities.

1. Dynamic pricing

The optimal pricing of products is a key challenge for all retailers. In an online setting the pricing problem is especially complicated, as customers can easily check the prices of competitors. To set prices optimally for many products in a category, one needs to know not only the own- and cross-price effects, but also the relevance of competitor prices. To complicate things even further, the store may take its own (or competitors') inventory into account and adapt prices accordingly. Finally, prices that are chosen today will have an impact on demand in the future. This means that price setting is not a static but a dynamic problem. Effective models that capture the relevant dynamics for a large number of products simultaneously are not yet available.

2. Keyword advertising

A major source of website traffic to stores is (sponsored) keyword advertising on search engines such as Google and Bing. Retailers can bid on keywords such that their site is ranked high among the sponsored search results. However, it is not immediately clear what the added value of such a bid is to the retailer, nor is it clear what the optimal bid should be. An interesting challenge is to model the impact of keyword advertising as part of the consumers’ path to purchase (the conversion funnel).

3. Targeted promotions

Online retailers can promote specific products to a selected group of customers through, for example, email marketing. How can we figure out which products to promote to which customers? It does not make sense to promote a product in which the customer is not interested. On the other hand, it is not smart to promote products that are already well known to the customer (they would buy them even without the promotion). How can we strike an optimal balance here? How can we model this problem for many customers and many products simultaneously? And, possibly even more complicated, how can we account for situational influences in determining the optimal promotion strategy?

Research field

The research falls in two of the main research domains of Erasmus School of Economics: Marketing and Econometrics. It corresponds to a key research area as defined by the marketing group (quantitative analysis of customer behaviour) and to a key area as defined by the Econometric Institute (discrete choice analysis).


Bayesian modelling is well suited to dealing with complex, heterogeneous data. However, the typical estimation methodology for such models requires time-consuming simulations. In a large-scale setting (many products and/or many consumers) such simulation-based methodology is not feasible. Alternative methods are available that allow researchers to remain in the Bayesian paradigm, but remove the need for simulation. One such method is variational inference (VI). VI is an approach to obtaining approximate inference in Bayesian models using optimization techniques.

A major component of this project will be the development of scalable estimation techniques. VI is one of the promising candidates.
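To make the idea of replacing simulation by optimization concrete, the following is a minimal sketch of VI on a toy problem, not the methodology to be developed in the project. It assumes a deliberately simple model chosen for illustration: observations y_i ~ N(mu, sigma2) with known variance, and a prior mu ~ N(0, tau2). The approximating distribution is q(mu) = N(m, s^2), and its parameters are found by gradient ascent on the evidence lower bound (ELBO). The function name `fit_gaussian_vi`, the step size and all numerical settings are hypothetical choices. Because the exact posterior is itself normal in this model, the result can be checked against the known closed form.

```python
import numpy as np


def fit_gaussian_vi(y, sigma2, tau2, lr=0.01, n_iter=2000):
    """Fit q(mu) = N(m, s^2) by gradient ascent on the ELBO.

    Illustrative model (an assumption, not taken from the project text):
        y_i ~ N(mu, sigma2), sigma2 known;  mu ~ N(0, tau2).
    Returns the variational mean m and variance s^2.
    """
    n = len(y)
    # Posterior precision; also the curvature of the ELBO in m.
    precision = n / sigma2 + 1.0 / tau2
    m, rho = 0.0, 0.0  # rho = log(s), so that s stays positive
    for _ in range(n_iter):
        # d ELBO / d m   = sum(y_i - m) / sigma2 - m / tau2
        grad_m = np.sum(y) / sigma2 - precision * m
        # d ELBO / d rho = 1 - s^2 * (n / sigma2 + 1 / tau2)
        grad_rho = 1.0 - precision * np.exp(2.0 * rho)
        m += lr * grad_m
        rho += lr * grad_rho
    return m, np.exp(2.0 * rho)


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
sigma2, tau2 = 1.0, 10.0
y = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=50)

m, s2 = fit_gaussian_vi(y, sigma2, tau2)

# Exact conjugate posterior, available here only because the model is so simple.
exact_var = 1.0 / (len(y) / sigma2 + 1.0 / tau2)
exact_mean = exact_var * np.sum(y) / sigma2

print("VI:   ", m, s2)
print("Exact:", exact_mean, exact_var)
```

In this toy case the optimization recovers the exact posterior because the approximating family contains it. In the large-scale models targeted by the project the approximation would generally not be exact, and the gradients would typically have to be estimated on subsets of the data rather than computed in closed form. The fixed step size used here is only stable when `lr` times the posterior precision stays below 2.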

Literature references

A good example of the type of research in this project is:

This paper is one of the results of the PhD thesis of Bruno Jacobs (to be defended in December 2017).

Data will be obtained through cooperation with companies. Some data is also publicly available, for example the Instacart data or data provided through the Wharton Customer Analytics Initiative.


No specific research groups are identified for collaboration. However, it is the aim to have the PhD candidate visit a top US (or European) university for a couple of months during the second or third year of the project.

Expected output

Within this project at least three high-quality papers are expected. Each of the papers will be targeted at one of the top journals in (quantitative) marketing, that is, Journal of Marketing Research or Marketing Science, or at a top journal in statistics or econometrics, for example, Journal of Business and Economic Statistics or Journal of Econometrics.

Scientific relevance

In the academic marketing literature there is a strong interest in the development and application of quantitative methods. This is especially true if the methods help solve actual decision problems that marketing managers confront.

The added value of this project will be in the following aspects:

  • Development of new econometric methodology to deal with large-scale data
  • Providing guidelines for various product-level decisions faced by (online) retailers: pricing, promotion, assortment decisions, etc.
  • Actual implementation and testing of methodology in real life

Societal relevance

The societal relevance of this project is mainly reflected in the collaboration with industry. The supervisory team already has various contacts with (online) retailers in the Netherlands and has worked with them in the past on academic research projects. These (and other) companies stand to benefit from this research by being able to optimize their everyday decision making using model-based tools.

PhD candidate profile

This project requires a candidate with a strong background in econometrics and/or statistics. Furthermore, some affinity with large-scale data is useful. Ideally, the candidate already has experience with Bayesian statistics. Next to these technical skills, a keen interest in marketing is needed.


Prof. Dr. Dennis Fok

T: +31 10 4081333

Prof. Dr. Bas Donkers

T: +31 10 4082411

Graduate school

Depending on the candidate's interest the project can be affiliated with either ERIM (for a Research in Management approach) or the Tinbergen Institute (for an Economics and Econometrics approach).

Applicants for this project need to pass ERIM's or the Tinbergen Institute's admission requirements (depending on the approach) before they can be considered for a PhD position at Erasmus School of Economics.

If you are unsure which graduate school you want to be affiliated with, please contact the project's supervisors.


Application deadline: 1 April 2018


Apply for this project using the Tinbergen Institute online application form. Please use the project code below to apply for this project.

Tinbergen project code:

TI PhD 2018 DF BD