Robustness of multiple and multi-step tests

Abstract

We live in a world where large amounts of data are available. Problems have become large and high-dimensional. Classical statistical and econometric techniques were designed for relatively small datasets, and they have serious difficulties in dealing with modern problems. One of these difficulties is the presence of outliers and, more generally, data contamination. The goal of this project is to study the robustness properties of tests and test-based procedures in modern data analysis.

In this project we focus on statistical tests and their use in data analysis. Many contemporary problems involve several estimation steps, or various combinations of tests and estimators, where the estimation depends on the outcome of a previous test or where tests are carried out one after another. It is well known that datasets are seldom clean and usually contain outliers or are subject to misspecification. The robustness of one-step statistical tests has been studied (see Heritier and Ronchetti, 1994), but multi-step tests and combinations of tests and estimators remain unexplored.

Keywords

Asymptotic theory, Big data, High-dimensional method, Influence function, Misspecification test, Multi-step test, Multiple test, Robust statistics, Two-step estimator

Topic

The topic is very general and allows a lot of flexibility in terms of precise research questions. So far, very little has been done in the literature in the context described above. As a starting point, it is useful to answer the following question: “How robust are misspecification tests?” In classical statistical or econometric modelling a certain probabilistic model is assumed. However, the model is only an approximation of reality, and the assumptions behind the model need to be verified. A popular strategy is to apply misspecification tests (such as the Hausman-type test) in order to detect an unreliable model. However, the misspecification tests themselves rely on distributional assumptions and can be affected by data contamination. The contamination can take the form of outliers, or of situations in which an unknown part of the data is generated by a different, unknown data-generating process. The performance of misspecification tests under contamination needs to be investigated. Can they be fooled by outliers? If so, what happens to the model estimation and interpretation? A natural follow-up step is to develop robust versions of these tests.
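The question of whether a test can be fooled by outliers can be illustrated with a minimal sketch. It does not implement a misspecification test; as a simplifying assumption it uses the ordinary one-sample t statistic as a stand-in, and the data, the function name `t_statistic` and the variable names are hypothetical choices made for this illustration only.

```python
import math
import statistics

# Two-sided 5% critical value of the t distribution with 9 degrees of freedom.
CRITICAL_VALUE = 2.262


def t_statistic(sample, mu0=0.0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical clean sample whose mean is clearly different from zero.
clean = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0, 0.9, 1.1]

# The same sample with the last observation replaced by one gross outlier.
contaminated = clean[:-1] + [50.0]

t_clean = t_statistic(clean)
t_contaminated = t_statistic(contaminated)

for label, t in (("clean", t_clean), ("contaminated", t_contaminated)):
    decision = "reject H0" if abs(t) > CRITICAL_VALUE else "do not reject H0"
    print(f"{label:>12}: t = {t:6.2f} -> {decision}")
```

In this sketch a single outlier inflates the sample standard deviation faster than it shifts the sample mean, so the statistic falls below the critical value and the test no longer rejects. The same mechanism is one reason to expect that tests built on non-robust estimators may lose power or level under contamination.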

The second part of the project is to derive the influence functions for tests involving multiple steps, i.e. when one test is carried out after another (an example of this is forward model selection). The influence function is a convenient tool that makes it possible to assess whether an estimator or test is robust, and it provides a general way to construct a robust counterpart of that estimator or test. The robustness of two-step estimators has been studied by Zhelonkin, Genton and Ronchetti (2012), with an application to the sample selection model (Zhelonkin et al., 2016). The robustness of one-step tests was explored by Ronchetti (1982) and Heritier and Ronchetti (1994). However, neither the robustness of multi-step tests nor the interplay between estimators and tests under contamination has been studied.
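To make the role of the influence function concrete, the following minimal sketch computes its finite-sample analogue, the sensitivity curve SC_n(x) = (n + 1) (T(x_1, ..., x_n, x) - T(x_1, ..., x_n)), for two simple estimators. The sample, the helper name `sensitivity_curve` and the choice of the mean and the median as examples are assumptions made for illustration; they are not the multi-step procedures targeted by the project.

```python
import statistics


def sensitivity_curve(estimator, sample, x):
    """Scaled change in the estimate when one observation x is added."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [x]) - estimator(sample))


# Hypothetical sample of nine observations centred near zero.
sample = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.5]

# Map each added point x to (curve value for the mean, curve value for the median).
curves = {}
for x in (1.0, 10.0, 100.0, 1000.0):
    curves[x] = (
        sensitivity_curve(statistics.mean, sample, x),
        sensitivity_curve(statistics.median, sample, x),
    )
    print(f"x = {x:7.1f}  mean: {curves[x][0]:9.3f}  median: {curves[x][1]:6.3f}")
```

For the mean the curve grows without bound as the added observation moves away from the bulk of the data, whereas for the median it stays constant over the points evaluated here. A bounded curve of this kind is the property that robust estimators and tests are designed to have.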

Approach

We will use tools from robust statistics, following the approach based on influence functions (textbook treatments include Hampel et al., 1986, and Maronna et al., 2006).

Literature references

  • G’Sell, M.G., Wager, S., Chouldechova, A. and Tibshirani, R. (2016) Sequential Selection Procedures and False Discovery Rate Control. Journal of the Royal Statistical Society, Series B, vol. 78, p. 423-444.
  • Hampel, F., Ronchetti, E., Rousseeuw, P.J. and Stahel, W. (1986) Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.
  • Hausman, J.A. (1978) Specification Tests in Econometrics. Econometrica, vol. 46, p. 1251-1271.
  • Heritier, S. and Ronchetti, E. (1994) Robust Bounded Influence Tests in General Parametric Models. Journal of the American Statistical Association, vol. 89, p. 897-904.
  • Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006) Robust Statistics: Theory and Methods. Chichester: Wiley.
  • Ronchetti, E. (1982) Robust Testing in Linear Models: The Infinitesimal Approach. PhD Thesis, ETH, Zurich.
  • Zhelonkin, M., Genton, M.G. and Ronchetti, E. (2012) On the Robustness of Two-Stage Estimators. Statistics and Probability Letters, vol. 82, p. 726-732.
  • Zhelonkin, M., Genton, M.G. and Ronchetti, E. (2016) Robust Inference in Sample Selection Models. Journal of the Royal Statistical Society, Series B, vol. 78, p. 805-827.

Cooperation

University of Geneva, Switzerland

Expected output

One article in a general statistics/econometrics journal and two articles in more specialized field journals.

Scientific relevance

The proposed research targets methodological extensions of the existing literature on robust testing. The general results should pave the way for the construction of robust data-analytic procedures.

Societal relevance

The proposed research is a methodological project. The developed techniques are general and should enhance the toolbox of economists, social scientists, political scientists, biostatisticians and other specialists who work with complex datasets and need to draw reliable inferences and conclusions.

PhD candidate profile

Very strong mathematical background. The ideal candidate is either a mathematician or physicist with an understanding of econometrics, or an econometrician with solid mathematical skills. Programming skills are also necessary: most programming will be in R, but some parts of the code may need to be written in C++.

Supervisor(s)

Dr. Chen Zhou
T: +31 (0)10 4081342
E: zhou@ese.eur.nl

Dr. Mikhail Zhelonkin
T: +31 (0)10 4082588
E: zhelonkin@ese.eur.nl

Graduate school

This project is affiliated with the Tinbergen Institute graduate school. Applicants for this project need to pass the Tinbergen Institute's admission requirements before they can be considered for a PhD position at ESE.

Note that the Tinbergen Institute requires valid GRE General Test results from all applicants. More information about the GRE test is available here. Be aware that available seats for this test fill up very fast, so book your test well in advance. Please contact the GRE program for specific questions about the GRE test.

Deadline

Application deadline: 15 January 2019

Interested?

Apply for this project using our online application form. Please use the project code below to apply for this project.

Tinbergen project code:

TI PhD 2019 ESE CZ MZ