Deep Generative Modeling for Tabular Data

PhD candidate: Markus Mueller
Start: Fall 2022

Recent advances in deep generative modeling have produced impressive results for images, videos, text, and proteins. Yet tabular data, despite being the most prominent data type in industry and the social sciences, is lagging behind. Closing this gap matters: realistic synthetic samples that do not disclose sensitive information can democratize access to otherwise proprietary datasets, improve missing-value imputation, and enable privacy-preserving model development in high-stakes environments where sharing real data is impossible.

Tabular data presents unique challenges. It organizes heterogeneous features (for example continuous, binary, ordinal, high-cardinality, and mixed-type features) within a single matrix, often alongside missing values. State-of-the-art generative models, typically designed for homogeneous continuous data, are poorly equipped to handle this complexity. A further challenge arises when data are observed repeatedly over time, as in longitudinal surveys: realistic generation must capture not only cross-sectional correlations but also temporal dynamics.
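
To make this heterogeneity concrete, the following minimal sketch represents a tiny mixed-type table in plain Python and derives a missingness mask from it. The column names, the type labels, and the helpers `missing_mask` and `columns_by_type` are hypothetical and chosen only for illustration; they are not taken from the project.

```python
# Hypothetical toy table: every column has its own feature type,
# and None marks a missing value.
SCHEMA = {
    "income": "continuous",
    "employed": "binary",
    "education": "ordinal",
    "occupation": "categorical",
}

ROWS = [
    {"income": 42000.0, "employed": 1, "education": 2, "occupation": "nurse"},
    {"income": None, "employed": 0, "education": 1, "occupation": "teacher"},
    {"income": 58500.0, "employed": 1, "education": None, "occupation": None},
]


def missing_mask(rows, schema):
    """Return one list of booleans per row: True where the value is missing."""
    return [[row[col] is None for col in schema] for row in rows]


def columns_by_type(schema):
    """Group column names by their declared feature type."""
    groups = {}
    for col, ftype in schema.items():
        groups.setdefault(ftype, []).append(col)
    return groups


mask = missing_mask(ROWS, SCHEMA)
groups = columns_by_type(SCHEMA)
```

A generative model for such a table has to treat each type group differently and has to handle the masked entries, which is what makes a single homogeneous noise process a poor fit.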

My research develops diffusion and flow matching models tailored to these characteristics. By explicitly accounting for feature-type heterogeneity, these models better align the type-specific generative processes. Drawing an analogy to image resolution, they further distinguish between low- and high-resolution features: the model first generates easy, coarse structure and then fills in fine-grained details, substantially improving sample realism. This approach also enables native support for mixed-type features and missing values, and sets a new state of the art in tabular data generation, with direct implications for privacy-preserving data sharing in healthcare, official statistics, and the social sciences.
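
The coarse-to-fine idea can be illustrated with a small sketch of a flow-matching-style interpolation path in which each feature has its own time window. This is an illustrative assumption and not the project's actual method: the linear path, the window values in `WINDOWS`, and the helpers `local_time` and `interpolate` are all hypothetical, and no neural network is trained here.

```python
import random


def local_time(t, start, end):
    """Map global time t in [0, 1] to a feature-local time clamped to [0, 1]."""
    return min(1.0, max(0.0, (t - start) / (end - start)))


def interpolate(noise, data, t, windows):
    """Linear path from noise (t=0) to data (t=1), one time window per feature."""
    point = []
    for x0, x1, (start, end) in zip(noise, data, windows):
        tau = local_time(t, start, end)
        point.append((1.0 - tau) * x0 + tau * x1)
    return point


# Hypothetical windows: feature 0 is "coarse" (resolved early),
# feature 1 is "fine" (resolved late).
WINDOWS = [(0.0, 0.5), (0.5, 1.0)]

rng = random.Random(0)
data = [1.0, -2.0]
noise = [rng.gauss(0.0, 1.0) for _ in data]

halfway = interpolate(noise, data, 0.5, WINDOWS)
final = interpolate(noise, data, 1.0, WINDOWS)
```

At the halfway point the coarse feature has already reached its data value while the fine feature is still pure noise; at the end both match the data. In a full model, a network would be trained to predict the velocity along such paths, with separate treatment for non-continuous feature types.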