PhD defence L.H. (Henrik) John

Utility of Large Observational Datasets for Clinical Prediction

On Tuesday 9 June 2026 L.H. John will defend the doctoral thesis titled: Utility of Large Observational Datasets for Clinical Prediction

Promotor
Prof.dr.ir. P.R. Rijnbeek
Co-promotor
Dr. J.A. Kors
Date
Tuesday 9 June 2026, 10:30 - 12:00
Type
PhD defence
Space
Professor Andries Querido room
Building
Education Center
Location
Erasmus MC

Below is a brief summary of the dissertation:

Large observational healthcare databases from electronic health records and insurance claims are often seen as a major opportunity for earlier disease prediction. Their value, however, does not lie in size alone.

Dementia prediction illustrates this clearly. Many published models could not be properly reused because essential details were missing from their reports. When several of these models were tested in international healthcare databases, their performance often dropped. A newly developed dementia model showed more stable results across multiple databases, underlining how important transparent reporting and external validation are if a model is meant to work beyond the setting in which it was created.

More data also did not automatically mean better prediction. For logistic regression, near-maximum performance was often reached using only part of the available data, suggesting that very large datasets can be reduced with little loss of accuracy. More complex methods were not automatically better either. Across 11 healthcare databases and three prediction problems, conventional methods such as logistic regression and gradient boosting usually performed as well as or better than deep learning methods on structured healthcare data, especially when applied in different settings.

Adding clinical knowledge offered another route for improvement. Incorporating relationships from medical taxonomies into prediction models led to modest gains for lung cancer prediction.

Overall, reliable clinical prediction depends less on collecting ever more data and more on transparent model design, careful validation across healthcare settings, choosing methods that match the data, and making better use of existing clinical knowledge.

More information

The public defence will start at exactly 10:30. The doors will be closed once the public defence starts; latecomers can access the hall via the fourth floor. Given the solemn nature of the meeting, we advise against bringing children under the age of 6 to the first part of the ceremony.

A livestream link has been provided to the candidate.
