On Tuesday 9 June 2026, L.H. John will defend the doctoral thesis titled: Utility of Large Observational Datasets for Clinical Prediction
- Promotor
- Co-promotor
- Date: Tuesday 9 Jun 2026, 10:30 - 12:00
- Type: PhD defence
- Space: Professor Andries Querido room
- Building: Education Center
- Location: Erasmus MC
Below is a brief summary of the dissertation:
Large observational healthcare databases from electronic health records and insurance claims are often seen as a major opportunity for earlier disease prediction. Their value, however, does not lie in size alone.
Dementia prediction illustrates this clearly. Many published models could not be properly reused because essential details were missing from their reports. When several of these models were tested in international healthcare databases, their performance often dropped. A newly developed dementia model showed more stable results across multiple databases, underlining how important transparent reporting and external validation are if a model is meant to work beyond the setting in which it was created.
More data also did not automatically mean better prediction. For logistic regression, near-maximum performance was often reached without using all available data, showing that very large datasets can often be reduced with little loss of accuracy. More complex methods were not automatically better either. Across 11 healthcare databases and three prediction problems, conventional methods such as logistic regression and gradient boosting usually performed as well as or better than deep learning methods on structured healthcare data, especially when applied in different settings.
Adding clinical knowledge offered another route for improvement. Incorporating relationships from medical taxonomies into prediction models led to modest gains for lung cancer prediction.
Overall, reliable clinical prediction depends less on collecting ever more data and more on transparent model design, careful validation across healthcare settings, choosing methods that match the data, and making better use of existing clinical knowledge.
- More information
The public defence will start exactly at 10.30 hrs. The doors will be closed once the public defence starts; latecomers can access the hall via the fourth floor. Given the solemn nature of the meeting, we advise against bringing children under the age of 6 to the first part of the ceremony.
A livestream link has been provided to the candidate.
