Web Data to Decision Making with R Studio

ERIM Summer School 2026 Course
  • Dates: 23, 24 and 30 June, and 1 July 2026
  • Time: 11:00-12:45 and 15:00-16:45
  • Format: Online
  • ECTS: 2
  • Instructor: Dr. Sam Hoey
  • Fee: €500 (free of charge for ERIM PhD candidates)

Abstract

This hands-on course equips students to turn messy online information into decision-ready insights using R and RStudio. Students learn a reproducible analysis workflow—importing and cleaning data, writing functions, efficient iteration, and joining different sources—to build tidy, analysis-ready datasets. Practical web data scraping is covered with tools for static and dynamic pages, plus link design, regex extraction, error handling, and responsible scraping principles.

With structured data in place, the course develops core modeling and visualization skills for managerial contexts: linear and limited dependent variable models, fixed effects, machine learning, and clear communication through publication-quality graphics in ggplot2.

The course is delivered fully online through a mix of lectures, live demonstrations, and guided coding exercises. Students learn primarily by doing: completing hands-on tasks, working with realistic web-based datasets, and building an end-to-end analysis project. Learning is reinforced through structured discussion of modeling choices and interpretation, peer feedback on code and results, and independent study supported by readings and annotated example scripts. Throughout, emphasis is placed on transparent, well-documented workflows and responsible web data acquisition.

Learning Objectives

Acquire data from the web responsibly, including from static and dynamic sources, using appropriate R tools and best practices (e.g., robust extraction, error handling, and respectful scraping).

Engineer robust, documented, and reproducible workflows in R, transforming messy web data into tidy, analysis-ready datasets through cleaning, joining, functional programming, and efficient computing.

Choose and estimate appropriate econometric models for decision-relevant questions (including linear and limited dependent variable models, fixed effects, and machine learning methods), and interpret results in a managerial context.

Communicate results effectively, producing publication-quality visualizations (ggplot2) and presenting evidence-based conclusions and recommendations that support managerial decisions.

Supporting Literature:
Wickham, Çetinkaya-Rundel & Grolemund, R for Data Science (2e)
Available at: https://r4ds.hadley.nz/
Notes/slides will be provided.

Assessment 
Assignments: 100%

Workload
Online sessions: 10 hours
Assignments and self-study: 36 hours

Attendance 
Attendance at all course sessions is mandatory. The course certificate will be issued only to participants who have fulfilled all course requirements, which include:

  1. Required attendance at the course sessions.
  2. Successful completion of the course assessments in accordance with the assessment criteria.
