Abstract
This hands-on course equips students to turn messy online information into decision-ready insights using R and RStudio. Students learn a reproducible analysis workflow—importing and cleaning data, writing functions, efficient iteration, and joining different sources—to build tidy, analysis-ready datasets. Practical web data scraping is covered with tools for static and dynamic pages, plus link design, regex extraction, error handling, and responsible scraping principles.
With structured data in place, the course develops core modeling and visualization skills for managerial contexts: linear and limited dependent variable models, fixed effects, machine learning, and clear communication through publication-quality graphics in ggplot2.
The course is delivered fully online through a mix of lectures, live demonstrations, and guided coding exercises. Students learn primarily by doing: completing hands-on tasks, working with realistic web-based datasets, and building an end-to-end analysis project. Learning is reinforced through structured discussion of modeling choices and interpretation, peer feedback on code and results, and independent study supported by readings and annotated example scripts. Throughout, emphasis is placed on transparent, well-documented workflows and responsible web data acquisition.
Learning objectives
- Acquire data from the web responsibly, including from static and dynamic sources, using appropriate R tools and best practices (e.g., robust extraction, error handling, and respectful scraping).
- Engineer robust, documented, and reproducible workflows in R, transforming messy web data into tidy, analysis-ready datasets through cleaning, joining, functional programming, and efficient computing.
- Choose and estimate appropriate econometric models for decision-relevant questions (including linear and limited dependent variable models, fixed effects, and machine learning methods), and interpret results in a managerial context.
- Communicate results effectively, producing publication-quality visualizations (ggplot2) and presenting evidence-based conclusions and recommendations that support managerial decisions.
Supporting Literature:
Wickham, Çetinkaya-Rundel & Grolemund: R for Data Science (2e)
Available at: https://r4ds.hadley.nz/
Notes/slides will be provided.
Assessment
Assignments: 100%
Workload
Online sessions: 10 hours
Assignments and self-study: 36 hours
Attendance
Attendance at all course sessions is mandatory. The course certificate will be issued only to participants who have fulfilled all course requirements, which include:
- Required attendance at the course sessions.
- Successful completion of the course assessments in accordance with the assessment criteria.
Contact
- Content related questions
Dr. Sam Hoey
Email address
- Enrolment related questions
ERIM Doctoral Office
Email address
