Practical AI for Research: Large Language Models and Unstructured Data

ERIM Summer School 2026 Course
  • Dates: 13, 15, 17, 20 July 2026
  • Time: 11:00-12:45 & 15:00-16:45
  • Format: Online
  • ECTS: 2
  • Instructor: Dr. Vahid Moghani
  • Fee: €500 (free of charge for ERIM PhD candidates)


Abstract

This course introduces researchers to practical and responsible uses of artificial intelligence, with a focus on large language models (LLMs). Participants learn, at an accessible level, how LLMs work, what they can and cannot do, and how to use them reproducibly in research. The course then covers hands-on applications across the research workflow: developing and refining ideas, improving academic writing, supporting coding and debugging, assisting with literature review, and structuring early analyses. Finally, the course introduces LLM-based and open-source approaches for unstructured data, including text (e.g., coding, classification, labeling) and selected examples for image data. By the end, participants can integrate AI into their research workflow transparently, with documented prompts, careful validation, and clear reporting.

This course is designed for doctoral candidates and researchers who want to use AI tools effectively while maintaining scientific standards. It is delivered online through 8 interactive 1.5-hour sessions, each combining a short mini-lecture, a live demonstration, and guided hands-on practice.

The course combines short conceptual explanations with guided hands-on exercises. We start with a high-level explanation of modern LLMs, including typical failure modes (hallucinations, bias, instability), and the implications for research quality, ethics, privacy, and reproducibility.

We then practice AI support across key research tasks: (1) prompting strategies for generating and stress-testing ideas; (2) using AI as a writing assistant for structure, clarity, and argumentation (without outsourcing scholarship); (3) coding support (drafting, refactoring, debugging, documentation); (4) literature review assistance (search planning, screening support, synthesis outlines), with emphasis on verification and traceability.

In the final part, we work with unstructured data. Participants run examples of text classification, labeling, and exploratory analysis using LLMs and open-source models. Depending on cohort interest, we include a short module on image tasks (e.g., basic classification and labeling concepts), framed as an extension.
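To give a flavor of the validation step emphasized throughout the course, the sketch below compares model-assigned text labels against a hand-coded subset. It is a minimal illustration in pure Python; the documents, label names, and label values are hypothetical, not course data.

```python
# Minimal sketch of validating model-assigned text labels against a
# hand-coded subset. All labels here are hypothetical illustrations.
from collections import Counter

def agreement(human_labels, model_labels):
    """Fraction of documents where the model matches the human coder."""
    assert len(human_labels) == len(model_labels)
    matches = sum(h == m for h, m in zip(human_labels, model_labels))
    return matches / len(human_labels)

def confusion(human_labels, model_labels):
    """Count (human, model) label pairs to see where disagreements occur."""
    return Counter(zip(human_labels, model_labels))

human = ["econ", "health", "econ", "tech", "health"]
model = ["econ", "health", "tech", "tech", "health"]

print(f"agreement: {agreement(human, model):.2f}")  # 4 of 5 match
print(confusion(human, model))
```

In practice one would report such agreement figures (and the disagreement pattern) alongside the labels themselves, so that readers can judge how much to trust the model-coded variable.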

Teaching methods include mini-lectures, live demonstrations, individual hands-on tasks, peer discussion, and take-home assignments. Participants are expected to actively experiment, keep a structured “AI use log,” and reflect on responsible reporting practices.
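One lightweight way to keep such an AI use log is as structured records, one per interaction, written as JSON lines. The field names below are illustrative assumptions, not a format prescribed by the course:

```python
# Illustrative sketch of a structured "AI use log" entry; the field names
# and values are assumptions, not a course-mandated format.
import json
from datetime import date

entry = {
    "date": date(2026, 7, 13).isoformat(),
    "tool": "example-llm",             # hypothetical tool name
    "task": "draft abstract outline",  # what the model was asked to do
    "prompt": "Outline a 150-word abstract for a study on ...",
    "how_output_was_used": "edited heavily; two sentences kept",
    "verification": "claims checked against cited sources",
}

# Append-friendly JSON Lines format: one record per line.
line = json.dumps(entry)
print(line)
```

Keeping prompts, usage, and verification steps in one machine-readable file makes it straightforward to summarize AI involvement when writing a paper's transparency statement.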

Learning objectives

Participants will be able to:

  • Explain in broad terms how LLMs are trained and why they can fail in predictable ways.
  • Use prompting strategies to obtain useful outputs and to test robustness.
  • Use AI tools to improve academic writing while preserving originality, citation integrity, and author responsibility.
  • Use AI-assisted coding workflows, including debugging, refactoring, and documentation, with verification practices.
  • Apply LLMs and open-source models to unstructured text data for tasks such as labeling, classification, and exploratory analysis.

  • Integrate AI into research workflows transparently and reproducibly.

Literature

Required:

  • Musslick, Sebastian, Laura K. Bartlett, Suyog H. Chandramouli, Marina Dubova, Fernand Gobet, Thomas L. Griffiths, Jessica Hullman et al. "Automating the practice of science: Opportunities, challenges, and implications." Proceedings of the National Academy of Sciences 122, no. 5 (2025): e2401238121.
  • Gilardi, Fabrizio, Meysam Alizadeh, and Maël Kubli. "ChatGPT outperforms crowd workers for text-annotation tasks." Proceedings of the National Academy of Sciences 120, no. 30 (2023): e2305016120.
  • Hansen, Stephen, Peter John Lambert, Nicholas Bloom, Steven J. Davis, Raffaella Sadun, and Bledi Taska. Remote work across jobs, companies, and space. No. w31007. National Bureau of Economic Research, 2023.
  • Boiko, Daniil A., Robert MacKnight, Ben Kline, and Gabe Gomes. "Autonomous chemical research with large language models." Nature 624, no. 7992 (2023): 570-578.

Recommended:

  • Brynjolfsson, Erik, Danielle Li, and Lindsey Raymond. "Generative AI at work." The Quarterly Journal of Economics 140, no. 2 (2025): 889-942.
  • Doshi, Anil R., and Oliver P. Hauser. "Generative AI enhances individual creativity but reduces the collective diversity of novel content." Science Advances 10, no. 28 (2024): eadn5290.
  • Chen, Zenan, and Jason Chan. "Large language model in creative work: The role of collaboration modality and user expertise." Management Science 70, no. 12 (2024): 9101-9117.
  • Bail, Christopher A. "Can Generative AI improve social science?" Proceedings of the National Academy of Sciences 121, no. 21 (2024): e2314021121.

Assessment
Participation (in-session exercises and peer discussion): 20%
Assignment 1 (prompting): 15%
Assignment 2 (AI-assisted academic writing): 15%
Assignment 3 (AI-assisted coding workflow): 20%
Assignment 4 (unstructured text mini-application): 30%

Workload
Online sessions: 12 hours (8 × 1.5-hour interactive sessions combining mini-lecture, demonstration, and hands-on practice)
Preparation and self-study between sessions: 8 hours
Required literature / pre-reading: 6 hours
Assignments (4 total): 30 hours
Total: 56 hours

Attendance
The course certificate will be issued only to participants who have fulfilled all course requirements, which include:

  1. Successful completion of the course assessments in accordance with the assessment criteria.
  2. Attendance at the live online sessions.

The course focuses on practical use and validation; it does not provide training in building LLMs from scratch or advanced deep learning.

Prerequisites

Participants should be comfortable with basic coding tasks, such as (a) running an R script or Python notebook, (b) installing and loading packages, (c) editing simple code (variables, functions, loops), and (d) reading error messages and using them to debug with guidance.
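For orientation, the expected comfort level is roughly that of the following illustrative Python snippet (the function and values are made up for this example): defining a small function, calling it in the obvious way, and recognizing what a common error message points to.

```python
# Illustrative example of the expected prerequisite level: a small
# function, a call, and interpreting a common error message.
def mean(values):
    return sum(values) / len(values)

scores = [80, 90, 100]
print(mean(scores))  # 90.0

# Calling mean([]) would raise "ZeroDivisionError: division by zero",
# which points at len(values) being 0; a simple guard fixes it:
def safe_mean(values):
    return sum(values) / len(values) if values else None

print(safe_mean([]))  # None
```

No prior experience with machine learning or LLM APIs is assumed beyond this level of general scripting.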

Participants need a laptop with R or Python installed and the ability to run notebooks (RStudio / Quarto, Jupyter, Google Colab).
