Introduction

This course introduces a set of digital research methods (DRM). With these methods, it is possible to analyse large textual datasets from social media, news articles, interviews, and other sources, and to render them as networks, which offers an alternative analytical perspective. These techniques are becoming increasingly popular in virtually all disciplines in the social sciences and humanities.

The course is specifically designed for people who do not feel comfortable using technical programming software. We will focus on how DRM can be applied with accessible software based on user-friendly interfaces. However, those who are more inclined to learn or use programming are welcome to do so, as the course material also includes instructions for executing DRM using R (a statistical programming language).

Course information

ECTS: 2.5
Number of sessions: 4
Hours per session: 3

Practical information

Duration: 12 hours
Teaching mode: In-person

What will you achieve?

After completing this workshop, you will be able to scrape and clean textual data from social media and news articles.
You will be able to apply several digital research methods, particularly text analysis, topic modelling, sentiment analysis and network analysis.
You will be able to visualise and interpret the results of these analyses.

Aim and working method

The first class will introduce key concepts and the structure of digital data. We will cover some basic approaches to scraping social media content (namely Twitter) and news articles (LexisNexis), followed by the steps for cleaning textual data and conducting basic text analysis.
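
For participants who prefer code, the cleaning and basic text analysis steps mentioned above can be sketched roughly as follows. This is a minimal illustration in Python, not the course's own workflow (which relies on ConText and, optionally, R). The example posts, the stop-word list, and the function name `clean` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for scraped posts; not course data.
posts = [
    "Loving the new museum exhibit! https://example.org/a #art",
    "@friend The exhibit was crowded, but the art was worth it.",
]

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "was", "but", "it", "a", "an", "and", "new"}

def clean(text):
    """Lowercase, strip URLs and @mentions, keep word tokens, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    tokens = re.findall(r"[a-z']+", text)      # keeps words, drops '#' and punctuation
    return [t for t in tokens if t not in STOP_WORDS]

# Basic text analysis: count how often each remaining word occurs.
word_counts = Counter(token for post in posts for token in clean(post))
print(word_counts.most_common(3))
```

Word frequencies of this kind are usually the first thing to inspect after cleaning, since leftover noise (links, user names, very common words) tends to show up at the top of the list.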

In the second class, more advanced text analysis approaches will be introduced. These include topic modelling, a powerful but easy-to-use text analytic method for uncovering hidden themes in large collections of text documents, and sentiment analysis, a method for assessing polarity in texts.
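
To give a sense of what "assessing polarity" means in practice, here is a minimal sketch of a lexicon-based sentiment score in Python. It is a deliberately simplified stand-in and not how SentiStrength or any other course tool works internally. The word list, its scores, and the function names `polarity` and `label` are assumptions made for illustration.

```python
import re

# Toy lexicon (hypothetical): positive words score above 0, negative words below 0.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "awful": -2, "hate": -2}

def polarity(text):
    """Sum the lexicon scores of the words in a text; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def label(score):
    """Turn a numeric score into a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sentence in ["I love this great exhibit", "Awful queue, bad service"]:
    print(sentence, "->", label(polarity(sentence)))
```

A simple sum like this ignores negation, intensifiers, and sarcasm, which is one reason dedicated sentiment tools exist.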

In the third class, we will explore additional social media scraping tools (for Facebook, YouTube, and Instagram) and introduce network analysis, a relational perspective that can also be applied to text data. We will examine topic models rendered as networks. Network depictions of textual content can reveal new perspectives and lead to richer interpretations.
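
The idea of viewing text as a network can be illustrated with a small Python sketch. It builds a word co-occurrence network, which is a simpler structure than the topic model networks covered in class and is used here only as a stand-in. The documents and the function name `cooccurrence_edges` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, assumed to be cleaned and tokenised already.
docs = [
    ["refugee", "policy", "border"],
    ["refugee", "policy", "media"],
    ["artist", "media"],
]

def cooccurrence_edges(documents):
    """For each pair of distinct words, count the documents containing both."""
    edges = Counter()
    for tokens in documents:
        # sorted(set(...)) gives each unordered pair one canonical form
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(docs)

# Degree: the number of distinct words each word is connected to.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each printed line is a weighted edge (source, target, weight). An edge list of this shape, saved as a CSV file, is the kind of input that network tools such as Gephi can import for visualisation.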

The fourth class continues the exploration of text-as-networks, including entity and semantic networks. We will also work through some steps for using network analysis approaches to visualise and analyse qualitative content coding.

› There will be four 3-hour sessions. Each session will include a mix of lectures (40%), demonstrations (5%), and in-class exercises (55%).

› Participants can work with the text and network data supplied for the course, or they can explore text/network data of their own.

How to prepare

In order to actively participate in the course, you are required to read the following literature:

› Levallois, C. (2017). A primer on text mining for business. (https://seinecle.github.io/mk99/generated-pdf/text-mining-for-business.pdf)

› Levallois, C. (2017). A primer on network analysis for business. (https://seinecle.github.io/mk99/generated-pdf/network-analysis-for-business.pdf)

› Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (Focus on the sections leading up to ‘LDA and probabilistic models’.) (https://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext)

› Thelwall, M. (2017). Heart and soul: Sentiment strength detection in the social web with SentiStrength. Cyberemotions: Collective Emotion in Cyberspace, 119-134. (Paper available on the SentiStrength website; focus on the sections Introduction, Using, Core, Additional, Sarcasm, Application; you may skim the rest.)

› Lee, J. (2021). Digital methods and tools: A Step-by-Step Guide, Erasmus University Rotterdam (URL will be emailed to participants)

The first two readings are very short introductions and are applicable to domains beyond business.

You should also familiarise yourself with the instructor’s Digital Research Methods Step-by-Step Guide, particularly the sections on topic modelling (4.8), topic networks (6.9), and data scraping: Mozdeh (3.8), LexisNexis (4.1), SNScrape (3.9), and Netvizz for YouTube (2.4).

If the course is not held in a PC lab, please bring your own laptop for the in-class exercises. Note that you may need Administrator rights on your laptop in order to install some of the software. The following software programs need to be installed:

› ConText 1.2 or 2.0: http://context.lis.illinois.edu/

› Gephi 0.9.2: https://gephi.org/

› Mozdeh (Big Data Text Analysis, Windows only): http://mozdeh.wlv.ac.uk/

› SNScrape (for Twitter scraping; available only through the DRM Dropbox ‘tools/Extra’ folder; URL to be emailed to participants)

These tools may be acquired from either the course instructor’s Digital Research Methods Dropbox ‘tools’ folder (see below) or the original websites.

› DRM Dropbox ‘tools’ folder: (URL to be emailed to participants)

Session description

The first session introduces you to the world of digital data, including text data.
You will learn to scrape data from Twitter and LexisNexis using several online and offline tools, extract their textual elements, and carry out basic but necessary cleaning of the data in the ConText text analysis software.
Finally, you will learn to conduct basic text analysis.

In the second session, you will learn how topic models operate and how they are applied, and you will then perform and interpret topic modelling on the acquired data.
We will cover other approaches to social media scraping (for Facebook, YouTube, and Instagram) and more rigorous text cleaning in Excel.
You will also learn about automated sentiment analysis, which can detect the polarity of text segments.

In the third session, you will learn about (social) network analysis, a relational perspective on data analysis.
You will learn how textual data can be viewed as networks, specifically topic model networks, using the Gephi program.

The fourth session extends the network treatment of textual data and covers various semantic networks.
The network approach to qualitative coding and analysis will also be investigated.

Start date

Dates and locations for 2025-2026 are still to be determined.

Instructor

Ju-Sung (Jay) Lee is an assistant professor of digital research methods at the Department of Media and Communication of Erasmus University Rotterdam (EUR). His research focuses on various digital, network, and statistical methodologies and their application to online and offline discourse and interactions, recently in the context of the refugee crisis and artist communities. Jay holds a PhD in sociology from Carnegie Mellon University (USA) and has a background in computer science, organisation and decision sciences, and quantitative sociology.
Email address
lee@eshcc.eur.nl

Contact

Enrolment-related questions: enrolment@egsh.eur.nl
Course-related questions: gruber@ese.eur.nl
Telephone: +31 (0)10 4082607

Facts & Figures

Duration

12 hours

Price

Free for PhD candidates of the Graduate School
€ 575 for non-members
Consult our enrolment policy for more information

Tax

Not applicable

Course code

Instruction language

English

Teaching mode

In-person

Digital research methods for textual data

Session 1: Digital data, basic scraping, cleaning of data and text analysis

Session 2: Advanced text analysis: Topic modeling and sentiment analysis

Session 3: Network analysis

Session 4: Advanced text-as-networks