Week 4

Literature (reading required)

Mohammad, S. (2011). From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales. In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH-2011), ACL, pp. 105–115. http://dl.acm.org/ft_gateway.cfm?id=2107650&ftid=1099749&dwn=1&CFID=239446597&CFTOKEN=64021484 (approx. 10 pages)

Michel, J.-B., Shen, Y. K., et al. (2010). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–182. (6 pages)

Background reading/viewing

Read-me texts related to the Google Books Ngram Viewer: http://books.google.com/ngrams

TED talk by Jean-Baptiste Michel and Erez Lieberman Aiden, "What we learned from 5 million books" (14 minutes)

Jensen, L. J., Saric, J., & Bork, P. (2006). Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics, 7(2), 119–129. doi:10.1038/nrg1768

Moniz, A. and de Jong, F.M.G. (2014). Sentiment analysis and the impact of employee satisfaction on firm earnings. In: 36th European Conference on IR Research, ECIR 2014, Amsterdam, the Netherlands, pp. 519–527. Lecture Notes in Computer Science 8416. Springer Verlag.

Session description

Session 4 - Friday 17 November 2014 (T18-33)


Sentiment analysis, text mining and sorting out data

This session focuses on text mining and on tools for analyzing and visualizing large quantities of text. We will compare methods and the types of outcomes they produce, partly by contrasting them with discourse analysis as a qualitative counterpart. In the exercises, students will be introduced to regular expressions (regex) as a way to search and parse databases containing large amounts of textual material.
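As a first taste of the regex exercises, the sketch below parses a few text records and counts a word across them. The record layout (a `YYYY-MM-DD:` date followed by free text), the sample sentences and the helper names `parse` and `count_word` are hypothetical and chosen only for illustration; a real exercise would read records from files or a database export.

```python
import re

# Hypothetical sample records; the "YYYY-MM-DD: text" layout is an assumption for this sketch.
records = [
    "1812-03-04: The king was happy and the people rejoiced.",
    "1815-11-20: A terrible storm left the village in fear.",
    "not a dated record",
]

# Capture year, month, day and the free text that follows the colon.
RECORD = re.compile(r"^(\d{4})-(\d{2})-(\d{2}):\s*(.+)$")

def parse(lines):
    """Return (year, text) pairs for lines matching RECORD; other lines are skipped."""
    out = []
    for line in lines:
        m = RECORD.match(line)
        if m:
            out.append((int(m.group(1)), m.group(4)))
    return out

def count_word(lines, word):
    """Count whole-word, case-insensitive occurrences of `word` across all lines."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in lines)

parsed = parse(records)
print(parsed)
print(count_word([text for _, text in parsed], "the"))  # "The", "the", "the"
```

The `\b` word boundaries keep `the` from matching inside longer words such as "there", and `re.escape` prevents a search term containing characters like `.` or `?` from being treated as regex syntax.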

  • We will look into the concepts and processes of sentiment analysis and the assumptions built into sentiment and text analysis tools. We will then apply annotation and sentiment analysis to a text (or set of texts) selected by the participants.
  • TAMS Analyzer (or an equivalent CAQDAS package, i.e. computer-assisted qualitative data analysis software) will be used to show how different media formats can be annotated and how different types of analysis can be performed.
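To make the "underlying assumptions" concrete, here is a minimal sketch of lexicon-based emotion scoring, broadly similar in spirit to the word-emotion counting discussed in the Mohammad (2011) reading. The tiny `LEXICON` below is invented for illustration and is not the NRC Emotion Lexicon or any other published resource; `emotion_profile` is a hypothetical helper name.

```python
import re
from collections import Counter

# Toy word-emotion lexicon, invented for this sketch (NOT a published lexicon).
LEXICON = {
    "happy": {"joy"},
    "rejoiced": {"joy"},
    "terrible": {"fear", "sadness"},
    "storm": {"fear"},
    "fear": {"fear"},
}

def tokenize(text):
    # Lowercase word tokens; punctuation is discarded.
    return re.findall(r"[a-z']+", text.lower())

def emotion_profile(text):
    """Share of tokens associated with each emotion (emotion count / total tokens)."""
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for emotion in LEXICON.get(tok, ()):
            counts[emotion] += 1
    total = len(tokens)
    return {e: c / total for e, c in counts.items()} if total else {}

print(emotion_profile("The king was happy and the people rejoiced."))
print(emotion_profile("The king was not happy."))
```

The second call still reports joy, because the method treats text as a bag of words: negation, irony, word sense and context are ignored. Assumptions like these are built into many lexicon-based tools and are worth keeping in mind when interpreting their output.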

This final session will also be used for presentations of the results of the projects carried out in the previous weeks.