Semantic Scholarly Publishing

Project leaders: Gert Goris, Uzay Kaymak and Paul Wouters
PhD student: Alexander Hogenboom
Duration: 2009 - 2012

Introduction
The recent turmoil in the financial markets has demonstrated the growing need for automated information monitoring tools that help to identify the issues and patterns that matter and that track and predict emerging events. This need is addressed by trend mining, which in our definition includes the mining of general mood, public sentiment, and individual opinions, in addition to the tracking of more conventional numerical quantities such as consumer confidence. In this context, a trend has a broad meaning: it also includes a set of established trendsetters, identified trend followers, detected topics, and associated terminology (e.g., buzzwords).

The traditional approach to tracking trends consists of tracing measurable, numerical signals that act as proxies for the actual quantity of interest. Alternatively, surveys are sometimes used to obtain the information directly from people. With the advent of the Internet, many aspects of human activity can be traced digitally. By analyzing the available free-text information and combining the results with the available numerical information, the tracking of processes may be improved, because the moods, sentiments, and opinions of the public can then be accounted for alongside the numerical trends. Text data for this purpose is available in online publications, news items in online newspapers, individual blogs, RSS feeds, and forums.

In recent years, considerable (research) effort has been spent on developing monitoring tools that combine textual and numerical data. For example, Google has recently introduced a barometer for tracking moods and conditions of the economic system in the Netherlands, based on trends in search keywords (see Google Barometer). The company Teezir has developed a sentiment evaluator (see whorules.nl) based on opinion mining technology. The user enters a keyword (e.g., the name of a person or product) and the website returns the sentiment or popularity associated with that keyword. Further, tools have become available for mining dynamic streams of textual information in order to identify trends, moods, and opinions.

Existing toolkits, however, are limited to simple word counts, and relevant linguistic resources are absent or do not always fit into the applied framework. Today’s text analytical tools are ill-equipped to deal with highly dynamic domains, because they have been developed without adaptation in mind and largely ignore structural aspects of content. Such aspects, however, can be of paramount importance in discovering trends. For instance, a specific mood or public opinion can be expressed and promoted through argumentation structure and elements such as specific metaphors, analogies, vocabularies, or supportive non-textual data. The use of analogies or vocabularies invoking negative associations in media reports on the current economic situation may lead people to have negative expectations. Furthermore, we hypothesize that not all parts of a text contribute equally to expressing or revealing the underlying trend, sentiment, or mood, but that their contribution depends on their position within the overall structure of the text and of the argumentation. Hence, the detection of economic trends in text documents can be made more effective by using adequate filters that select the parts that, given their role in the argumentative layers of a text, are more salient to the targeted task and should be assigned more weight.
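
The weighting hypothesis above can be illustrated with a small sketch. It is not the project's method: the argumentative roles, the role weights, and the toy sentiment lexicon are all hypothetical values chosen only for illustration, and the text is assumed to have already been segmented and labeled with roles.

```python
from dataclasses import dataclass

# Hypothetical weights per argumentative role (illustrative assumption,
# not values taken from the project).
ROLE_WEIGHTS = {"conclusion": 1.0, "claim": 0.8, "evidence": 0.5, "background": 0.2}

# Toy sentiment lexicon (assumption); a real system would use a proper resource.
LEXICON = {"crisis": -1.0, "collapse": -1.0, "recovery": 1.0, "growth": 1.0}


@dataclass
class Segment:
    text: str
    role: str  # argumentative role, assumed to be given by an earlier step


def segment_score(text: str) -> float:
    """Mean lexicon score of the words in a segment (0.0 if none match)."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def weighted_sentiment(segments: list[Segment]) -> float:
    """Role-weighted average of segment scores; unknown roles get weight 0."""
    total = 0.0
    total_weight = 0.0
    for seg in segments:
        weight = ROLE_WEIGHTS.get(seg.role, 0.0)
        total += weight * segment_score(seg.text)
        total_weight += weight
    return total / total_weight if total_weight else 0.0


segments = [
    Segment("Markets showed growth last year.", "background"),
    Segment("We expect a crisis.", "conclusion"),
]
print(weighted_sentiment(segments))
```

In this example an unweighted average of the two segments would be neutral, whereas the role-weighted score is negative because the pessimistic segment is the conclusion. Discovering the roles (semi-)automatically is the hard part and is not addressed by this sketch.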

Economic trend mining can hence be improved if the information in the structural elements of a text can be harvested. In this project, we therefore aim to advance economic trend mining by the development and application of models, methods, and algorithms for (semi-)automatic argumentation discovery in economics discourse that facilitate the incorporation of argumentation and argumentation structure in mining methods. The information monitoring methods we aim to develop incorporate structural and semantic aspects of the content to enhance current tracking tools.