SOCIAL WEB MINING

Methods for exploring and interpreting solidarity discourses on the social web

To make dynamic solidarity discourses on the web, and their interrelationships with social events, understandable, this work package will develop machine learning and information extraction methods for analyzing online discourse, especially on Twitter. Building on existing methods (Fafalios et al., 2018) and corpora such as TweetsKB (https://data.gesis.org/tweetskb/), Natural Language Processing (NLP) and information extraction methods will be applied and adapted into tailor-made methods for understanding the key concepts of interest ("trust", "solidarity") and the discourse surrounding them. This requires not only the automatic interpretation of unstructured content, e.g. with the help of Named Entity Disambiguation (NED) or sentiment analysis, but also the inference of users' demographic characteristics and ensuring that the extracted samples and information are representative.

The main challenges in this context are the scale and heterogeneity of the data: TweetsKB is currently based on around 11 billion tweets, and Twitter discourse is characterized by informal language whose interpretation often requires taking the context (time, place, linked content) into account. The project will build on existing work and adapt it to its specific problems and research challenges, e.g. to georeference tweets by German users, to understand solidarity-related discourses and terms while accounting for the specific language and vocabulary used on social online platforms, and to enable the analysis of how discourses and opinions evolve over time.
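As a rough illustration of what a temporal analysis of the key concepts could look like at its simplest, the Python sketch below counts mentions of "solidarity" and "trust" per calendar month using a small prefix lexicon that also catches inflected German forms and hashtags. The lexicon, the (timestamp, text) input format, and the function names concepts_in and monthly_concept_counts are hypothetical assumptions for illustration only; they are not part of TweetsKB or of the methods the project will develop.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical concept lexicon (assumption): each concept is matched by word
# prefixes so that inflected English and German forms are caught, e.g.
# "solidarisch", "Solidarität", "Vertrauen", "trusted".
CONCEPT_PREFIXES = {
    "solidarity": ("solidar",),
    "trust": ("trust", "vertrau"),
}

# Tokens are runs of word characters, optionally preceded by '#' (hashtags).
# In Python 3, \w matches Unicode letters such as "ä" by default.
TOKEN_RE = re.compile(r"#?\w+")


def concepts_in(text):
    """Return the concept labels matched by the tokens of one tweet."""
    found = []
    for tok in TOKEN_RE.findall(text):
        word = tok.lower().lstrip("#")  # "#Solidarität" -> "solidarität"
        for concept, prefixes in CONCEPT_PREFIXES.items():
            if word.startswith(prefixes):  # str.startswith accepts a tuple
                found.append(concept)
    return found


def monthly_concept_counts(tweets):
    """Aggregate concept mentions per calendar month ("YYYY-MM").

    `tweets` is an iterable of (ISO-8601 timestamp, text) pairs; this input
    format is an assumption, not the TweetsKB data model.
    """
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[month].update(concepts_in(text))
    return dict(counts)


if __name__ == "__main__":
    sample = [  # toy data, not real tweets
        ("2020-03-14T09:00:00+00:00", "Nachbarschaftshilfe ist gelebte #Solidarität"),
        ("2020-03-20T18:30:00+00:00", "Wir müssen solidarisch sein und einander vertrauen"),
        ("2020-04-02T12:00:00+00:00", "I no longer trust the official numbers"),
    ]
    for month, counter in sorted(monthly_concept_counts(sample).items()):
        print(month, dict(counter))
```

Prefix matching keeps the sketch short, but it produces false positives (e.g. "trustee") and misses paraphrases, irony, and context-dependent meanings. It is therefore only a naive baseline; the informal language and context dependence described above are exactly why context-aware methods are needed instead of keyword counts.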

Fafalios P., Iosifidis V., Ntoutsi E., Dietze S. (2018) 'TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets', European Semantic Web Conference (ESWC), 177–190.

GESIS – Leibniz Institute for the Social Sciences / Prof. Dr. Stefan Dietze