What

Tweetplomacy 23 is a semantically annotated corpus of tweets capturing digital communicative interaction between international political leaders, peer groups and citizens in the wake of three major global crises: (1) the increasing emphasis on the security of energy supplies following Russia’s invasion of Ukraine; (2) the political and geo-economic consequences of the COVID-19 pandemic; (3) the intensified debate on the progression of climate change. These events occurred between 2018 and 2023, each of them marking a significant shake-up of the international system.

The dataset focuses on the strategic use of networked information on X (formerly Twitter) by executive political actors facing exogenous shocks during a global crisis. It is extracted from an X archive of more than 14 billion tweets collected from the 1% random sample API. The extraction relies on a list of top executives of the political administration – heads of state, heads of government, ministers of foreign affairs – or their respective public-relations offices. Their tweets are filtered using a list of thematically relevant keywords in four languages (English, German, French, Spanish) that reflect the discourse on the three crises mentioned above.
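The filtering step described above can be pictured with a minimal Python sketch. It is not the authors' extraction code. It assumes that an account matches when it either wrote the tweet or is mentioned in it, that comparisons are case-insensitive, and that a keyword matches as a plain substring. The function name `matches_filter` and the example keywords in the usage note are hypothetical; the real handles and keywords are those in the users and keywords files.

```python
def matches_filter(tweet_text, author, mentions, handles, keywords):
    """Return True if a tweet involves a listed account and contains a keyword.

    Assumptions (not confirmed by the dataset description):
    - an account matches if it authored the tweet or is mentioned in it;
    - handles and keywords are compared case-insensitively;
    - a keyword matches if it occurs anywhere in the text (substring match).
    """
    # Normalize the list of tracked handles once (drop a leading "@").
    handle_set = {h.lower().lstrip("@") for h in handles}

    # Accounts involved in the tweet: its author plus every mentioned user.
    accounts = [author] + list(mentions)
    involves_handle = any(a.lower().lstrip("@") in handle_set for a in accounts)

    # Case-insensitive substring search for any of the keywords.
    text = tweet_text.lower()
    has_keyword = any(k.lower() in text for k in keywords)

    return involves_handle and has_keyword
```

For example, `matches_filter("Climate change is real", "someone", ["POTUS"], ["@POTUS"], ["climate change"])` evaluates to `True`, because the tweet mentions a listed handle and contains a listed keyword.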

Our sample covers instances from the beginning of 2018 up to May 2023, representing statements made by leading politicians from 83 countries on all continents. As a subset, tweets published by the political leaders of the 38 member states of the OECD and the five BRICS countries (Brazil, Russia, India, China, South Africa) have been extracted. Additionally, the sample comprises a selection of 10 international organizations.

Why

Tweetplomacy 23 is designed to:

  • track and examine the repercussions/resonance produced by the ‘digital audience’ of the most influential political leaders in the course of the three crises, thus hinting at the political and societal impact their communicative actions had in the digital realm.
  • identify changes in sentiments, argumentation and/or tonality as well as more general breakpoints of discussion by conducting in-depth analyses of the online discourse relating to each of the three debates.
  • yield new insights into networks of communication among ‘online champions’ in the diplomatic community with regard to global political crises. To this end, researchers will be able to employ both quantitative/statistical and qualitative/hermeneutic methodologies to further explore and compare specific communicative motivations of national political leaders and the global ‘digital public’ in such cases.
  • be used as a valuable empirical input not merely for political or media scientists, but also for scholars focusing on sociological, economic or socio-psychological aspects of crisis communication.

Dataset

The entire data collection consists of the following files: (1) users: an Excel file listing 654 Twitter user handles (usernames) of top executives of the political administration (and/or their institutional accounts), together with their nationalities, functions/roles and tenure; (2) keywords: an Excel file listing 72 crisis-related keywords; (3) one gzipped JSONL file per language: each line in a JSONL file is a JSON object containing metadata about a tweet that matches one or more of the actors’ user handles and one or more of the keywords in the respective language. Additionally, semantic enrichments (i.e., entities and sentiments) calculated on the basis of the tweet text are provided. The JSON object includes the following fields:

  • tweetId: integer, unique ID for an original tweet
  • timeStamp: format ("EEE MMM dd HH:mm:ss Z yyyy"), the timestamp of the original tweet
  • userName: JSON object containing the MD5-hashed user names for private persons or the user names for public persons and institutions
  • userBio: string (available only for public users and institutions), metadata at the timepoint of the original tweets or of retweets
  • followers: integer, metadata at the timepoint of the original tweets or of retweets
  • followees: integer, metadata at the timepoint of the original tweets or of retweets
  • retweets: integer, metadata at the timepoint of the original tweets or of retweets
  • favorites: integer, metadata at the timepoint of the original tweets or of retweets
  • replies: integer, metadata at the timepoint of the original tweets or of retweets
  • matchingKeywords: list of strings representing the matching keywords
  • matchingUserMentions: list of strings representing the matching user mentions
  • matchingUserName: string representing the matching user name
  • sentiments: JSON object containing the output of the VADER sentiment analysis tool (available only for English, German and French)
  • entities: JSON object containing the output of the Entity Fishing named entity linking tool
  • hashtags: list of strings containing the hashtags extracted from the tweet text
  • mentions: list of strings containing the user mentions extracted from the tweet text
  • urls: JSON object containing (resolved) URLs extracted from the tweet text
  • retweetId: integer, unique ID for the retweet of an original tweet with an ID captured in the tweetId field
  • retweetTimeStamp: format ("EEE MMM dd HH:mm:ss Z yyyy"), the timestamp of the retweet
  • retweetUserName: JSON object containing the MD5-hashed username of the retweeting user
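
A minimal Python sketch for reading one of the gzipped JSONL files and parsing its timestamps is given below. The helper names `read_tweets` and `parse_timestamp` are hypothetical, and the strptime pattern is our translation of the Java-style pattern "EEE MMM dd HH:mm:ss Z yyyy" (for example "Wed Oct 10 20:19:24 +0000 2018"). It assumes English day and month abbreviations and an English or C locale in the Python process.

```python
import gzip
import json
from datetime import datetime

# Assumed Python equivalent of the Java-style pattern
# "EEE MMM dd HH:mm:ss Z yyyy", e.g. "Wed Oct 10 20:19:24 +0000 2018".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_timestamp(value):
    """Convert a timeStamp or retweetTimeStamp string into an aware datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def read_tweets(path):
    """Yield one dict per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

Because `read_tweets` is a generator, a file can be processed line by line without loading it fully into memory, for example `for tweet in read_tweets(path): print(tweet["tweetId"], parse_timestamp(tweet["timeStamp"]))`, where `path` points to one of the downloaded language files.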

| Dataset version | DOI Link |
|---|---|
| 01.01.2018 - 31.05.2023 | https://doi.org/10.7802/2985 |

Dataset Analysis

The Jupyter notebooks for analyzing the Tweetplomacy 23 dataset are available at https://github.com/trovdimi/tweetplomacy-23.

Descriptive statistics

The table shows the percentages of tweets and users as well as the means and standard deviations (M/Std) for replies, retweets and favorites per crisis, language and type of user (public/private) in the dataset. The statistics are based on 2,048,232 tweets and 914,533 users. The percentages given for tweets do not add up to exactly 100 because some tweets cover multiple topics. Similarly, some users talk about multiple topics in multiple languages.

| | Overall | Energy sec. | COVID-19 | Climate chg. | English | German | French | Spanish | Public | Private |
|---|---|---|---|---|---|---|---|---|---|---|
| tweets (%) | 100 | 46.20 | 48.96 | 8.02 | 61.49 | 1.42 | 2.69 | 34.40 | 5.17 | 94.83 |
| users (%) | 100 | 52.70 | 51.97 | 9.69 | 67.61 | 1.80 | 3.18 | 28.07 | 0.06 | 99.94 |
| replies (M/Std) | 31/684 | 27/482 | 34/840 | 30/486 | 39/860 | 25/199 | 23/231 | 16/172 | 401/2,899 | 11/165 |
| retweets (M/Std) | 92/829 | 81/682 | 105/953 | 74/688 | 102/1,017 | 29/173 | 52/309 | 80/376 | 712/3,064 | 58/437 |
| favorites (M/Std) | 310/4,271 | 258/3,006 | 358/5,098 | 307/4,896 | 411/5,371 | 159/1,068 | 159/1,293 | 147/1,113 | 3,324/17,127 | 146/1,654 |
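
As an illustration of how M/Std values of this kind can be computed from the JSONL records, here is a small Python sketch. The function `summarize` is hypothetical and is not taken from the project notebooks. It uses the population standard deviation (`statistics.pstdev`); whether the published figures use the population or the sample formula is not stated, so results may differ slightly.

```python
from statistics import mean, pstdev


def summarize(tweets, field):
    """Return (mean, standard deviation) of a numeric field over tweet dicts.

    Records in which the field is missing or null are skipped.
    Returns None if no record carries a usable value.
    """
    values = [t[field] for t in tweets if t.get(field) is not None]
    if not values:
        return None
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample formula.
    return mean(values), pstdev(values)
```

Calling `summarize(records, "replies")` on the records of one language file, or on any subset selected by keyword or user type, gives the pair that corresponds to one cell of the table.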

Hashtag usage per crisis

| | Energy security | COVID-19 | Climate change |
|---|---|---|---|
| tweets with hashtags | 42% | 56% | 53% |
| users using hashtags | 31% | 46% | 45% |
| hashtags/tweet | 2.60 | 2.43 | 2.64 |
| hashtags/user | 2.37 | 2.37 | 2.56 |

Top hashtags and mentions ranked by their occurrence counts for each crisis

Energy security

| hashtag | occ. | mention | occ. |
|---|---|---|---|
| #ukraine | 42,008 | @realDonaldTrump | 250,202 |
| #fanb | 33,197 | @NicolasMaduro | 175,825 |
| #covid19 | 32,378 | @lopezobrador_ | 140,301 |
| #gnb | 31,421 | @POTUS | 121,766 |
| #tigraygenocide | 21,005 | @JoeBiden | 63,722 |

COVID-19

| hashtag | occ. | mention | occ. |
|---|---|---|---|
| #covid19 | 164,954 | @realDonaldTrump | 318,982 |
| #coronavirus | 62,037 | @NicolasMaduro | 200,455 |
| #fanb | 28,221 | @narendramodi | 113,549 |
| #covid_19 | 26,966 | @PMOIndia | 102,017 |
| #gnb | 26,556 | @lopezobrador_ | 101,899 |

Climate change

| hashtag | occ. | mention | occ. |
|---|---|---|---|
| #climatechange | 13,701 | @realDonaldTrump | 42,256 |
| #climateaction | 10,794 | @POTUS | 17,280 |
| #covid19 | 5,416 | @NicolasMaduro | 14,766 |
| #climatecrisis | 4,231 | @lopezobrador_ | 14,048 |
| #climate | 4,134 | @JoeBiden | 12,015 |

Top five detected entities ranked by the number of occurrences

| Energy security: entity ID | occ. | COVID-19: entity ID | occ. | Climate change: entity ID | occ. |
|---|---|---|---|---|---|
| Q918 | 227,352 | Q918 | 307,804 | Q918 | 93,109 |
| Q212 | 169,666 | Q84263196 | 219,251 | Q7942 | 27,602 |
| Q11696 | 89,800 | Q134808 | 81,218 | Q11696 | 13,631 |
| Q30 | 40,375 | Q81068910 | 71,304 | Q1065 | 12,364 |
| Q159 | 40,092 | Q7817 | 63,232 | Q208645 | 8,058 |

Sentiments of tweets mentioning URLs from news media outlets

The figure shows the average compound sentiment scores for tweets sharing URLs from a qualitative selection of news outlets with different political leanings according to Media Bias/Fact Check.


| Dataset | Extracted information | Timespan | Method of extraction | Languages | Number of postings |
|---|---|---|---|---|---|
| TweetIntent@Crisis (Twitter/X) – Ai et al. (2024): "TweetIntent@Crisis: A Dataset Revealing Narratives of Both Sides in the Russia-Ukraine Crisis", Proceedings of the International AAAI Conference on Web and Social Media | Tweet ID, date, text | 02/01/2022 – 02/28/2023 | Keywords from tweets (after verification and topic modeling) | English | 17,854 (after cleaning) |
| IsamasRed (Reddit) – Chen et al. (2024): "IsamasRed: A Public Dataset Tracking Reddit Discussions on Israel-Hamas Conflict", Proceedings of the International AAAI Conference on Web and Social Media | Conversations and comments | 08/2023 – 11/2023 | Keywords (after automated extraction) | English | 412,258 conversations; 8,089,095 comments |
| TweetsCov19 (Twitter/X) – Dimitrov et al. (2020): "TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic" | Tweet ID, date, metadata, entities, sentiments, hashtags, mentions | 10/2019 – 08/2022 | Keywords from tweets (manual) | English | 41,307,082 |
| Tracking Social Media Discourse About the COVID-19 Pandemic (Twitter/X) – Chen et al. (2020): "Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set", JMIR Public Health and Surveillance | Tweet ID, date | 01/21/2020 – present | Keywords from tweets, tweets from accounts (manual) | multilingual | 129,911,732 |
| Coronavirus Tweets Dataset (Twitter/X) – Lamsal (2021): "Coronavirus (COVID-19) Tweets Dataset", IEEE DataPort | Tweet ID, time frame | 03/20/2020 – present | Keywords from tweets (manual, tokenized) | English | 102,650,603 |
| SocialDrought (Twitter/X; online news articles; meteorological data) – Shang et al. (2024): "SocialDrought: A Social and News Media Driven Dataset and Analytical Platform towards Understanding Societal Impact of Drought", Proceedings of the International AAAI Conference on Web and Social Media | Tweet ID, date, text, geolocation (Twitter); title, text, URL (news articles); weather indicators (meteorological records) | 01/2012 – 04/2023 (Twitter); 01/2022 – 12/2023 (news articles); 01/2012 – 12/2023 (meteorological records) | Keywords from tweets and news articles; analysis of weather statistics from US (manual) | English | 3,562,605 (tweets); 1,482 (news articles); 31,977 (meteorological records) |

About Us

Team:

- Dimitar Dimitrov,
- Jan-Henrik Petermann,
- Yudong Zhang

To submit a data protection request, contact: dimitar.dimitrov@gesis.org

L3S Research Center, University of Hannover, Germany
GESIS Web Data, Germany
GESIS – Leibniz Institute for the Social Sciences, Germany
