What
Tweetplomacy 23 is a semantically annotated corpus of tweets capturing digital
communicative interaction between international political leaders, peer groups and
citizens in the wake of three major global crises: (1) the increasing emphasis on the
security of energy supplies following Russia’s invasion of Ukraine; (2) the political
and geo-economic consequences of the COVID-19 pandemic; (3) the intensified debate on
the progression of climate change. These events occurred between 2018 and 2023, each of
them marking a significant shake-up of the international system.
The dataset focuses on the strategic use of networked information on X (formerly Twitter) by
executive political actors facing exogenous shocks in the context of a global crisis
situation. It is extracted from an X archive covering more than 14 billion tweets collected
from the 1% random sample API. To extract the dataset, we use a list of top executives
of the political administration – heads of state, heads of government, ministers of foreign
affairs – or their respective public-relations offices. Their tweets are filtered using a
list of thematically relevant keywords in four languages (English, German, French, Spanish),
reflecting the discourse with respect to the three crises mentioned above.
Our sample covers instances from the beginning of 2018 up to May 2023, representing
statements made by leading politicians from 83 countries on all continents. As a subset,
tweets published by the political leaders of the 38 member states of the OECD and the
five BRICS countries (Brazil, Russia, India, China, South Africa) have been extracted.
Additionally, the sample comprises a selection of 10 international organizations.
Why
- track and examine the repercussions/resonance produced by the ‘digital audience’ of
the most influential political leaders in the course of the three crises, thus
hinting at the political and societal impact their communicative actions had in the
digital realm.
- identify changes in sentiments, argumentation and/or tonality as well as more
general breakpoints of discussion by conducting in-depth analyses of the online
discourse relating to each of the three debates.
- yield new insights into networks of communication among ‘online champions’ in the
diplomatic community with regard to global political crises. To this end,
researchers will be able to employ both quantitative/statistical and
qualitative/hermeneutic methodologies to further explore and compare specific
communicative motivations of national political leaders and the global ‘digital
public’ in such cases.
- be used as a valuable empirical input not merely for political or media scientists,
but also for scholars focusing on sociological, economic or socio-psychological
aspects of crisis communication.
Dataset
The entire data collection consists of the following files: (1) users: Excel file with a
list of 654 Twitter user handles (usernames) of top executives of the political
administration (and/or their institutional accounts), their nationalities,
functions/roles and tenure; (2) keywords: Excel file with a list of 72 crisis-related
keywords; (3) a gzipped JSONL file per language: each line in the JSONL files represents
a JSON object containing metadata about a tweet matching one or more of the actors’ user
handles and one or more of the keywords in the respective language. Additionally,
semantic enrichments (i.e., entities and sentiments) calculated on the basis of the
tweet text are provided. The JSON object includes the following fields:
- tweetId: integer, unique ID for an original tweet
- timeStamp: format ("EEE MMM dd HH:mm:ss Z yyyy"), the timestamp of the original
tweet
- userName: JSON object containing the MD5-hashed user names for private persons or
the user names for public persons and institutions
- userBio: string (available only for public users and institutions), metadata at the
timepoint of the original tweets or of retweets
- followers: integer, metadata at the timepoint of the original tweets or of
retweets
- followees: integer, metadata at the timepoint of the original tweets or of
retweets
- retweets: integer, metadata at the timepoint of the original tweets or of retweets
- favorites: integer, metadata at the timepoint of the original tweets or of
retweets
- replies: integer, metadata at the timepoint of the original tweets or of retweets
- matchingKeywords: list of strings representing the matching keywords
- matchingUserMentions: list of strings representing the matching user mentions
- matchingUserName: string representing the matching user name
- sentiments: JSON object containing the output of the VADER sentiment analysis tool
(available only for English, German and French)
- entities: JSON object containing the output of the Entity Fishing named entity linking
tool
- hashtags: list of strings containing the hashtags extracted from the tweet text
- mentions: list of strings containing the user mentions extracted from the tweet
text
- urls: JSON object containing (resolved) URLs extracted from the tweet text
- retweetId: integer, unique ID for the retweet of an original tweet with an ID
captured in the tweetId field
- retweetTimeStamp: format ("EEE MMM dd HH:mm:ss Z yyyy"), the timestamp of the
retweet
- retweetUserName: JSON object containing the MD5-hashed username of the retweeting
user
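As a minimal sketch, assuming Python is used for analysis, the gzipped JSONL files described above could be read line by line as shown here. The `strptime` pattern is our translation of the documented Java-style pattern `"EEE MMM dd HH:mm:ss Z yyyy"`, and the helper names `read_tweets` and `parse_timestamp` are illustrative, not part of the dataset or its notebooks.

```python
import gzip
import json
from datetime import datetime
from typing import Iterator

# Assumed Python translation of the documented pattern "EEE MMM dd HH:mm:ss Z yyyy".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_timestamp(value: str) -> datetime:
    """Parse a timeStamp or retweetTimeStamp string into a timezone-aware datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def read_tweets(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

A call such as `read_tweets("tweets_en.jsonl.gz")` would then iterate over the English file; that file name is a placeholder, since the actual file names are not stated here.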
Dataset Analysis
The Jupyter notebooks for analyzing the Tweetplomacy 23 dataset are available at https://github.com/trovdimi/tweetplomacy-23.
Descriptive statistics
The table shows the percentages of tweets and users as well as the means and standard
deviations for replies, retweets and favorites per crisis, language and type of user
(public/private) in the dataset. The statistics are based on 2,048,232 tweets and
914,533 users. The percentages given for tweets do not add up to exactly 100 as some
tweets may cover multiple topics. Similarly, some users may talk about multiple topics
in multiple languages.
| | Overall | Energy sec. | COVID-19 | Climate chg. | English | German | French | Spanish | Public | Private |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tweets (%) | 100 | 46.20 | 48.96 | 8.02 | 61.49 | 1.42 | 2.69 | 34.40 | 5.17 | 94.83 |
| users (%) | 100 | 52.70 | 51.97 | 9.69 | 67.61 | 1.80 | 3.18 | 28.07 | 0.06 | 99.94 |
| replies (M/Std) | 31/684 | 27/482 | 34/840 | 30/486 | 39/860 | 25/199 | 23/231 | 16/172 | 401/2,899 | 11/165 |
| retweets (M/Std) | 92/829 | 81/682 | 105/953 | 74/688 | 102/1,017 | 29/173 | 52/309 | 80/376 | 712/3,064 | 58/437 |
| favorites (M/Std) | 310/4,271 | 258/3,006 | 358/5,098 | 307/4,896 | 411/5,371 | 159/1,068 | 159/1,293 | 147/1,113 | 3,324/17,127 | 146/1,654 |
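The M/Std figures in the table above could be approximated along the following lines. This is only a sketch: it assumes tweets are available as dicts with the `replies`, `retweets` and `favorites` fields, and it uses the population standard deviation, which is an assumption since the estimator used for the table is not stated. The function name `engagement_summary` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Iterable

ENGAGEMENT_FIELDS = ("replies", "retweets", "favorites")


def engagement_summary(tweets: Iterable[dict]) -> dict:
    """Return {field: (mean, std)} over tweets that carry a numeric value."""
    rows = list(tweets)
    summary = {}
    for field in ENGAGEMENT_FIELDS:
        # Ignore records where the field is missing or null.
        values = [row[field] for row in rows if isinstance(row.get(field), (int, float))]
        if values:
            # pstdev is the population standard deviation (assumed, see lead-in).
            summary[field] = (mean(values), pstdev(values))
    return summary
```

Restricting the input to one language file, or to tweets matching one crisis, would give the corresponding column.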
Hashtag usage per crisis
| | Energy security | COVID-19 | Climate change |
| --- | --- | --- | --- |
| tweets with hashtags | 42% | 56% | 53% |
| users using hashtags | 31% | 46% | 45% |
| hashtags / tweet | 2.60 | 2.43 | 2.64 |
| hashtags / user | 2.37 | 2.37 | 2.56 |
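For illustration, the tweet-level rows of the hashtag table could be computed from the `hashtags` field roughly as follows. The sketch assumes that the per-tweet average is taken only over tweets containing at least one hashtag; the source does not confirm this definition, and `hashtag_usage` is a hypothetical helper name.

```python
from typing import Iterable


def hashtag_usage(tweets: Iterable[dict]) -> dict:
    """Compute the share of tweets with hashtags and the mean hashtags per such tweet."""
    total = 0
    with_tags = 0
    tag_count = 0
    for tweet in tweets:
        total += 1
        tags = tweet.get("hashtags") or []  # treat a missing or null field as no hashtags
        if tags:
            with_tags += 1
            tag_count += len(tags)
    return {
        "share_with_hashtags": with_tags / total if total else 0.0,
        # Assumption: average over tweets that contain at least one hashtag.
        "hashtags_per_tweet": tag_count / with_tags if with_tags else 0.0,
    }
```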
Energy security

| hashtag | occ. | mention | occ. |
| --- | --- | --- | --- |
| #ukraine | 42,008 | @realDonaldTrump | 250,202 |
| #fanb | 33,197 | @NicolasMaduro | 175,825 |
| #covid19 | 32,378 | @lopezobrador_ | 140,301 |
| #gnb | 31,421 | @POTUS | 121,766 |
| #tigraygenocide | 21,005 | @JoeBiden | 63,722 |
COVID-19

| hashtag | occ. | mention | occ. |
| --- | --- | --- | --- |
| #covid19 | 164,954 | @realDonaldTrump | 318,982 |
| #coronavirus | 62,037 | @NicolasMaduro | 200,455 |
| #fanb | 28,221 | @narendramodi | 113,549 |
| #covid_19 | 26,966 | @PMOIndia | 102,017 |
| #gnb | 26,556 | @lopezobrador_ | 101,899 |
Climate change

| hashtag | occ. | mention | occ. |
| --- | --- | --- | --- |
| #climatechange | 13,701 | @realDonaldTrump | 42,256 |
| #climateaction | 10,794 | @POTUS | 17,280 |
| #covid19 | 5,416 | @NicolasMaduro | 14,766 |
| #climatecrisis | 4,231 | @lopezobrador_ | 14,048 |
| #climate | 4,134 | @JoeBiden | 12,015 |
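Rankings like the hashtag and mention tables above can be reproduced with a simple frequency count over the `hashtags` and `mentions` fields. The following is a sketch, not the authors' procedure: lowercasing hashtags before counting is an assumption suggested by the lowercase hashtags in the tables, and `top_values` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(
    tweets: Iterable[dict], field: str, n: int = 5, lowercase: bool = False
) -> List[Tuple[str, int]]:
    """Return the n most frequent strings found in a list-valued field."""
    counts: Counter = Counter()
    for tweet in tweets:
        for value in tweet.get(field) or []:  # tolerate missing or null fields
            counts[value.lower() if lowercase else value] += 1
    return counts.most_common(n)
```

For example, `top_values(tweets, "hashtags", lowercase=True)` would count `COVID19` and `covid19` as the same hashtag, while `top_values(tweets, "mentions")` keeps the original casing of user handles.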
| entity ID (Energy security) | occ. | entity ID (COVID-19) | occ. | entity ID (Climate change) | occ. |
| --- | --- | --- | --- | --- | --- |
| Q918 | 227,352 | Q918 | 307,804 | Q918 | 93,109 |
| Q212 | 169,666 | Q84263196 | 219,251 | Q7942 | 27,602 |
| Q11696 | 89,800 | Q134808 | 81,218 | Q11696 | 13,631 |
| Q30 | 40,375 | Q81068910 | 71,304 | Q1065 | 12,364 |
| Q159 | 40,092 | Q7817 | 63,232 | Q208645 | 8,058 |
The figure shows the average compound sentiment scores for tweets sharing URLs from a qualitative selection of news outlets with different political leanings according to Media Bias/Fact Check.
| Dataset | Extracted information | Timespan | Method of extraction | Languages | Number of postings | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| TweetIntent@Crisis (Twitter/X) – Ai et al. (2024) | Tweet ID, date, text | 02/01/2022 – 02/28/2023 | Keywords from tweets (after verification and topic modeling) | English | 17,854 (after cleaning) | TweetIntent@Crisis: A Dataset Revealing Narratives of Both Sides in the Russia-Ukraine Crisis, Proceedings of the International AAAI Conference on Web and Social Media |
| IsamasRed (Reddit) – Chen et al. (2024) | Conversations and comments | 08/2023 – 11/2023 | Keywords (after automated extraction) | English | 412,258 conversations; 8,089,095 comments | IsamasRed: A Public Dataset Tracking Reddit Discussions on Israel-Hamas Conflict, Proceedings of the International AAAI Conference on Web and Social Media |
| TweetsCOV19 (Twitter/X) – Dimitrov et al. (2020) | Tweet ID, date, metadata, entities, sentiments, hashtags, mentions | 10/2019 – 08/2022 | Keywords from tweets (manual) | English | 41,307,082 | TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic |
| Tracking Social Media Discourse About the COVID-19 Pandemic (Twitter/X) – Chen et al. (2020) | Tweet ID, date | 01/21/2020 – present | Keywords from tweets, tweets from accounts (manual) | multilingual | 129,911,732 | Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set, JMIR Public Health and Surveillance |
| Coronavirus Tweets Dataset (Twitter/X) – Lamsal (2021) | Tweet ID, time frame | 03/20/2020 – present | Keywords from tweets (manual, tokenized) | English | 102,650,603 | Coronavirus (COVID-19) Tweets Dataset, IEEE DataPort |
| SocialDrought (Twitter/X; online news articles; meteorological data) – Shang et al. (2024) | Tweet ID, date, text, geolocation (Twitter); title, text, URL (news articles); weather indicators (meteorological records) | 01/2012 – 04/2023 (Twitter); 01/2022 – 12/2023 (news articles); 01/2012 – 12/2023 (meteorological records) | Keywords from tweets and news articles; analysis of weather statistics from US (manual) | English | 3,562,605 (tweets); 1,482 (news articles); 31,977 (meteorological records) | SocialDrought: A Social and News Media Driven Dataset and Analytical Platform towards Understanding Societal Impact of Drought, Proceedings of the International AAAI Conference on Web and Social Media |