As a result of the ongoing Coronavirus disease 2019 (COVID-19) pandemic, our daily routines and behavior patterns have changed drastically, not only offline but also online. One example is the shift in reading patterns on Wikipedia and Reddit [1,2]. COVID-19 has also been a hot topic on other social media platforms such as Facebook, Twitter, and YouTube.
To understand the information spreading mechanisms during the COVID-19 pandemic, this challenge focuses on Twitter. Twitter is an online social network where users can follow each other and share information using short text posts called tweets. The platform offers a function to retweet a tweet, which means sharing it with one's followers without any change. Retweeting is a popular function, and it has also found its way into other online social networks such as Weibo. Because retweeting amplifies the spread of original content, retweet prediction is a crucial task when studying information spreading processes. Understanding retweet behavior has many practical applications, e.g., political audience design [3,4], fake news spreading and tracking [5,6], health promotion [7], and mass emergency management [8]. Modeling retweet behavior has been an active research area and is especially important during times of crisis, such as the current COVID-19 pandemic.
The retweet prediction task in the challenge is based on the TweetsCOV19 dataset, a publicly available dataset containing more than 8 million COVID-19-related tweets, spanning the period October 2019 to April 2020.
The COVID-19 Retweet Prediction Challenge is part of the CIKM2020 AnalytiCup, and
the winners will
be invited to present their solutions during the *online* AnalytiCup Workshop in
October 2020.
Join the COVID-19 Retweet Prediction Challenge at CodaLab!
In this competition, you are provided with TweetsCOV19, a publicly available dataset of more than 8 million tweets, spanning the period October 2019 to April 2020.
TSV File format
Each line contains the features of one tweet instance. Features are separated by a tab character ("\t"). The following list indicates the feature indices:
Given the set of features for a tweet from TweetsCOV19, the task is to predict the number of times it will be retweeted (#retweets).
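The tab-separated layout described above can be read one line at a time. The following is a minimal sketch only: the function name `read_tsv_features` and the sample values are hypothetical, and fields are returned as plain strings in file order because the feature index list is not reproduced here.

```python
import io
from typing import Iterator, List, TextIO


def read_tsv_features(handle: TextIO) -> Iterator[List[str]]:
    """Yield the tab-separated features of each tweet line as strings."""
    for line in handle:
        # Drop the trailing newline but keep empty fields between tabs.
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        yield line.split("\t")


# Hypothetical two-line sample standing in for a real TSV file.
sample = io.StringIO("id-1\tvalue-a\t3\nid-2\tvalue-b\t0\n")
rows = list(read_tsv_features(sample))
print(len(rows), "tweets,", len(rows[0]), "features each")
```

For a real file, the same function could be called on an opened handle, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded TSV.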
Violating any competition rule specified below is grounds for disqualification. In the event of any dispute in connection with the Competition, or with the interpretation or implementation of these rules, the decision of the Organizers shall be final.
The competition is open to everyone except those involved in its organization.
You cannot sign up to CodaLab from multiple accounts, and therefore you cannot submit from multiple accounts.
There is no limit to the number of team members. The only restriction is that the total number of submissions by all team members must be less than or equal to the maximum allowed in the respective phase of the competition.
Team mergers are allowed throughout Phase 1 and can be performed by the team leader (go to your account's User Settings and indicate team name and members). In order to merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date. The maximum allowed is the number of submissions per day multiplied by the number of days the competition has been running. The organizers do not provide any assistance with team mergers.
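The merge condition above is simple arithmetic and can be checked before requesting a merge. This is a small sketch under the rule as stated; the function names and the example counts are hypothetical, and the per-day limit of 20 is the Phase 1 limit given in these rules.

```python
from typing import Iterable


def max_allowed_submissions(per_day: int, days_running: int) -> int:
    """Submissions per day multiplied by the days the competition has run."""
    return per_day * days_running


def can_merge(team_counts: Iterable[int], per_day: int, days_running: int) -> bool:
    """True if the combined submission count does not exceed the maximum."""
    return sum(team_counts) <= max_allowed_submissions(per_day, days_running)


# Hypothetical example: two teams merging on day 5 of Phase 1 (20 per day).
print(can_merge([60, 40], per_day=20, days_running=5))
print(can_merge([60, 41], per_day=20, days_running=5))
```

With these example numbers the maximum is 100, so the first merge is within the limit and the second exceeds it by one submission.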
Participants are free to use any additional datasets that have been made publicly available *before* the beginning of the Competition on April 30, 2020.
Privately sharing code or data outside of teams is not permitted. It is permitted to share code if it is made available to all participants on the forums or as a public repository (e.g., GitHub).
You may submit a maximum of 20 entries per day during Phase 1 (Validation Leaderboard). For Phase 2 (Testing Leaderboard), you can only submit 10 entries in total.
At the end of Phase 2, the semi-finalists (the top six teams) are to submit their code as well as a report describing their solution (4 pages in ACM format) and make their code publicly available by the stipulated deadline.
The submitted code and reports may be inspected to check the validity of the solution. The reports will eventually be made publicly available on the CIKM conference website.
Selected teams will also be invited to present their solutions *online* at the CIKM AnalytiCup Workshop in October 2020. To allocate the limited presentation slots, preference will be given to award-winning teams, as well as teams deemed by the organizers to have interesting or remarkable solutions.
We trust that all used data, methods, and resources comply with the ACM code of ethics.
The ranking of entries based on the prediction score (MSLE) during Phase 2 will be used to determine the semi-finalists (top six teams), subject to the validity of the solutions. Winners will be the top three teams among the semi-finalist teams. A tie in the prediction score will be broken in favor of the earlier submission on the final leaderboard.
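MSLE is commonly read as mean squared logarithmic error. The challenge's own evaluation script is not shown here, so the following is only a sketch that assumes the standard definition, with the natural logarithm and an offset of 1 so that tweets with zero retweets are handled. The function name `msle` and the example values are illustrative.

```python
import math
from typing import Sequence


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of (log(1 + pred) - log(1 + true))^2 over all tweets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t < 0 or p < 0:
            raise ValueError("retweet counts must be non-negative")
        # log1p(x) computes log(1 + x) accurately for small x.
        total += (math.log1p(p) - math.log1p(t)) ** 2
    return total / len(y_true)


# Illustrative values only: true retweet counts and a model's predictions.
true_counts = [0, 3, 120]
predictions = [1, 2, 100]
print(msle(true_counts, predictions))
```

Because the error is taken on a log scale, a fixed absolute miss costs more on a tweet with few retweets than on a heavily retweeted one. Lower scores are better under this definition.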
The winners of the COVID-19 Retweet Prediction Challenge will receive non-cash prizes worth €2,500, provided by L3S Research Center, University of Hannover, Germany. The prizes will be distributed among the participants as follows:
*In order to be eligible for any award, the semi-finalists are required to submit the code and solution report (4 pages in ACM format) to the organizers by the stipulated deadline.
Prizes will be awarded in the form of vouchers. L3S Research Center reserves the right not to award some or all of the prizes if the competition criteria are not met.
The prize money for the challenge is provided by L3S Research Center, University of Hannover, Germany.
The complete list of sponsors includes:
GESIS – Leibniz Institute for the Social Sciences, Germany
Chongqing University of Technology, China
Heinrich-Heine-University Düsseldorf, Germany
The challenge is part of the CIKM2020 AnalytiCup
Dimitar Dimitrov, GESIS – Leibniz Institute for the Social Sciences, Germany
Xiaofei Zhu, Chongqing University of Technology, China
Questions related to the challenge?
[1] Gozzi, N., Tizzani, M., Starnini, M., Ciulla, F., Paolotti, D., Panisson, A. and Perra, N., 2020. Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia. arXiv preprint arXiv:2006.06446.
[2] Ribeiro, M.H., Gligorić, K., Peyrard, M., Lemmerich, F., Strohmaier, M. and West, R., 2020. Sudden Attention Shifts on Wikipedia Following COVID-19 Mobility Restrictions. arXiv preprint arXiv:2005.08505.
[3] Stieglitz, S. and Dang-Xuan, L., 2012, January. Political communication and influence through microblogging--An empirical analysis of sentiment in Twitter messages and retweet behavior. In 2012 45th Hawaii International Conference on System Sciences (pp. 3500-3509). IEEE.
[4] Kim, E., Sung, Y. and Kang, H., 2014. Brand followers’ retweeting behavior on Twitter: How brand relationships influence brand electronic word-of-mouth. Computers in Human Behavior, 37, pp.18-25.
[5] Lumezanu, C., Feamster, N. and Klein, H., 2012, May. #bias: Measuring the tweeting behavior of propagandists. In Sixth International AAAI Conference on Weblogs and Social Media.
[6] Vosoughi, S., Roy, D. and Aral, S., 2018. The spread of true and false news online. Science, 359(6380), pp.1146-1151.
[7] Chung, J.E., 2017. Retweeting in health promotion: Analysis of tweets about Breast Cancer Awareness Month. Computers in Human Behavior, 74, pp.112-119.
[8] Kogan, M., Palen, L. and Anderson, K.M., 2015, February. Think local, retweet global: Retweeting by the geographically-vulnerable during Hurricane Sandy. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing (pp. 981-993).