The retweet prediction task in the challenge is based on TweetsCOV19, a publicly available dataset containing more than 8 million COVID-19-related tweets posted between October 2019 and April 2020.
The COVID-19 Retweet Prediction Challenge is part of the CIKM2020 AnalytiCup, and the winners will be invited to present their solutions during the *online* AnalytiCup Workshop in October 2020.
Join the COVID-19 Retweet Prediction Challenge at CodaLab!
- 01.07.20 Contest and Phase 1 Begin (Validation Leaderboard opens)
- 15.08.20 Phase 2 Begin (Testing Leaderboard opens)
- 31.08.20 Last Shot & Contest End
- 01.09.20 Semi-Finalists Announcement (top six teams on the Testing Leaderboard)
- 01.10.20 Report & Code Due
- 20.10.20 Winners Announcement
TSV File format
Each line contains the features of one tweet instance. Features are separated by a tab character ("\t"). The following list gives the features in column order:
- Tweet Id: Long. [Provided only in the training data.]
- Username: String. Encrypted for privacy reasons.
- Timestamp: Format ( "EEE MMM dd HH:mm:ss Z yyyy" ).
- #Followers: Integer.
- #Friends: Integer.
- #Retweets: Integer. [The target variable to predict!]
- #Favorites: Integer.
- Entities: String. For each entity, we aggregated the original text, the annotated entity, and the score produced by the FEL library. Entities are separated from one another by the char ";", and the parts of each entity are separated by the char ":", giving the form "original_text:annotated_entity:score;". If FEL did not find any entities, we have stored "null;".
- Sentiment: String. SentiStrength produces a score for positive (1 to 5) and negative (-1 to -5) sentiment. We split these two numbers with a whitespace char " ". The positive sentiment is stored first, followed by the negative sentiment (e.g. "2 -1").
- Mentions: String. If the tweet contains mentions, we remove the char "@" and concatenate the mentions with whitespace char " ". If no mentions appear, we have stored "null;".
- Hashtags: String. If the tweet contains hashtags, we remove the char "#" and concatenate the hashtags with whitespace char " ". If no hashtags appear, we have stored "null;".
- URLs: String. If the tweet contains URLs, we concatenate the URLs using ":-: ". If no URLs appear, we have stored "null;".
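The format above can be read with a few lines of Python. The following is only a minimal sketch, not an official loader: it assumes a training-data line with all 12 columns in the listed order, an English locale for the day and month names in the timestamp, and that the ":" separator does not occur inside an annotated entity. The function names (`parse_list`, `parse_entities`, `parse_tweet`) and the sample line are hypothetical.

```python
from datetime import datetime

# Python equivalent of the Java-style pattern "EEE MMM dd HH:mm:ss Z yyyy"
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"
NULL_MARKER = "null;"  # placeholder stored when a field has no values


def parse_list(field, sep=" "):
    """Split a multi-valued field; the "null;" placeholder yields an empty list."""
    field = field.strip()
    if field == NULL_MARKER:
        return []
    return [item.strip() for item in field.split(sep) if item.strip()]


def parse_entities(field):
    """Parse "original_text:annotated_entity:score;" groups into tuples."""
    field = field.strip()
    if field == NULL_MARKER:
        return []
    entities = []
    for chunk in field.split(";"):
        if not chunk:
            continue  # skip the empty piece after the trailing ";"
        # Split from the right so a ":" inside the original text is tolerated.
        text, entity, score = chunk.rsplit(":", 2)
        entities.append((text, entity, float(score)))
    return entities


def parse_tweet(line):
    """Parse one tab-separated training line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}")
    positive, negative = cols[8].split()
    return {
        "tweet_id": int(cols[0]),
        "username": cols[1],
        "timestamp": datetime.strptime(cols[2], TIMESTAMP_FORMAT),
        "followers": int(cols[3]),
        "friends": int(cols[4]),
        "retweets": int(cols[5]),  # target variable
        "favorites": int(cols[6]),
        "entities": parse_entities(cols[7]),
        "sentiment": (int(positive), int(negative)),
        "mentions": parse_list(cols[9]),
        "hashtags": parse_list(cols[10]),
        "urls": parse_list(cols[11], sep=":-:"),
    }


# Made-up example line, for illustration only.
sample = "\t".join([
    "1", "abc123", "Sat Feb 01 12:30:45 +0000 2020", "100", "50", "3", "7",
    "covid:COVID-19:-1.5;", "2 -1", "null;", "covid19 health",
    "http://a.example:-: http://b.example",
])
tweet = parse_tweet(sample)
```

Lines from the other data splits, which omit the Tweet Id, would need the column offsets adjusted accordingly.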
Given the set of features for a tweet from TweetsCOV19, the task is to predict the number of times it will be retweeted (#retweets).
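Entries are ranked by MSLE (mean squared logarithmic error). The sketch below uses the standard definition with log(1 + x), which keeps the error defined when a tweet has zero retweets. It is an assumption that the official scorer follows this definition exactly; the function name `msle` is illustrative.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error over paired true and predicted counts."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        # log1p(x) = log(1 + x), defined for retweet counts of zero
        total += (math.log1p(predicted) - math.log1p(actual)) ** 2
    return total / len(y_true)


# A perfect prediction scores zero; lower is better.
perfect = msle([0, 3, 120], [0, 3, 120])
```

Because the error is computed on log-transformed counts, missing a tweet with 10 retweets by a factor of two costs about as much as missing one with 10,000 retweets by the same factor.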
Violating any competition rule specified below is grounds for disqualification. In the event of any dispute in connection with the Competition, or with the interpretation or implementation of these rules, the decision of the Organizers shall be final.
Eligibility
The competition is open to everyone except those involved in its organization.
One account per participant
You cannot sign up to CodaLab from multiple accounts, and therefore you cannot submit from multiple accounts.
Team size
There is no limit to the number of team members. The only restriction is that the total number of submissions by all team members must be less than or equal to the maximum allowed in the respective phase of the competition.
Team mergers
Team mergers are allowed throughout Phase 1 and can be performed by the team leader (go to your account's User Settings and indicate team name and members). In order to merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date. The maximum allowed is the number of submissions per day multiplied by the number of days the competition has been running. The organizers do not provide any assistance with team mergers.
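As a worked example of the merger rule, the check can be sketched as follows. The defaults use the Phase 1 figures stated on this page (20 submissions per day, start on 01.07.20). Whether the start day itself counts as a full day is not specified, so counting it as day 1 is an assumption, and the function names are hypothetical.

```python
from datetime import date

PHASE1_START = date(2020, 7, 1)  # contest start from the schedule
PHASE1_DAILY_LIMIT = 20          # Phase 1 submissions per day


def submission_limit(merge_date, start=PHASE1_START, per_day=PHASE1_DAILY_LIMIT):
    """Maximum combined submissions allowed as of merge_date."""
    # Assumption: the start day counts as day 1 of the competition.
    days_running = (merge_date - start).days + 1
    return per_day * days_running


def merge_allowed(team_counts, merge_date):
    """True if the merging teams' combined submission count is within the limit."""
    return sum(team_counts) <= submission_limit(merge_date)


# On 03.07.20 the competition has run 3 days, so the limit is 20 * 3 = 60.
limit = submission_limit(date(2020, 7, 3))
```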
Additional data
Participants are free to use any additional datasets that have been made publicly available *before* the beginning of the Competition, April 30, 2020.
No private sharing outside teams
Privately sharing code or data outside of teams is not permitted. It is permitted to share code if it is made available to all participants on the forums or as a public repository (e.g., GitHub).
Submissions
You may submit a maximum of 20 entries per day during Phase 1 (Validation Leaderboard). For Phase 2 (Testing Leaderboard), you can only submit 10 entries in total.
At the end of Phase 2, the semi-finalists (the top six teams) are to submit their code as well as a report describing their solution (4 pages in ACM format) and make their code publicly available by the stipulated deadline.
The submitted code and reports may be inspected to check the validity of the solution. The reports will eventually be made publicly available on the CIKM conference website.
Selected teams will also be invited to present their solutions *online* at the CIKM AnalytiCup Workshop in October 2020. To allocate the limited presentation slots, preference will be given to award-winning teams, as well as teams deemed by the organizers to have interesting or remarkable solutions.
Ethics
We trust that all used data, methods, and resources comply with the ACM code of ethics.
Winners
The ranking of entries based on the prediction score (MSLE) during Phase 2 will be used to determine the semi-finalists (top six teams), subject to the validity of the solutions. Winners will be the top 3 teams among the semi-finalist teams. A tie in the prediction score will be broken in favor of the earlier submission on the final leaderboard.
- The 1st Place receives a non-cash prize equivalent of EUR1,200 (~USD1,350)*
- The 2nd Place receives a non-cash prize equivalent of EUR800 (~USD900)*
- The 3rd Place receives a non-cash prize equivalent of EUR500 (~USD560)*
Prizes will be awarded in the form of vouchers. L3S Research Center reserves the right not to award some of or all the prizes if the competition criteria are not met.
The prize money for the challenge is provided by L3S Research Center, University of Hannover, Germany.
The complete list of sponsors includes:
GESIS – Leibniz Institute for the Social Sciences, Germany
Chongqing University of Technology, China
Heinrich-Heine-University Düsseldorf, Germany
Dimitar Dimitrov, GESIS – Leibniz Institute for the Social Sciences, Germany
Xiaofei Zhu, Chongqing University of Technology, China
Questions related to the challenge?