TeleScope is an extensive dataset suite built on Telegram. It comprises channel metadata and message metadata downloaded from Telegram. In addition, it provides enrichments, such as language detection and active periods for each channel and Telegram entities extracted from messages, that enable analyses beyond what is possible with the original data alone. TeleScope also provides channel connections and user interaction data built from Telegram's message-forwarding feature, supporting use cases such as the study of information spread and message-forwarding patterns. The dataset suite is designed for diverse applications, independent of specific research objectives, and is sufficiently versatile to facilitate the replication of social media studies comparable to those conducted on platforms like X (formerly Twitter).
The current release of TeleScope comprises channel metadata from about 500K Telegram channels and downloaded message metadata from all 71K public channels among these 500K channels, accounting for about 120M crawled messages. A seed list of 251 unique Telegram channels was constructed by selecting the top 100 channels by subscribers, citations, and reach, respectively, from TGStat, a widely used channel registry. Metadata and messages from these public channels were collected using the Telethon API. New channels were discovered through forwarded messages, allowing the list to be expanded iteratively. The data collection pipeline is depicted below.
Feature | Value |
---|---|
Time frame | Feb 1, 2024 – Oct 29, 2024 |
Discovered channels | 1,210,272 |
Channels with downloaded metadata | 534,137 |
Fully downloaded public channels | 71,048 |
Number of downloaded messages | 120,024,020 |
Avg. messages per channel | 1,689.33 |
Percentage of forwarded messages | 19.6% |
Avg. messages downloaded/hour | 20,495 |
Complete dataset size | 76 GB (zipped) |
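The iterative channel discovery described above (start from a seed list, then follow forwarded messages to find new channels) can be illustrated with a minimal sketch. This is not the TeleScope crawler, which is available in the project's GitHub repository; it is a plain breadth-first expansion under stated assumptions. The names `discover_channels` and `get_forward_sources`, as well as the toy channel names, are hypothetical. In a real crawler, the lookup function would be backed by a Telegram client such as Telethon.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def discover_channels(
    seeds: Iterable[str],
    get_forward_sources: Callable[[str], Iterable[str]],
    max_channels: Optional[int] = None,
) -> List[str]:
    """Expand a seed list breadth-first by following forwarded messages.

    `get_forward_sources(channel)` is a hypothetical callback that returns
    the channels whose messages were forwarded into `channel`.
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:  # de-duplicate the seed list
            seen.add(seed)
            queue.append(seed)

    visited = []
    while queue:
        channel = queue.popleft()
        visited.append(channel)
        if max_channels is not None and len(visited) >= max_channels:
            break
        for source in get_forward_sources(channel):
            if source not in seen:  # a newly discovered channel
                seen.add(source)
                queue.append(source)
    return visited


# Toy forwarding data (hypothetical): channel -> origins of its forwards.
toy_forwards = {
    "seed_a": ["chan_x", "chan_y"],
    "seed_b": ["chan_x"],
    "chan_x": ["chan_z", "seed_a"],
}


def toy_lookup(channel: str) -> List[str]:
    return toy_forwards.get(channel, [])


discovered = discover_channels(["seed_a", "seed_b"], toy_lookup)
print(discovered)
# ['seed_a', 'seed_b', 'chan_x', 'chan_y', 'chan_z']
```

The `seen` set ensures each channel is queued only once, even when several channels forward from the same source. The optional `max_channels` cap is an assumption added here to bound the crawl.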
The data suite is organised as follows. For details of each dataset, refer to the Publication section.
The first version of TeleScope can be downloaded from Archiving BASIS: DOI https://doi.org/10.7802/2825
The source code, including the Telegram crawler implementation, is available in a public GitHub repository: https://github.com/susmita3107/TeleScope
The dataset is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.
TeleScope does not contain actual messages. It only provides metadata extracted from channels and messages. The full message content is securely stored at the GESIS Secure Data Center (SDC).
To request access to the full messages, please contact us through this link.
Please send your feedback and any comments by email to dimitar (dot) dimitrov (at) gesis (dot) org or susmita (dot) gangopadhyay (at) gesis (dot) org.
Susmita Gangopadhyay, GESIS - Leibniz Institute for the Social Sciences (Germany),
https://www.gesis.org/
Danilo Dessì, University of Sharjah, Sharjah, UAE,
https://www.sharjah.ac.ae/
Dimitar Dimitrov, GESIS - Leibniz Institute for the Social Sciences (Germany),
https://www.gesis.org/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany),
https://www.gesis.org/
and Heinrich Heine University Düsseldorf (Germany),
https://www.hhu.de/