Description

TeleScope is an extensive dataset suite built on Telegram. It comprises channel metadata and message metadata downloaded from Telegram. In addition, it provides enrichments, such as language detection and active periods for each channel and Telegram entities extracted from messages, that enable analyses beyond what is possible with the original data alone. TeleScope also provides channel connections and user interaction data, built from Telegram’s message-forwarding feature, to support use cases such as the study of information spread and message-forwarding patterns. The dataset suite is designed for diverse applications, independent of specific research objectives, and is versatile enough to facilitate the replication of social media studies comparable to those conducted on platforms like X (formerly Twitter).

The current release of TeleScope comprises channel metadata from about 500K Telegram channels and message metadata downloaded from all 71K public channels among these 500K channels, accounting for about 120M crawled messages.

Data Collection

A seed list of 251 unique Telegram channels was constructed by selecting the top 100 channels based on subscribers, citations, and reach from TGStat, a widely used channel registry. Metadata and messages from these public channels were collected using the Telethon API. New channels were discovered through forwarded messages, allowing the list to be expanded iteratively. The data collection pipeline is depicted below.

[Figure: Data Crawling Diagram]
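
The iterative expansion described above can be sketched as a breadth-first traversal. This is a minimal illustration and not the crawler from the TeleScope repository: `discover_channels`, the `get_forward_sources` callback, the `max_channels` budget, and the toy data are all hypothetical names introduced here. In a real crawl, the callback would wrap the Telethon-based download of a channel's messages and return the channels those messages were forwarded from.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def discover_channels(
    seeds: Iterable[str],
    get_forward_sources: Callable[[str], Iterable[str]],
    max_channels: Optional[int] = None,
) -> List[str]:
    """Breadth-first snowball expansion over forwarded-message sources."""
    # De-duplicate the seeds while keeping their original order.
    queue = deque(dict.fromkeys(seeds))
    seen = set(queue)
    visited: List[str] = []

    while queue:
        channel = queue.popleft()
        visited.append(channel)
        # Every channel that this channel forwarded from is a candidate.
        for source in get_forward_sources(channel):
            if source in seen:
                continue
            if max_channels is not None and len(seen) >= max_channels:
                continue  # budget reached: stop admitting new channels
            seen.add(source)
            queue.append(source)
    return visited


# Toy forwarding data standing in for real crawl results (hypothetical).
toy_forwards = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["c"]}
order = discover_channels(["a"], lambda ch: toy_forwards.get(ch, []))
```

Keeping the fetch step behind a callback separates the traversal logic from network access, so the expansion can be tested on toy data before it is pointed at a live crawl.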

Statistics

The statistics for the latest release are shown below.

Feature                              Value
Time frame                           Feb 1, 2024 – Oct 29, 2024
Discovered channels                  1,210,272
Channels with downloaded metadata    534,137
Fully downloaded public channels     71,048
Number of downloaded messages        120,024,020
Avg. messages per channel            1,689.33
Percentage of forwarded messages     19.6%
Avg. messages downloaded/hour        20,495
Complete dataset size                76GB (zipped)
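
As a quick sanity check, some derived figures can be recomputed from the raw counts in the table. The snippet below is only an illustrative calculation; the variable names are ours, and the forwarded-message count is an estimate obtained from the rounded 19.6% share rather than a value reported by the dataset.

```python
# Raw counts copied from the statistics table above.
downloaded_messages = 120_024_020
fully_downloaded_channels = 71_048
channels_with_metadata = 534_137
forwarded_share = 0.196  # rounded percentage from the table

# Average number of downloaded messages per fully downloaded channel.
avg_messages_per_channel = downloaded_messages / fully_downloaded_channels

# Fraction of metadata-only channels that were also fully downloaded.
fully_downloaded_fraction = fully_downloaded_channels / channels_with_metadata

# Rough estimate of how many downloaded messages are forwards.
approx_forwarded_messages = round(downloaded_messages * forwarded_share)

print(f"{avg_messages_per_channel:,.1f} messages per channel")
print(f"{fully_downloaded_fraction:.1%} of metadata channels fully downloaded")
print(f"~{approx_forwarded_messages:,} forwarded messages")
```

The result is roughly 1,689 messages per channel, consistent with the table. About 13% of the channels with metadata are fully downloaded, and about 23.5 million of the downloaded messages are forwards.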

Dataset Organisation

The data suite is organised as follows. For details of each dataset, refer to the Publications section.

[Figure: Data Suite]

The first version of TeleScope can be downloaded from Archiving BASIS: DOI https://doi.org/10.7802/2825

Use Cases

  • Replication of Social Media Research: Telegram's forwarding and reaction data enable adaptation of Twitter-based methodologies for studying diffusion, virality, engagement, and sentiment.
  • Network and Community Discovery: Channel-to-channel graphs and message forwarding flows allow for analysis of communities, hubs, and information dissemination across Telegram.
  • Entity-based Search and Exploration: Hashtags, mentions, and URLs enrich searchability and support targeted research on themes like political discourse, trends, and external influence.
  • Data Source for Low-resource Languages: The dataset includes 47 languages, offering rare access to underrepresented languages for NLP and sociolinguistic research.
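
For the Network and Community Discovery use case, a weighted channel-to-channel graph can be derived from forwarding metadata. The sketch below assumes each message record is a dictionary with hypothetical keys `channel` (where the message was posted) and `forwarded_from` (the originating channel, or `None` for original posts). These are not the actual field names of the TeleScope files, which are documented in the publication.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (source channel, forwarding channel)


def build_forward_edges(records: Iterable[Dict[str, Optional[str]]]) -> Counter:
    """Count how often each channel forwards messages from each source."""
    edges: Counter = Counter()
    for rec in records:
        source = rec.get("forwarded_from")  # hypothetical field name
        target = rec.get("channel")         # hypothetical field name
        # Skip original posts, incomplete records, and self-forwards.
        if source is None or target is None or source == target:
            continue
        edges[(source, target)] += 1
    return edges


def top_sources(edges: Counter, n: int = 3) -> List[Tuple[str, int]]:
    """Rank channels by how often their messages are forwarded elsewhere."""
    out_weight: Counter = Counter()
    for (source, _target), weight in edges.items():
        out_weight[source] += weight
    return out_weight.most_common(n)


# Toy records standing in for message metadata (hypothetical values).
sample = [
    {"channel": "news_b", "forwarded_from": "news_a"},
    {"channel": "news_b", "forwarded_from": "news_a"},
    {"channel": "news_c", "forwarded_from": "news_a"},
    {"channel": "news_c", "forwarded_from": "news_b"},
    {"channel": "news_a", "forwarded_from": None},
]
edges = build_forward_edges(sample)
hubs = top_sources(edges, n=2)
```

The resulting edge counts can be loaded into a graph library of choice for community detection or hub analysis. A plain `Counter` keeps the sketch free of dependencies.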

Source code

The source code, including the Telegram crawler implementation, is available in a public GitHub repository: https://github.com/susmita3107/TeleScope

License

The dataset is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.

Data Access

TeleScope does not contain actual messages. It only provides metadata extracted from channels and messages. The full message content is securely stored at the GESIS Secure Data Center (SDC).

To request access to the full messages, please contact us through this link.

Publications

  • Gangopadhyay, S., Dessi, D., Dimitrov, D., Dietze, S., TeleScope: A Longitudinal Dataset for Investigating Online Discourse and Information Interaction on Telegram, International AAAI Conference on Web and Social Media (ICWSM), Copenhagen, Denmark, June 2025. [Paper Link] [Link to the Poster]

Contact

Please send your feedback and comments by email to dimitar (dot) dimitrov (at) gesis (dot) org or susmita (dot) gangopadhyay (at) gesis (dot) org.

About Us

Susmita Gangopadhyay, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Danilo Dessì, University of Sharjah, Sharjah, UAE, https://www.sharjah.ac.ae/
Dimitar Dimitrov, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/ | Heinrich Heine University Düsseldorf (Germany) https://www.hhu.de/
