
ClaimsKG

Description

ClaimsKG is a structured database that serves as a registry of claims. It gives researchers an entry point for discovering claims and the entities they involve, and it links to fact-checking sites and their results. The database is built on a knowledge graph that provides data about claims, their metadata (such as the publishing site), the entities involved (annotated using state-of-the-art NLP techniques), and normalized truth ratings. ClaimsKG is generated by a (semi-)automated pipeline that regularly harvests claims and their metadata from popular fact-checking sites, lifts the data into an RDF/S model that builds on established vocabularies such as schema.org and NIF, and annotates claims with related entities from DBpedia.

Claim extraction

We only use websites that the fact-checking community considers highly reputable (see here for details).

Fact-Checking Websites

We have taken measures to ensure that our data complies with the copyright restrictions of the respective fact-checking websites.

The latest release of ClaimsKG covers 74066 claims and 72128 claim reviews. The data was scraped in January 2023 and contains claims published between 1996 and January 31, 2023, from the 13 fact-checking websites listed below. The claim reviews (fact checks) likewise date from 1996 to 2023. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains the detected entities, referenced to DBpedia.

We plan to add more websites in diverse languages in upcoming versions.

Features extracted

(a) the textual statement of the claim;
(b) its truth value or rating - both a normalized rating and the original one;
(c) a link to the claim review from the fact-checking website;
(d) the references cited in the claim reviews;
(e) the entities extracted from the claim body and from the review body;
(f) the author of the claim and the author of the claim review;
(g) the date of publication of the claim and the date of publication of the review;
(h) the title of the review article;
(i) a set of keywords extracted from the fact-checking websites that act like topics (e.g., “healthcare” or “abortion”).
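
As a rough illustration of how one extracted record could be held in memory, the sketch below mirrors features (a) to (i) as a Python dataclass. The class name, the field names, and the sample values are hypothetical; they are not the property names used in the ClaimsKG RDF model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaimRecord:
    """Hypothetical container mirroring features (a)-(i); not the ClaimsKG schema."""

    claim_text: str                                             # (a)
    original_rating: str                                        # (b) label used by the site
    normalized_rating: str                                      # (b) normalized label
    review_url: str                                             # (c)
    review_references: List[str] = field(default_factory=list)  # (d)
    claim_entities: List[str] = field(default_factory=list)     # (e) from the claim body
    review_entities: List[str] = field(default_factory=list)    # (e) from the review body
    claim_author: Optional[str] = None                          # (f)
    review_author: Optional[str] = None                         # (f)
    claim_date: Optional[str] = None                            # (g) ISO date string
    review_date: Optional[str] = None                           # (g) ISO date string
    review_title: Optional[str] = None                          # (h)
    keywords: List[str] = field(default_factory=list)           # (i)


# Invented sample values, used only to show how the fields fit together.
example = ClaimRecord(
    claim_text="An invented claim used only for illustration.",
    original_rating="<site-specific label>",
    normalized_rating="<normalized label>",
    review_url="https://example.org/review/1",
    keywords=["healthcare"],
)
```

Optional fields default to `None` or an empty list because, as the statistics further down show, not every feature is available for every website.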

Data model

In the following figure, the data model of ClaimsKG is illustrated.

In the following figure, an instantiated version of the model is shown for an example.

The mappings of local review ratings to normalized ratings are documented here.

Statistics

| Property | Global | AFP Factcheck | Africacheck | Check your Fact | Fullfact | Politifact | Snopes | Truth or Fiction | AFP Factuel (FR) | Factograph | Fatabyyano | Vishva news | Polygraph | Eu factcheck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of claims | 74066 | 6661 | 4575 | 4877 | 4675 | 22351 | 18764 | 3103 | 1732 | 258 | 1550 | 4036 | 1190 | 234 |
| Number of reviews | 72127 | 6662 | 3680 | 4878 | 3282 | 22373 | 18818 | 3103 | 1756 | 266 | 1551 | 4269 | 1208 | 279 |
| Claim text | 97.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 107.39% | 97.28% | 0.00% | 100% | 100% | 100% |
| Claim author | 99% | 100% | 100.00% | 100% | 100% | 100% | 100% | 100% | 98.49% | 99.61% | 100% | 100% | 100% | 0% |
| Claim date published | 65.36% | 86.29% | 0.00% | 0% | 0% | 100.00% | 0.00% | 0.00% | 107.15.00% | 0.00% | 0.00% | 0.00% | 0% | 0.00% |
| Claim keywords | 238422 | 7011 | 12867 | 15449 | 7597 | 88419 | 55831 | 21596 | 9 | 257 | 0 | 27662 | 1467 | 257 |
| Claim review URL | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Claim review headline | 99.89% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 98.55% | 100.00% | 100% |
| Claim reviews without author | 6915 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 154 | 9 | 0 | 6454 | 19 | 279 |
| Total number of entities | 3408390 | 370827 | 160318 | 219532 | 272281 | 1492388 | 648432 | 250948 | 85395 | 280 | 279 | 16809 | 32534 | 8679 |
| Entities per review | 62.21 | 53.82 | 41.31 | 41.18 | 80.59 | 64.70 | 32.04 | 78.468 | 45.42 | 0.31 | 0 | 1.90 | 21.96 | 27.97 |
| Entities per claim | 2.38 | 1.84 | 1.80 | 3.82 | 1.65 | 1.99 | 2.41 | 2.40 | 3.24 | 0.763 | 0 | 2.14 | 5.03 | 2.87 |
| True claims | 8935 | 2 | 516 | 234 | 694 | 2523 | 3879 | 958 | 6 | 20 | 39 | 0 | 20 | 44 |
| False claims | 43371 | 6415 | 3403 | 4533 | 1161 | 9058 | 9342 | 1131 | 1227 | 74 | 928 | 3446 | 717 | 44 |
| Mixture claims | 16380 | 139 | 13 | 4 | 217 | 10502 | 3071 | 633 | 237 | 157 | 3 | 823 | 398 | 151 |
| Other claims | 7804 | 106 | 658 | 107 | 2625 | 290 | 2526 | 381 | 390 | 15 | 581 | 0 | 85 | 40 |
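
To read the rating rows as proportions, the counts in the "Global" column can be turned into shares. The sketch below is only a worked calculation on the published numbers. Note that the four rating counts add up to 76490, which differs from the reported number of claims (74066), so the sum of the rating counts is used as the denominator here; that choice is an assumption, not something the statistics prescribe.

```python
# Counts copied from the "Global" column of the statistics table.
rating_counts = {
    "True": 8935,
    "False": 43371,
    "Mixture": 16380,
    "Other": 7804,
}


def rating_shares(counts):
    """Return each category's share of the summed rating counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = rating_shares(rating_counts)
for label, share in shares.items():
    # Print each share as a percentage with one decimal place.
    print(f"{label}: {share:.1%}")
```

The same function can be applied to any single website's column by swapping in that column's four counts.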

Web interfaces

Claim Explorer: https://data.gesis.org/claimskg/explorer

Dataset

Dataset files

The latest release of ClaimsKG can be downloaded from Datorium: DOI https://doi.org/10.7802/2620

Previous versions are available at: https://zenodo.org/record/3518960 and https://doi.org/10.7802/2469

SPARQL endpoint

A SPARQL endpoint is available to send SPARQL queries and retrieve results from ClaimsKG.

https://data.gesis.org/claimskg/sparql
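
A minimal sketch of querying the endpoint from Python with only the standard library is given below. It assumes the endpoint follows the SPARQL 1.1 Protocol (a GET request with a `query` parameter) and returns the standard SPARQL JSON results format when asked for `application/sparql-results+json`; neither is stated on this page. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/claimskg/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request as described by the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # Each cell is an object such as {"type": "uri", "value": "..."}.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the parsed rows (requires network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query(...)` with one of the example queries below would return one dictionary per result row, keyed by variable name. Request building and result parsing are kept separate so both can be checked without contacting the server.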

Example queries

Example 1: Requesting the top-5 entities mentioned in claims together with “Coronavirus”. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
# Count how often each other entity is mentioned together with dbr:Coronavirus.
SELECT ?entity2Uri (COUNT(?entity2) AS ?num) WHERE {
  ?claim a schema:CreativeWork ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
  ?entity1 itsrdf:taIdentRef dbr:Coronavirus .
  ?entity2 itsrdf:taIdentRef ?entity2Uri .
  FILTER (?entity2Uri != dbr:Coronavirus)
} GROUP BY ?entity2Uri ORDER BY DESC(?num) LIMIT 5
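
The query in Example 1 can be generalized to other entities. The following sketch builds the same query text for an arbitrary DBpedia resource and result size. The function name and the restriction of local names to letters, digits, and underscores are assumptions made here to keep the generated text a valid prefixed name; they are not part of ClaimsKG.

```python
import re

PREFIXES = """\
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
"""

# Deliberately strict: only simple local names such as "Coronavirus" or "Donald_Trump".
_LOCAL_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def co_mention_query(entity, limit=5):
    """Return a query for the entities most often co-mentioned with dbr:<entity>."""
    if not _LOCAL_NAME.match(entity):
        raise ValueError(f"unsupported DBpedia local name: {entity!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return PREFIXES + f"""\
SELECT ?entity2Uri (COUNT(?entity2) AS ?num) WHERE {{
  ?claim a schema:CreativeWork ; schema:mentions ?entity1, ?entity2 .
  ?entity1 itsrdf:taIdentRef dbr:{entity} .
  ?entity2 itsrdf:taIdentRef ?entity2Uri .
  FILTER (?entity2Uri != dbr:{entity})
}}
GROUP BY ?entity2Uri
ORDER BY DESC(?num)
LIMIT {limit}
"""
```

For instance, `co_mention_query("Donald_Trump", limit=10)` yields the query text for the ten entities most often mentioned together with Donald Trump. Validating the input before formatting avoids producing malformed queries from unexpected names.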
		

Example 2: Requesting all claims of 2022 mentioning both Vladimir Putin and Ukraine. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
# Claims published in 2022 that mention both entities, with their review URLs.
SELECT ?text ?date ?reviewurl WHERE {
  ?claim a schema:CreativeWork ; schema:datePublished ?date .
  FILTER (YEAR(?date) = 2022)
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
  ?entity1 itsrdf:taIdentRef dbr:Vladimir_Putin .
  ?entity2 itsrdf:taIdentRef dbr:Ukraine .
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
}
		

Example 3: Requesting the number of claims per year, from 2019 onwards, mentioning Coronavirus. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
# Count distinct claim texts per publication year; the aggregate needs GROUP BY ?year.
SELECT ?year (COUNT(DISTINCT ?text) AS ?count) WHERE {
  ?claim a schema:CreativeWork ; schema:text ?text ; schema:datePublished ?date .
  BIND (YEAR(?date) AS ?year)
  FILTER (?year >= 2019)
  ?claim schema:mentions ?entity .
  ?entity itsrdf:taIdentRef dbr:Coronavirus .
} GROUP BY ?year ORDER BY DESC(?count)
		

Example 4: Requesting the number of claims per month mentioning Donald Trump in 2020. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
# Bind the month once, then group and order by it.
SELECT ?month (COUNT(?claim) AS ?num) WHERE {
  ?claim a schema:CreativeWork ; schema:datePublished ?date .
  FILTER (YEAR(?date) = 2020)
  BIND (MONTH(?date) AS ?month)
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity .
  ?entity itsrdf:taIdentRef dbr:Donald_Trump .
} GROUP BY ?month ORDER BY ?month
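
A grouped query like the one in Example 4 returns rows only for months that have at least one matching claim. For plotting or comparison it can help to expand the result into a full twelve-month series. The sketch below assumes the rows arrive as dictionaries of strings keyed by the variable names `month` and `num`, as a JSON result parser would typically produce; the function name is illustrative.

```python
def monthly_series(rows):
    """Turn result rows with 'month' and 'num' values into a 12-entry list."""
    counts = [0] * 12
    for row in rows:
        month = int(row["month"])
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {month}")
        # Index 0 holds January, index 11 holds December.
        counts[month - 1] += int(row["num"])
    return counts
```

Months missing from the query result stay at zero, so the returned list always has one entry per calendar month.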
		

Example 5: Requesting all claims that mention any President of the United States, across all years. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?text ?reviewurl ?President WHERE {
  # Federated part: fetch the presidents from the public DBpedia endpoint.
  SERVICE <http://dbpedia.org/sparql> {
    ?President <http://dbpedia.org/property/office> "President of the United States"@en .
  }
  # Local part: claims whose mentioned entity is linked to one of those resources.
  ?claim a schema:CreativeWork .
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1 .
  ?entity1 itsrdf:taIdentRef ?President .
}
		

Example 6: Requesting all claims mentioning a military conflict. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT DISTINCT ?text ?reviewurl ?c WHERE {
  # Federated part: ?c is any conflict that some resource ?b is part of in DBpedia.
  SERVICE <http://dbpedia.org/sparql> {
    ?b dbo:isPartOfMilitaryConflict ?c .
  }
  # Local part: claims whose mentioned entity is linked to such a conflict.
  ?claim a schema:CreativeWork .
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1 .
  ?entity1 itsrdf:taIdentRef ?c .
}
		

Source code

The source code of all components of ClaimsKG is available on GitHub.

Extractor: A pipeline for scraping the fact-checking websites.

Source Code: https://github.com/claimskg/claimskg-extractor/tree/latest_release

Generator: A pipeline for harvesting the scraped data, annotating the claims with DBpedia entities, and lifting all data to an RDF model.

Source Code: https://github.com/claimskg/claimskg_generator/tree/latest_release

License

The dataset is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.

ClaimsKG does NOT contain the actual review texts; it contains only structured metadata and links to the original reviews on the fact-checking sites. ClaimsKG can be used ONLY for research purposes.

Publications

Contact

Please send your feedback and any comments by email to susmita (dot) gangopadhyay (at) gesis (dot) org or andon (dot) tchechmedjiev (at) mines-ales (dot) fr.

About Us

Susmita Gangopadhyay, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Katarina Boland, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Hajira Jabeen, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Darlène Bretchel, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Pavlos Fafalios, Institute of Computer Science, FORTH-ICS (Greece), https://www.ics.forth.gr/
Malo Gasquet, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Andon Tchechmedjiev, LGI2P / IMT Mines Ales / University of Montpellier (France), https://lgi2p.mines-ales.fr/
Konstantin Todorov, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Vinicius Woloszyn, TU Berlin (Germany), https://www.tu-berlin.de
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Matthaeus Zloch, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/