ClaimsKG is a structured database that serves as a registry of claims. It gives researchers an entry point for discovering claims and the entities they involve, and it links to fact-checking sites and their verdicts. The database is built on a knowledge graph that provides data about claims, their metadata (such as the publishing site), the entities involved (annotated using state-of-the-art NLP techniques) and normalized truth ratings. ClaimsKG is generated by a (semi-)automated pipeline that regularly harvests claims and their metadata from popular fact-checking sites, lifts the data into an RDF/S model that reuses established vocabularies such as schema.org and NIF, and annotates claims with related entities from DBpedia.
We only use websites that the fact-checking community considers highly reputable (see here for details).
We have taken measures to ensure that our data complies with the copyright restrictions of the respective fact-checking websites.
The latest release of ClaimsKG covers 74066 claims and 72128 claim reviews. The data was scraped in January 2023 and contains claims published between 1996 and 31 January 2023 from the 13 fact-checking websites listed below. The claim reviews (fact checks) likewise date from 1996 to 2023. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains the detected entities, referenced to DBpedia.
We plan to add more websites in diverse languages in upcoming versions.
For each claim, ClaimsKG provides:
(a) the textual statement of the claim;
(b) its truth value or rating, both a normalized rating and the original one;
(c) a link to the claim review from the fact-checking website;
(d) the references cited in the claim reviews;
(e) the entities extracted from the claim body and from the review body;
(f) the author of the claim and the author of the claim review;
(g) the date of publication of the claim and the date of publication of the review;
(h) the title of the review article;
(i) a set of keywords extracted from the fact-checking websites that act like topics (e.g., “healthcare” or “abortion”).
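As a rough illustration of the fields listed above, one claim could be held in memory as a flat record. This is only a sketch: the class name `ClaimRecord` and all attribute names are hypothetical and are not taken from the ClaimsKG schema, which is an RDF model rather than a Python structure. Optional fields reflect that not every site supplies every value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaimRecord:
    """Hypothetical flat view of one claim; all names are illustrative."""
    text: str                       # (a) textual statement of the claim
    normalized_rating: str          # (b) normalized truth rating
    original_rating: str            # (b) rating as worded by the site
    review_url: str                 # (c) link to the claim review
    references: List[str] = field(default_factory=list)       # (d) cited references
    claim_entities: List[str] = field(default_factory=list)   # (e) entities in the claim body
    review_entities: List[str] = field(default_factory=list)  # (e) entities in the review body
    claim_author: Optional[str] = None    # (f) author of the claim
    review_author: Optional[str] = None   # (f) author of the review
    claim_date: Optional[str] = None      # (g) publication date of the claim
    review_date: Optional[str] = None     # (g) publication date of the review
    review_title: Optional[str] = None    # (h) title of the review article
    keywords: List[str] = field(default_factory=list)         # (i) topic-like keywords


# Made-up values, purely to show the shape of a record.
example = ClaimRecord(
    text="An invented claim used only as a placeholder.",
    normalized_rating="FALSE",
    original_rating="false",
    review_url="https://example.org/review/1",
    keywords=["healthcare"],
)
```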
In the following figure, the data model of ClaimsKG is illustrated.
In the following figure, an instantiated version of the model is shown for an example.
The mappings of local review ratings to normalized ratings are documented here.
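The idea of normalizing site-specific ratings can be sketched as a simple lookup. The four target classes (true, false, mixture, other) follow the rating rows of the statistics table; everything else here is an assumption for illustration. In particular, the entries of `EXAMPLE_MAPPING` and the function name `normalize_rating` are hypothetical placeholders and do not reproduce the documented ClaimsKG mappings.

```python
from typing import Dict

# Normalized classes, as they appear in the statistics table.
NORMALIZED_CLASSES = ("TRUE", "FALSE", "MIXTURE", "OTHER")

# ASSUMPTION: illustrative entries only, not the official per-site mappings.
EXAMPLE_MAPPING: Dict[str, str] = {
    "true": "TRUE",
    "false": "FALSE",
    "half true": "MIXTURE",
}


def normalize_rating(local_rating: str, mapping: Dict[str, str] = EXAMPLE_MAPPING) -> str:
    """Map a site-specific rating label to one of the normalized classes."""
    # Lower-case and collapse whitespace so "  Half   True " matches "half true".
    key = " ".join(local_rating.lower().split())
    # Any label without an explicit entry falls back to OTHER.
    return mapping.get(key, "OTHER")
```

With this sketch, `normalize_rating("Half True")` yields `"MIXTURE"` and any label missing from the table yields `"OTHER"`.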
Property | Global | AFP Factcheck | Africacheck | Check your Fact | Fullfact | Politifact | Snopes | Truth or Fiction | AFP Factuel (FR) | Factograph | Fatabyyano | Vishva news | Polygraph | Eu factcheck |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number of claims | 74066 | 6661 | 4575 | 4877 | 4675 | 22351 | 18764 | 3103 | 1732 | 258 | 1550 | 4036 | 1190 | 234 |
Number of reviews | 72127 | 6662 | 3680 | 4878 | 3282 | 22373 | 18818 | 3103 | 1756 | 266 | 1551 | 4269 | 1208 | 279 |
Claim text | 97.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 107.39% | 97.28% | 0.00% | 100% | 100% | 100% |
Claim author | 99% | 100% | 100.00% | 100% | 100% | 100% | 100% | 100% | 98.49% | 99.61% | 100% | 100% | 100% | 0% |
Claim date published | 65.36% | 86.29% | 0.00% | 0% | 0% | 100.00% | 0.00% | 0.00% | 107.15.00% | 0.00% | 0.00% | 0.00% | 0% | 0.00% |
Claim keywords | 238422 | 7011 | 12867 | 15449 | 7597 | 88419 | 55831 | 21596 | 9 | 257 | 0 | 27662 | 1467 | 257 |
Claim review URL | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
Claim review headline | 99.89% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 98.55% | 100.00% | 100% |
Claim review without author | 6915 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 154 | 9 | 0 | 6454 | 19 | 279 |
Total number of entities | 3408390 | 370827 | 160318 | 219532 | 272281 | 1492388 | 648432 | 250948 | 85395 | 280 | 279 | 16809 | 32534 | 8679 |
Entities per review | 62.21 | 53.82 | 41.31 | 41.18 | 80.59 | 64.70 | 32.04 | 78.468 | 45.42 | 0.31 | 0 | 1.90 | 21.96 | 27.97 |
Entities per claim | 2.38 | 1.84 | 1.80 | 3.82 | 1.65 | 1.99 | 2.41 | 2.40 | 3.24 | 0.763 | 0 | 2.14 | 5.03 | 2.87 |
True claims | 8935 | 2 | 516 | 234 | 694 | 2523 | 3879 | 958 | 6 | 20 | 39 | 0 | 20 | 44 |
False claims | 43371 | 6415 | 3403 | 4533 | 1161 | 9058 | 9342 | 1131 | 1227 | 74 | 928 | 3446 | 717 | 44 |
Mixture claims | 16380 | 139 | 13 | 4 | 217 | 10502 | 3071 | 633 | 237 | 157 | 3 | 823 | 398 | 151 |
Other claims | 7804 | 106 | 658 | 107 | 2625 | 290 | 2526 | 381 | 390 | 15 | 581 | 0 | 85 | 40 |
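
To read the rating rows as proportions, the Global column can be turned into shares. The counts below are copied from the table; the helper name `rating_shares` is hypothetical. The four rating counts sum to 76490 rather than to the number of claims, so this sketch divides by the sum of the rating counts.

```python
from typing import Dict

# Global column of the four rating rows, copied from the table above.
global_counts: Dict[str, int] = {
    "True": 8935,
    "False": 43371,
    "Mixture": 16380,
    "Other": 7804,
}


def rating_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each rating's share of the summed rating counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = rating_shares(global_counts)
for label, share in shares.items():
    # Print each share as a percentage with one decimal place.
    print(f"{label}: {share:.1%}")
```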
Claim Explorer: https://data.gesis.org/claimskg/explorer
The latest release of ClaimsKG can be downloaded from Datorium: DOI https://doi.org/10.7802/2620
Previous versions are available at: https://zenodo.org/record/3518960 and https://doi.org/10.7802/2469
A SPARQL endpoint is available to send SPARQL queries and retrieve results from ClaimsKG.
https://data.gesis.org/claimskg/sparql
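A minimal sketch of querying the endpoint from Python with the standard library only. It assumes the endpoint accepts SPARQL 1.1 Protocol GET requests with a `query` parameter and can return `application/sparql-results+json`; neither is stated on this page. The function names `build_request`, `parse_bindings` and `run_query` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "https://data.gesis.org/claimskg/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as a GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # ASSUMPTION: the server honours this Accept header.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query` with any of the example queries below would return a list of dictionaries keyed by the selected variable names, provided the assumptions about the endpoint hold.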
Example 1: Requesting the top-5 entities mentioned in claims together with “Coronavirus”.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
# Count the entities that co-occur with Coronavirus in the same claim
SELECT ?entity2Uri (COUNT(?entity2) AS ?num) WHERE {
  ?claim a schema:CreativeWork ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
  ?entity1 itsrdf:taIdentRef dbr:Coronavirus .
  ?entity2 itsrdf:taIdentRef ?entity2Uri .
  FILTER (?entity2Uri != dbr:Coronavirus)
} GROUP BY ?entity2Uri ORDER BY DESC(?num) LIMIT 5
Example 2: Requesting all claims of 2022 mentioning both Vladimir Putin and Ukraine.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?text ?date ?reviewurl WHERE {
  # Restrict to claims published in 2022
  ?claim a schema:CreativeWork ; schema:datePublished ?date .
  FILTER (year(?date) = 2022)
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
  ?entity1 itsrdf:taIdentRef dbr:Vladimir_Putin .
  ?entity2 itsrdf:taIdentRef dbr:Ukraine .
  # Attach the URL of the review that checked the claim
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
}
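The pattern of Example 2 generalizes to any set of DBpedia resources. The sketch below builds such a query as a string; the function name `co_mention_query` is hypothetical, and it assumes resource names are plain DBpedia local names such as `Vladimir_Putin`. Unlike Example 2, it does not require the claim to have an author.

```python
import re
from typing import Optional, Sequence

# Characters that may not appear inside a SPARQL IRI reference.
_SAFE_NAME = re.compile(r'[^\s<>"{}|\\^`]+')


def co_mention_query(resources: Sequence[str], year: Optional[int] = None) -> str:
    """Build a query for claims that mention every given DBpedia resource."""
    if not resources:
        raise ValueError("at least one DBpedia resource name is required")
    lines = [
        "PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>",
        "PREFIX schema: <http://schema.org/>",
        "SELECT ?text ?date ?reviewurl WHERE {",
        "  ?claim a schema:CreativeWork ; schema:text ?text ; schema:datePublished ?date .",
    ]
    for i, name in enumerate(resources, start=1):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe resource name: {name!r}")
        # One mention node per requested resource.
        lines.append(f"  ?claim schema:mentions ?entity{i} .")
        lines.append(f"  ?entity{i} itsrdf:taIdentRef <http://dbpedia.org/resource/{name}> .")
    if year is not None:
        # Optional restriction to a single publication year.
        lines.append(f"  FILTER(year(?date) = {int(year)})")
    lines.append("  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .")
    lines.append("}")
    return "\n".join(lines)


print(co_mention_query(["Vladimir_Putin", "Ukraine"], year=2022))
```

Building the IRIs in full avoids prefixed names, which cannot express every DBpedia local name without escaping.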
Example 3: Requesting the number of claims per year (from 2019 onwards) mentioning Coronavirus.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?year (COUNT(DISTINCT ?text) AS ?count) WHERE {
  ?claim a schema:CreativeWork ; schema:text ?text ; schema:datePublished ?date .
  # Derive the publication year and keep 2019 onwards
  BIND (year(?date) AS ?year)
  FILTER (?year >= 2019)
  ?claim schema:mentions ?entity .
  ?entity itsrdf:taIdentRef dbr:Coronavirus .
} GROUP BY ?year ORDER BY DESC(?count)
Example 4: Requesting the number of claims per month mentioning Donald Trump in 2020.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?month (COUNT(?claim) AS ?num) WHERE {
  ?claim a schema:CreativeWork ; schema:datePublished ?date .
  FILTER (year(?date) = 2020)
  # Group on the month number of the publication date
  BIND (month(?date) AS ?month)
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity .
  ?entity itsrdf:taIdentRef dbr:Donald_Trump .
} GROUP BY ?month ORDER BY ?month
Example 5: Requesting all claims, across all years, that mention a President of the United States.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?text ?reviewurl ?President WHERE {
  # Federated lookup: fetch the presidents from the public DBpedia endpoint
  SERVICE <http://dbpedia.org/sparql> {
    ?President <http://dbpedia.org/property/office> "President of the United States"@en .
  }
  ?claim a schema:CreativeWork .
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1 .
  ?entity1 itsrdf:taIdentRef ?President .
}
Example 6: Requesting all claims mentioning a military conflict.
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?text ?reviewurl ?c WHERE {
  # Federated lookup: every ?c that some DBpedia resource is part of as a military conflict
  SERVICE <http://dbpedia.org/sparql> {
    ?b <http://dbpedia.org/ontology/isPartOfMilitaryConflict> ?c .
  }
  ?claim a schema:CreativeWork .
  ?claimReview schema:itemReviewed ?claim ; schema:url ?reviewurl .
  ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1 .
  ?entity1 itsrdf:taIdentRef ?c .
}
The source code of all components of ClaimsKG is available on GitHub.
Extractor: A pipeline for scraping the fact-checking websites.
Source Code: https://github.com/claimskg/claimskg-extractor/tree/latest_release
Generator: A pipeline that harvests the scraped data, links the claims to DBpedia entities, and lifts all data to an RDF model.
Source Code: https://github.com/claimskg/claimskg_generator/tree/latest_release
The dataset is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.
ClaimsKG does NOT contain actual review texts; it contains only structured metadata and links to the original reviews on the fact-checking sites. ClaimsKG can be used ONLY for research purposes.
Please send feedback and comments by email to susmita (dot) gangopadhyay (at) gesis (dot) org or andon (dot) tchechmedjiev (at) mines-ales (dot) fr
Susmita Gangopadhyay, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Katarina Boland, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Hajira Jabeen, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Darlène Bretchel, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Pavlos Fafalios, Institute of Computer Science, FORTH-ICS (Greece), https://www.ics.forth.gr/
Malo Gasquet, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Andon Tchechmedjiev, LGI2P / IMT Mines Ales / University of Montpellier (France), https://lgi2p.mines-ales.fr/
Konstantin Todorov, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Vinicius Woloszyn, TU Berlin (Germany), https://www.tu-berlin.de
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Matthaeus Zloch, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/