Description

ClaimsKG is a structured database that serves as a registry of claims. It gives researchers an entry point for discovering claims and the entities they involve, and it links to fact-checking sites and their verdicts. The database is built on a knowledge graph that holds claims, their metadata (such as the publishing site), the entities involved (annotated using state-of-the-art NLP techniques) and normalized truth ratings. ClaimsKG is generated by a (semi-)automated pipeline that regularly harvests claims and their metadata from popular fact-checking sites, lifts the data into an RDF/S model that builds on established vocabularies such as schema.org and NIF, and annotates claims with related entities from DBpedia.

Claim extraction

We only use websites that the fact-checking community considers highly reputable (see here for details).

We have taken measures to ensure that our data complies with the copyright restrictions of the respective fact-checking websites.

Features extracted

(a) the textual statement of the claim;
(b) its truth value or rating - both a normalized rating and the original one;
(c) a link to the claim review from the fact-checking website;
(d) the references cited in the claim reviews;
(e) the entities extracted from the claim body and from the review body together with their Wikipedia categories;
(f) the author of the claim and the author of the claim review;
(g) the date of publication of the claim and the date of publication of the review;
(h) the title of the review article;
(i) a set of keywords extracted from the fact-checking websites that act as topics (e.g., “healthcare” or “abortion”).
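
As a rough illustration of how features (a) to (i) fit together, the record for one claim could be sketched as a Python data class. The class name, field names and the helper method are hypothetical and are not part of the ClaimsKG model; the Wikipedia categories of entities are left out for brevity.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ClaimRecord:
    """One claim with features (a) to (i); all names here are illustrative."""
    text: str                                                 # (a) claim statement
    normalized_rating: str                                    # (b) normalized rating
    original_rating: str                                      # (b) rating used by the site
    review_url: str                                           # (c) link to the claim review
    citations: List[str] = field(default_factory=list)        # (d) references cited
    claim_entities: List[str] = field(default_factory=list)   # (e) entities in the claim
    review_entities: List[str] = field(default_factory=list)  # (e) entities in the review
    claim_author: Optional[str] = None                        # (f)
    review_author: Optional[str] = None                       # (f)
    claim_date: Optional[str] = None                          # (g)
    review_date: Optional[str] = None                         # (g)
    review_title: Optional[str] = None                        # (h)
    keywords: List[str] = field(default_factory=list)         # (i) topic-like keywords

    def missing_features(self) -> List[str]:
        """Return the names of fields that are empty for this claim."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]
```

Most fields are optional in this sketch because not every fact-checking site provides every feature.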

Data model

The following figure illustrates the ClaimsKG data model.

The following figure shows an instantiated version of the model for an example claim.

Statistics

| Property | Global | Snopes | Politifact | Africa Check | Truth or Fiction | Check your Fact | FactsCan | Fullfact | AFP Factcheck | AFP Factuel (FR) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Number of claims | 33,261 | 12,812 | 16,476 | 520 | 1,311 | 823 | 125 | 250 | 678 | 266 |
| Claim text | 99.10% | 99.98% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 100.00% | 100.00% |
| Claim author | 44.57% | 0.00% | 100.00% | 0.00% | 0.00% | 0.00% | 100.00% | 39.54% | 82.32% | 62.40% |
| Claim date published | 44.74% | 0.00% | 99.91% | 0.00% | 0.00% | 0.00% | 98.40% | 0.00% | 100.00% | 100.00% |
| Claim with citation | 81.00% | 82.32% | 76.95% | 96.97% | 100.00% | 99.76% | 100.00% | 0.00% | 99.55% | 95.48% |
| Claim keywords | 77.10% | 67.47% | 100.00% | 99.88% | 0.00% | 0.00% | 100.00% | 100.00% | 0.00% | 0.00% |
| Claim entity mention | 98.82% | 99.66% | 99.98% | 98.56% | 100.00% | 100.00% | 100.00% | 0.00% | 100.00% | 100.00% |
| Claim review URL | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Claim review headline | 99.11% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 0.00% | 100.00% | 100.00% |
| Claim review author | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Claim review date published | 77.78% | 100.00% | 0.00% | 100.00% | 0.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Claim review language | 100.00% | 100.00% | 0.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Claim review entity mention | 62.95% | 67.98% | 55.83% | 72.49% | 73.32% | 64.92% | 96.00% | 43.55% | 74.10% | 53.13% |
| Claim review language | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Identical claims | 87 | 38 | 49 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| True claims | 4,404 | 1,846 | 2,334 | 60 | 125 | 0 | 39 | 0 | 0 | 0 |
| False claims | 12,350 | 6,809 | 5,036 | 211 | 246 | 0 | 48 | 0 | 0 | 0 |
| Mixture claims | 10,834 | 1,925 | 8,844 | 0 | 63 | 2 | 0 | 0 | 0 | 0 |
| Other claims | 5,706 | 2,232 | 262 | 282 | 877 | 821 | 38 | 250 | 678 | 266 |
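
The counts in the statistics above can be cross-checked with a few lines of Python. The sketch below copies the "Number of claims" row and the global rating counts, checks that the per-site counts add up to the global figure, and turns the rating counts into percentage shares. The variable and function names are illustrative, and the shares are computed relative to the sum of the four rating counts.

```python
# Counts copied from the statistics above.
CLAIMS_PER_SITE = {
    "Snopes": 12812, "Politifact": 16476, "Africa Check": 520,
    "Truth or Fiction": 1311, "Check your Fact": 823, "FactsCan": 125,
    "Fullfact": 250, "AFP Factcheck": 678, "AFP Factuel (FR)": 266,
}
GLOBAL_CLAIMS = 33261
GLOBAL_RATINGS = {"True": 4404, "False": 12350, "Mixture": 10834, "Other": 5706}


def per_site_total(claims_per_site):
    """Add up the per-site claim counts."""
    return sum(claims_per_site.values())


def rating_shares(ratings):
    """Return each rating category's share of the summed rating counts, in percent."""
    total = sum(ratings.values())
    return {name: round(100.0 * count / total, 2) for name, count in ratings.items()}


sites_match_global = per_site_total(CLAIMS_PER_SITE) == GLOBAL_CLAIMS
shares = rating_shares(GLOBAL_RATINGS)
```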

Web interfaces

Claim Explorer: https://data.gesis.org/claimskg/explorer

ClaimsKG Statistical Observatory: https://data.gesis.org/claimskg/observatory

SPARQL endpoint

ClaimsKG can be queried through the following SPARQL endpoint.

https://data.gesis.org/claimskg/sparql
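
A minimal Python sketch for calling the endpoint from a script, using only the standard library. It assumes the endpoint follows the W3C SPARQL 1.1 Protocol (a `query` URL parameter on a GET request) and can return results in the SPARQL JSON results format; neither is confirmed on this page. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/claimskg/sparql"


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request carrying the query and asking for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_json(payload: str):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


def run_query(query: str, timeout: float = 30.0):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return rows_from_json(resp.read().decode("utf-8"))
```

With this in place, `run_query(...)` can be given any of the example queries as a string; each returned row maps the selected variable names to their values.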

Example queries

Example 1: Requesting false claims of 2017 mentioning both Donald Trump and the FBI. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?text ?date ?reviewurl WHERE {
 ?claim a schema:CreativeWork ; schema:datePublished ?date FILTER(year(?date)=2017)
 ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
 ?entity1 itsrdf:taIdentRef dbr:Federal_Bureau_of_Investigation .
 ?entity2 itsrdf:taIdentRef dbr:Donald_Trump .
 ?claimReview schema:itemReviewed ?claim ; schema:reviewRating ?rating ; schema:url ?reviewurl .
 ?rating schema:author <http://data.gesis.org/claimskg/organization/claimskg> ;
     schema:alternateName ?ratingName ;
     schema:ratingValue ?ratingValue FILTER (?ratingValue = 1) }
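
To reuse Example 1 for other years or entities, the query text can be generated from a template. The Python sketch below is one way to do this; the function name and parameters are illustrative. Entities are written as full DBpedia IRIs rather than `dbr:` prefixed names, so that resource names containing characters such as parentheses do not need escaping.

```python
import re

# Example 1 with the year and the mentioned entities left as placeholders.
QUERY_TEMPLATE = """\
PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
SELECT ?text ?date ?reviewurl WHERE {{
 ?claim a schema:CreativeWork ; schema:datePublished ?date FILTER(year(?date)={year})
 ?claim schema:author ?author ; schema:text ?text ; schema:mentions {mention_vars} .
{mention_patterns}
 ?claimReview schema:itemReviewed ?claim ; schema:reviewRating ?rating ; schema:url ?reviewurl .
 ?rating schema:author <http://data.gesis.org/claimskg/organization/claimskg> ;
     schema:alternateName ?ratingName ;
     schema:ratingValue ?ratingValue FILTER (?ratingValue = 1) }}
"""

# Reject characters that are not allowed inside an IRI reference in SPARQL.
_SAFE_NAME = re.compile(r'^[^\s<>"{}|\\^`]+$')


def false_claims_query(year: int, dbpedia_names) -> str:
    """Build Example 1 for a given year and a list of DBpedia resource names."""
    names = list(dbpedia_names)
    if not names:
        raise ValueError("at least one DBpedia resource name is required")
    for name in names:
        if not _SAFE_NAME.match(name):
            raise ValueError(f"unsafe resource name: {name!r}")
    mention_vars = ", ".join(f"?entity{i}" for i in range(1, len(names) + 1))
    mention_patterns = "\n".join(
        f" ?entity{i} itsrdf:taIdentRef <http://dbpedia.org/resource/{name}> ."
        for i, name in enumerate(names, start=1)
    )
    return QUERY_TEMPLATE.format(
        year=int(year), mention_vars=mention_vars, mention_patterns=mention_patterns
    )
```

Calling `false_claims_query(2017, ["Federal_Bureau_of_Investigation", "Donald_Trump"])` yields a query with the same graph pattern as Example 1.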

Example 2: Requesting claims mentioning journalists. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?text ?date ?journalist ?ratingName ?reviewurl WHERE {
 SERVICE <http://dbpedia.org/sparql> {
  ?journalist a <http://dbpedia.org/class/yago/Journalist110224578> }
 ?claim a schema:CreativeWork ; schema:text ?text
 OPTIONAL { ?claim schema:datePublished ?date . ?claim schema:author ?author }
 ?claim schema:mentions ?entity . ?entity itsrdf:taIdentRef ?journalist .
 ?claimReview schema:itemReviewed ?claim ; schema:reviewRating ?rating ; schema:url ?reviewurl .
 ?rating schema:author <http://data.gesis.org/claimskg/organization/claimskg> ; schema:alternateName ?ratingName }

Example 3: Requesting the top-3 journalists mentioned in claim reviews of 2018. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT  ?journalist (count(?claimreview) as ?count) WHERE {
  SERVICE<http://dbpedia.org/sparql> {
    ?journalist a <http://dbpedia.org/class/yago/Journalist110224578> }
  ?claimreview a schema:ClaimReview ; schema:datePublished ?date FILTER(year(?date)=2018)
  ?claimreview schema:mentions ?entity . ?entity itsrdf:taIdentRef ?journalist
} GROUP BY ?journalist order by DESC(?count) LIMIT 3

Example 4: Requesting the number of claims per month mentioning Donald Trump in 2018. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT (month(?date) AS ?month) (count(?claim) AS ?num) WHERE {
 ?claim a schema:CreativeWork ; schema:datePublished ?date FILTER(year(?date)=2018)
 ?claim schema:author ?author ; schema:text ?text ; schema:mentions ?entity .
 ?entity itsrdf:taIdentRef dbr:Donald_Trump .
 } GROUP BY month(?date) ORDER BY month(?date)
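
For readers less familiar with SPARQL aggregation, the grouping step of Example 4 can be mirrored on the client side. The Python sketch below assumes a list of publication dates, one per matching result row, has already been retrieved; the function name is illustrative.

```python
from collections import Counter
from datetime import date


def claims_per_month(dates, year):
    """Count dates per month for one year, sorted by month number."""
    counts = Counter(d.month for d in dates if d.year == year)
    return sorted(counts.items())
```

The year test plays the role of the FILTER, the counter plays the role of GROUP BY with count, and sorting corresponds to ORDER BY.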

Example 5: Requesting the top-5 entities mentioned in claims together with “abortion”. (Result)

PREFIX itsrdf: <https://www.w3.org/2005/11/its/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?entity2Uri (count(?entity2) AS ?num) WHERE {
 ?claim a schema:CreativeWork ; schema:text ?text ; schema:mentions ?entity1, ?entity2 .
 ?entity1 itsrdf:taIdentRef dbr:Abortion .
 ?entity2 itsrdf:taIdentRef ?entity2Uri FILTER (?entity2Uri != dbr:Abortion)
} GROUP BY ?entity2Uri ORDER BY DESC(?num) LIMIT 5

Source code

The source code of all components of ClaimsKG is available on GitHub.

https://github.com/claimskg

License

The dataset is published under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.

ClaimsKG does NOT contain the actual review texts; it contains only structured metadata and links to the original reviews on the fact-checking sites. ClaimsKG may be used ONLY for research purposes.

Contact

Please provide your feedback and any comments by sending an email to andon (dot) tchechmedjiev (at) mines-ales (dot) fr

About Us

Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Pavlos Fafalios, L3S Research Centre (Germany), https://www.l3s.de/
Malo Gasquet, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Andon Tchechmedjiev, LGI2P / IMT Mines Ales / University of Montpellier (France), https://lgi2p.mines-ales.fr/
Konstantin Todorov, LIRMM / University of Montpellier (France), https://www.lirmm.fr/
Vinicius Woloszyn, TU Berlin (Germany), https://www.tu-berlin.de
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Matthaeus Zloch, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/