GESIS Knowledge Graph

Overview

The GESIS Knowledge Graph (GESIS KG) represents metadata of scientific resources available in the GESIS Search and their semantic relationships in an integrated and consistent form and makes them accessible for reuse. Understanding relations and dependencies between scientific resources is crucial to capture provenance, ensure reproducibility of research and facilitate informed search across resources. Hence, the GESIS KG contains links between different scientific resources, e.g., links between datasets, publications, variables and instruments, and links to entities like authors, organizations and social science concepts. The GESIS Knowledge Graph is geared towards interoperability and uses established standards and vocabularies, such as schema.org, DDI and the NFDIcore Ontology, among others, to increase the interoperability and reusability of data on the Web for both humans and machines, e.g., through APIs. At the instance level, we address interoperability by reusing PIDs from commonly used PID systems and by interlinking the GESIS KG with other KGs provided by GESIS as well as within the NFDI.


User stories

Use case-based development is key for the GESIS Knowledge Graph. Below is a selection of user stories illustrating the challenges and needs of users and how the GESIS Knowledge Graph can support them.

Maxine

Maxine is a PhD student in sociology. She is searching for the most recent surveys covering questions on migration and for publications analysing these datasets. As a newcomer, she is uncertain which academic search portal to use for finding such interconnected information. The GESIS KG contains links between social science research data and publications. This data is integrated into the GESIS Search portal, so Maxine can use the GESIS Search to find the information she needs.

An example SPARQL query for this user story is given in the section SPARQL endpoint.

Will

Will is a senior researcher in information science. In his research, he is investigating citation behaviour and usage of data in different scientific disciplines across time. To analyse data citation and usage behaviour in the social science domain, Will can query the necessary data from the GESIS KG via its provided SPARQL endpoint. Alternatively, he can download the KG as a dump file to analyse it offline with other preferred tools. The provided documentation of the underlying schema of the GESIS KG helps him to understand the structure of the data.

An example SPARQL query for this user story is given in the section SPARQL endpoint.

Nancy

Nancy is a research data engineer and works in a research infrastructure organization. In her current project, she needs to integrate metadata of scientific resources from GESIS and other organizations into another search system. To do so, she needs to access and harvest metadata from the GESIS KG via an OAI-PMH API. The OAI-PMH API of the GESIS KG will allow Nancy to harvest the metadata she needs from GESIS in a standardized format. The webpage of the API will also provide documentation for Nancy on how to use the API and how to harvest the data.



Data

The GESIS Knowledge Graph contains content from the GESIS Search, which comprises information about social science research data, publications on research data and open access publications. Detailed information about the content of the GESIS Search can be found here. GESIS Search aggregates information from different data collections of GESIS. Additionally, the GESIS KG comprises links between scientific resources, such as links between research data and publications or between publications and instruments, which are also integrated and available in the GESIS Search.

Detailed provenance information about the original data sources and the source of the links is reflected in the GESIS KG and described in the section Provenance.

Resources and entities

The GESIS Knowledge Graph comprises several types of scientific resources, entities and the direct relationships without provenance between them. The latest version of the GESIS Knowledge Graph ontology can be found in the Download section.

The figure above illustrates the four main scientific resources which are currently contained in the GESIS KG. We distinguish between scientific resources and entities. The main resources and entities of the GESIS Knowledge Graph are listed and described below:

Scientific resources

Resource name | Class
Dataset | schema:Dataset
Publication | schema:ScholarlyArticle
Instrument | ddi:Instrument
Variable | ddi:Variable

Entities in the metadata

Entity name | Class
Person | schema:Person
Organization | schema:Organization
Location | schema:Place
Keyword, Concept, Topic | schema:DefinedTerm

Groups of resources

Scientific resources may occur as part of a group of resources. The following classes are used in the data model to reflect such groups.

Class
schema:DataCatalog
schema:CreativeWorkSeries
schema:Collection
schema:Periodical

Relationships / links between resources

This section describes the relationships between scientific resources in the GESIS Knowledge Graph and how they are represented in the data model.


Links between resources

The figure above depicts indirect relationships with provenance. In the GESIS KG Ontology, we provide both direct relationships without provenance and indirect relationships with provenance: the former for easy querying, the latter for capturing additional metadata about how the links were created. Since the GESIS KG represents many-to-many (m:n) links between scientific resources and includes specific data about these links, additional classes for references and link metadata are incorporated into the data model.

Class | Subclass of
gesiskg:DatasetReference | gesiskg:Reference
gesiskg:PublicationReference | gesiskg:Reference
gesiskg:VariableReference | gesiskg:Reference
gesiskg:LinkMetadata | gesiskg:ReferenceMetadata
gesiskg:DuplicateMetadata | gesiskg:ReferenceMetadata

Link information in the metadata

Links between resources in the GESIS KG are either manually curated or automatically generated. Manual links are created by GESIS staff or are derived from research data bibliographies curated for specific research data programs. These manually linked resources are clearly marked as such and typically point to unique research datasets. The curation of these links is handled by the Manual Link Curation Pipeline.

For automatically generated links between publications and research data, GESIS has developed the Dataset Citation Detection Pipeline. This pipeline, an extension of the InfoLink tool, is designed to detect and disambiguate dataset citations. It identifies mentions of research data within full texts and automatically links them to the corresponding datasets. These links are marked as automatic and may not always refer to a single, unique dataset, but rather to potential datasets used in the publication. It is important to note that the automatically generated links have not yet been evaluated by domain experts; this is planned as future work.

Another pipeline, the Variable Detection Pipeline, automatically identifies links between publications and variables. It was developed as part of the DFG-funded VADIS project.

Additionally, the Publication Citation Detection Pipeline, developed under the DFG-funded Outcite project, automatically identifies citation links between publications and their referenced works.

Metadata name | Description | Property
Link context | Text snippet or annotation marking the reason why a link has been detected | gesiskg:linkContext
Link score | Computed confidence score of the automatically generated link | gesiskg:linkScore
Linking method | Specifies whether a link is manually curated, automatically generated, or a search link | gesiskg:linkingMethod
Link type | Specifies whether a link is a citation or marks a methodological usage of a dataset | gesiskg:linkType
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource

Provenance

The following table describes how provenance information is reflected in the GESIS KG. Separate properties capture the data source within GESIS Search from which a particular resource originates, the source of a dataset mentioned in a publication, the link detection pipeline or manual effort that produced a link, and versioning information.

Metadata name | Description | Property
Source info | Specifies information about the original data source of a scientific resource | gesiskg:sourceInfo
Data source | Specifies information about the source of a dataset mentioned in a publication | gesiskg:dataSource
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource
Version | Specifies the versioning information of a resource if available | schema:version

Persistent identifiers

Resources and entities in the GESIS Knowledge Graph hold several kinds of identifiers. First, there are persistent identifiers like DOIs assigned by PID authorities. Second, there are identifiers assigned to resources by the authority of the data source. Third, a dereferenceable URI within the namespace of the GESIS Knowledge Graph has been assigned to every resource and entity that is part of the graph. The table below gives an overview of all identifiers present in the GESIS Knowledge Graph.

Identifier | Description
DOI | Digital Object Identifier
URN | Uniform Resource Name
ORCID | Open Researcher and Contributor ID
ISSN | International Standard Serial Number
ISBN | International Standard Book Number
Internal GESIS ID | Internal ID used within GESIS
GESIS Study Number | Number used for research data archived at GESIS
GESIS KG URI | Uniform Resource Identifier defined for the GESIS KG, reusing the Internal GESIS ID

URI paths
The following URI paths are used by the GESIS KG to locate, e.g., the schema elements and the resources within the KG.
  • Base URI: https://data.gesis.org/gesiskg/
  • Schema URI: https://data.gesis.org/gesiskg/schema/
  • Resource URI: https://data.gesis.org/gesiskg/resource/
  • GESISKG metadata: https://data.gesis.org/gesiskg/id/1

The GESIS KG uses the same IDs for its resources as the URLs of the GESIS Search, i.e., the GESIS KG URI of a scientific resource can easily be constructed if its GESIS Search URL or its ID is known.

Examples
GESIS Search URL of a resource: https://search.gesis.org/research_data/ZA5280
GESIS KG URI of the same resource: https://data.gesis.org/gesiskg/resource/ZA5280
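
The mapping shown in the example can be sketched in a few lines of Python. This is a minimal illustration, assuming that the resource ID is always the last path segment of the GESIS Search URL; the function name search_url_to_kg_uri is made up for this sketch and is not part of any GESIS tooling.

```python
from urllib.parse import urlparse

# Resource URI path as listed under "URI paths".
KG_RESOURCE_BASE = "https://data.gesis.org/gesiskg/resource/"


def search_url_to_kg_uri(search_url: str) -> str:
    """Map a GESIS Search URL to the corresponding GESIS KG resource URI.

    Assumption: the resource ID is the last path segment of the Search URL.
    """
    path = urlparse(search_url).path.rstrip("/")
    resource_id = path.rsplit("/", 1)[-1]
    if not resource_id:
        raise ValueError(f"No resource ID found in {search_url!r}")
    return KG_RESOURCE_BASE + resource_id


print(search_url_to_kg_uri("https://search.gesis.org/research_data/ZA5280"))
# prints https://data.gesis.org/gesiskg/resource/ZA5280
```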

Dataset statistics

General statistics

Total number of RDF triples | 97133374
Total number of scientific resources | 1986662
Publications | 583085
Datasets | 7546
Instruments | 532
Variables | 1395499
Persons | 466265
Organizations | 20533
Locations | 28314
Keywords, Concepts, Topics | 18297

Schema statistics

Total number of classes | 33
Reused classes | 26
Newly defined classes | 7
Total number of object properties | 34
Reused object properties | 19
Newly defined object properties | 15
Total number of datatype properties | 114
Reused datatype properties | 30
Newly defined datatype properties | 84

Link statistics

Total number of links | 1861964
Automatically generated links between publications and datasets | 78733
Manually curated links between publications and datasets | 49509
Links between publications and datasets | 145382
Links between publications and instruments | 5813
Links between datasets and instruments | 74
Links between datasets and variables | 1393842
Links between publications | 313899
Links between publications and variables | 2954
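
How the totals above are composed can be checked with a little arithmetic. The sketch below only re-adds the numbers listed in the statistics; the dictionary keys are shorthand labels chosen for this example. It suggests that the total number of scientific resources is the sum of the four resource types, and that the total number of links is the sum of the six pairwise link counts, with the automatic and manual rows being a breakdown of the publication-dataset links rather than additional links.

```python
# Counts copied from the statistics above.
resources = {
    "Publications": 583085,
    "Datasets": 7546,
    "Instruments": 532,
    "Variables": 1395499,
}
total_resources = 1986662

links = {
    "publications-datasets": 145382,
    "publications-instruments": 5813,
    "datasets-instruments": 74,
    "datasets-variables": 1393842,
    "publications-publications": 313899,
    "publications-variables": 2954,
}
total_links = 1861964

# Both totals are the plain sums of their parts.
assert sum(resources.values()) == total_resources
assert sum(links.values()) == total_links

# The automatic and manual rows cover part of the publication-dataset links.
automatic, manual = 78733, 49509
unexplained = links["publications-datasets"] - (automatic + manual)
print(unexplained)  # prints 17140
```

The remaining 17140 publication-dataset links are not broken down on this page, so their linking method is left open here.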


Access & APIs

The GESIS Knowledge Graph is available through various access points: via public APIs, as download, and integrated into the GESIS Search portal.

APIs

OAI-PMH API

We provide an OAI-PMH API, available at https://data.gesis.org/gesiskg/oai/, which allows you to access and harvest metadata provided by the GESIS KG in the DataCite and OpenAIRE formats.
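
As a rough sketch of what harvesting requests can look like, the following Python snippet builds request URLs using the standard OAI-PMH verbs ListMetadataFormats and ListRecords. It only constructs URLs and sends nothing. The helper name oai_request_url is invented for this example, and the metadataPrefix value "oai_datacite" is a placeholder assumption: the prefixes this endpoint actually accepts should be taken from its ListMetadataFormats response.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH API as given above.
OAI_BASE = "https://data.gesis.org/gesiskg/oai/"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{OAI_BASE}?{query}"


# Ask the repository which metadata formats (and prefixes) it offers.
print(oai_request_url("ListMetadataFormats"))

# Harvest records in one format. "oai_datacite" is a placeholder prefix
# (assumption); use a prefix reported by ListMetadataFormats instead.
print(oai_request_url("ListRecords", metadataPrefix="oai_datacite"))

# Continue a paged harvest with the resumption token of the previous response.
print(oai_request_url("ListRecords", resumptionToken="TOKEN_FROM_RESPONSE"))
```

In OAI-PMH, a resumptionToken is sent as the only argument besides the verb, which is why the last call omits metadataPrefix.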

SPARQL endpoint

You can explore the data within the GESIS Knowledge Graph using SPARQL queries at the following SPARQL endpoint: https://data.gesis.org/gesiskg/sparql
Below you can find some example SPARQL queries.

The following query lists all publications which are included in the GESIS KG (up to a limit of 10000 resources). (Result)

SELECT ?id ?title
WHERE {?id a <https://schema.org/ScholarlyArticle>.   # "a" is shorthand for rdf:type
       ?id <https://schema.org/name> ?title.
}
LIMIT 10000

To retrieve resources of a different type, replace <https://schema.org/ScholarlyArticle> with <https://schema.org/Dataset>, <http://rdf-vocabulary.ddialliance.org/lifecycle#Variable> or <http://rdf-vocabulary.ddialliance.org/lifecycle#Instrument>.
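
For programmatic access, a query like the one above can be sent to the endpoint over HTTP. The sketch below builds such a request with the Python standard library, following the SPARQL 1.1 Protocol (the query is passed in the query parameter of a GET request). The helper names are invented for this example, and it is an assumption that the endpoint returns JSON results when asked via the Accept header. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL endpoint as given above.
SPARQL_ENDPOINT = "https://data.gesis.org/gesiskg/sparql"


def list_resources_query(class_iri: str, limit: int = 10) -> str:
    """Return a query listing resources of the given class with their names."""
    # "a" is SPARQL shorthand for rdf:type.
    return (
        "SELECT ?id ?title\n"
        f"WHERE {{ ?id a <{class_iri}> .\n"
        "        ?id <https://schema.org/name> ?title .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def sparql_get_request(query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request(list_resources_query("https://schema.org/Dataset"))
print(request.full_url)
# To execute it, pass the request to urllib.request.urlopen() and decode
# the response body with json.load().
```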

The following query lists all information which is available for a particular resource in the GESIS KG. (Result)

SELECT *
WHERE {<https://data.gesis.org/gesiskg/resource/ZA5282> ?p ?o}

The following query lists 100 datasets and the publications that cite them for the topic "Migration" (reflecting the user story of Maxine). (Result)

SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {?publication <https://schema.org/about> ?topic.
       ?topic <https://schema.org/name> "Migration"@en.
       ?publication a <https://schema.org/ScholarlyArticle>.
       ?publication <https://schema.org/citation> ?dataset.
       ?dataset a <https://schema.org/Dataset>.
       ?publication <https://schema.org/name> ?publication_title.
       ?dataset <https://schema.org/name> ?dataset_title.
}
LIMIT 100

This query can easily be adjusted and explored by changing the string and language tag in line 3 from "Migration"@en to, e.g., "Germany"@en, "Gesundheit"@de, or "Politik"@de. Please note that the retrieved results depend on whether resources have been originally indexed with German or English keywords or in both languages.
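
Swapping the topic label and language tag can also be done programmatically. The following Python sketch generates the query text for a given label and language tag; the function name topic_query is made up for this illustration, and the query uses "a" (shorthand for rdf:type) to select publications and datasets. It only builds the query string and does not contact the endpoint.

```python
def topic_query(label: str, lang: str, limit: int = 100) -> str:
    """Build the dataset/publication query for one topic label and language tag."""
    if not lang.replace("-", "").isalnum():
        raise ValueError(f"Unexpected language tag: {lang!r}")
    # Escape backslashes and double quotes so the label stays a valid
    # SPARQL string literal.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {{ ?publication <https://schema.org/about> ?topic .
        ?topic <https://schema.org/name> "{safe_label}"@{lang} .
        ?publication a <https://schema.org/ScholarlyArticle> .
        ?publication <https://schema.org/citation> ?dataset .
        ?dataset a <https://schema.org/Dataset> .
        ?publication <https://schema.org/name> ?publication_title .
        ?dataset <https://schema.org/name> ?dataset_title .
}}
LIMIT {limit}"""


# Show the topic line (line 3) for the alternative topics mentioned above.
for label, lang in [("Germany", "en"), ("Gesundheit", "de"), ("Politik", "de")]:
    print(topic_query(label, lang).splitlines()[2].strip())
```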

The following query retrieves a year-wise count of publications citing datasets focusing on the topic "Migration" (reflecting the user story of Will). (Result)

SELECT ?year (COUNT(DISTINCT ?publication) AS ?count)   # count each publication once per year
WHERE {?publication a <https://schema.org/ScholarlyArticle>.
       ?publication <https://schema.org/citation> ?dataset.
       ?dataset a <https://schema.org/Dataset>.
       ?dataset <https://schema.org/about> ?topic.
       ?topic <https://schema.org/name> "Migration"@en.
       ?publication <https://schema.org/datePublished> ?year.
}
GROUP BY ?year
ORDER BY ?year
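
For further analysis outside the endpoint, the result of such a query can be turned into a simple mapping. The sketch below parses a response in the W3C SPARQL 1.1 Query Results JSON format, assuming the endpoint can return that format. The sample response is a made-up placeholder: its years and counts are invented for illustration and are not results from the GESIS KG.

```python
import json

# Made-up placeholder response in the SPARQL 1.1 JSON results layout.
# The years and counts are invented, not taken from the GESIS KG.
SAMPLE_RESPONSE = """
{"head": {"vars": ["year", "count"]},
 "results": {"bindings": [
   {"year": {"type": "literal", "value": "2019"},
    "count": {"type": "literal", "value": "3",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
   {"year": {"type": "literal", "value": "2020"},
    "count": {"type": "literal", "value": "5",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer"}}
 ]}}
"""


def counts_by_year(response_text: str) -> dict[str, int]:
    """Turn a SPARQL JSON result with ?year and ?count into {year: count}."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return {row["year"]["value"]: int(row["count"]["value"]) for row in bindings}


print(counts_by_year(SAMPLE_RESPONSE))
```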

Download

You can download the current version of the GESIS Knowledge Graph as a full RDF dump (JSON-LD and Turtle) as well as its underlying ontology at: https://doi.org/10.7802/2878

Older version:
v0.1.0-beta: https://doi.org/10.5281/zenodo.14229945

GESIS Search

The GESIS Knowledge Graph is integrated in the GESIS Search. Links between scientific resources are included in the result list and detailed views of search results.

License

The GESIS Knowledge Graph is available for access, download, and reuse under a Creative Commons Attribution 4.0 (CC BY 4.0) license, since some of its input sources are licensed under CC BY as well.
If you use the GESIS Knowledge Graph or parts of it, please cite the GESIS KG as follows:

Biswas, Debanjali, & Zapilko, Benjamin (2025). GESIS Knowledge Graph (GESIS KG). GESIS, Köln. Datenfile Version 1.0.0, https://doi.org/10.7802/2878.

Versions and changelog

For versioning, we follow the Semantic Versioning specification. Given a version scheme MAJOR.MINOR.PATCH, we increment the
  • MAJOR version when the data model has been substantially extended
  • MINOR version when data sources or data model have been updated, new data sources have been integrated, or underlying information extraction pipelines have been updated or included
  • PATCH version when the data in the graph has been updated
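
Version labels following this scheme can be parsed and compared mechanically. The Python sketch below is an illustration only: the names Version, parse_version, sort_key and changed_part are invented here, the optional leading "v" is assumed from the labels used in the changelog, and pre-release labels are compared as plain strings, which simplifies the full Semantic Versioning precedence rules.

```python
import re
from typing import NamedTuple

# MAJOR.MINOR.PATCH with an optional leading "v" and optional pre-release label.
VERSION_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: str  # empty string for a regular release


def parse_version(text: str) -> Version:
    match = VERSION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, prerelease = match.groups()
    return Version(int(major), int(minor), int(patch), prerelease or "")


def sort_key(version: Version) -> tuple:
    # A pre-release ranks below the release with the same numbers.
    return (version.major, version.minor, version.patch,
            version.prerelease == "", version.prerelease)


def changed_part(old: Version, new: Version) -> str:
    """Name the highest-level version part that differs."""
    for name in ("major", "minor", "patch"):
        if getattr(old, name) != getattr(new, name):
            return name.upper()
    return "NONE"


beta, release = parse_version("v0.1.0-beta"), parse_version("v1.0.0")
print(sort_key(beta) < sort_key(release), changed_part(beta, release))
# prints True MAJOR
```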

Changelog

This section documents all changes for each version of the GESIS Knowledge Graph.

v1.0.0 - 12.05.2025
  • Latest release of the GESIS Knowledge Graph
v0.1.0-beta - 04.12.2024
  • Beta release of the GESIS Knowledge Graph


Dissemination

Publications

  • Hienert, Daniel, Dagmar Kern, Katarina Boland, Benjamin Zapilko, and Peter Mutschke. 2019. "A digital library for research data and related information in the social sciences." In Proceedings of 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 148-157. Piscataway, NJ: IEEE. doi: https://doi.org/10.1109/JCDL.2019.00030.
  • Zapilko, Benjamin, Katarina Boland, and Dagmar Kern. 2018. "A LOD backend infrastructure for scientific search portals." In The Semantic Web. 15th Extended Semantic Web Conference (ESWC) - Proceedings, 729-744. Cham: Springer International Publishing. doi: https://doi.org/10.1007/978-3-319-93417-4_47.

Presentations

  • Zapilko, Benjamin, and Debanjali Biswas. 2024. "How knowledge graphs can help you to share research data and information." Meet the Experts Season 6 - Knowledge technologies for the Social Science: Access to Social Science Data and Services, GESIS - Leibniz-Institut für Sozialwissenschaften, Köln, 2024-04-11. Slides


About Us

Lead / Contact

Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/

Team

Debanjali Biswas, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Daniel Hienert, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Yudong Zhang, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/

Knowledge Technologies for the Social Sciences: https://www.gesis.org/en/institute/about-us/departments/knowledge-technologies-for-the-social-sciences