GESIS Knowledge Graph

About

The GESIS Knowledge Graph (GESIS KG) represents metadata of the scientific resources available in the GESIS Search, together with their semantic relationships, in an integrated and consistent form and makes them accessible for reuse. Understanding relations and dependencies between scientific resources is crucial to capture provenance, ensure reproducibility of research, and facilitate informed search across resources. Hence, the GESIS KG contains links between different scientific resources, e.g., links between datasets, publications, variables, and instruments, and links to entities such as authors, organizations, and social science concepts. The GESIS Knowledge Graph is geared towards interoperability and uses established standards and vocabularies, such as schema.org, DDI, and the NFDIcore Ontology, among others, to increase interoperability and reusability of data on the Web for both humans and machines, e.g., through APIs. At the instance level, we address interoperability by reusing PIDs from commonly used PID systems and by interlinking the GESIS KG with other KGs provided by GESIS as well as within the NFDI.


User stories

Use case-based development is key for the GESIS Knowledge Graph. The following selection of user stories illustrates challenges and needs of users and how the GESIS Knowledge Graph can support them.

Maxine

Maxine is a PhD student in sociology. She is searching for the most recent surveys covering questions on migration and for publications analysing these datasets. As a newcomer, she is uncertain about which academic search portal to use for finding such interconnected information. The GESIS KG contains links between social science research data and publications. This data is integrated in the GESIS Search portal. Maxine can use the GESIS Search to find the information she needs.

You can find an example SPARQL query here.

Will

Will is a senior researcher in information science. In his research, he is investigating citation behaviour and usage of data in different scientific disciplines across time. To analyse data citation and usage behaviour in the social science domain, Will can query the necessary data from the GESIS KG via its provided SPARQL endpoint. Alternatively, he can download the KG to analyse it offline with other preferred tools. The provided documentation of the data model helps him to understand the structure of the data.

You can find an example SPARQL query here.

Nancy

Nancy is a research data engineer and works in a research infrastructure organization. In her current project, she needs to integrate metadata of scientific resources from GESIS and other organizations into another search system. To do so, she needs to access and harvest metadata from the GESIS KG via an OAI-PMH API. The OAI-PMH API of the GESIS KG will allow Nancy to harvest the metadata she needs from GESIS in a standardized format. The webpage of the API also provides documentation on how to use the API and how to harvest the data.



Data

Data Sources

The GESIS Knowledge Graph contains content from the GESIS Search, which comprises information about social science research data, publications on research data, and open access publications. Detailed information about the content of the GESIS Search can be found here. GESIS Search aggregates information from different data collections of GESIS. Additionally, the GESIS KG comprises links between scientific resources, such as links between research data and publications or between publications and instruments, which are also integrated and available in the GESIS Search.

Data repositories by resource type
Resource type Data repositories
Dataset GESIS-Data Archive and GESIS-MISSY
Publication SSOAR, GESIS Library, GESIS Research Data Bibliography, GESIS-GRIS, GESIS-CEWS and GESIS-ZIS
Instrument GESIS-ZIS, GESIS-Pretest, GESIS-Guides and GESIS-MISSY
Variable GESIS-Colectica, GESIS-Data Archive and GESIS-MISSY
Method and Tutorial GESIS-Methods Hub

Detailed provenance information about the original data sources and the source of the links is reflected in the GESIS KG and is described in the section Provenance.


Data model

The GESIS Knowledge Graph comprises several types of scientific resources and the links between them. The complete ontology can be found here.

The figure above illustrates the six main scientific resource types which are currently contained in the GESIS KG. The main resource types of the GESIS Knowledge Graph are listed and described below:

Scientific resource types
Name Class
Dataset schema:Dataset
Publication schema:ScholarlyArticle
Instrument ddi:Instrument
Variable ddi:Variable
Method schema:SoftwareSourceCode
Tutorial gesiskg:Tutorial

Additional types in the metadata
Name Class
Person schema:Person
Organization schema:Organization
Location schema:Place
Keyword, Concept, Topic schema:DefinedTerm
Task gesiskg:Task

Links between resources

This section describes the relationships between scientific resources in the GESIS Knowledge Graph and how they are represented in the data model.


Links between resources

The figure above depicts links between different resource types with provenance. In the GESIS KG Ontology, we provide both links without provenance and links with provenance — the former for easy querying, and the latter for capturing additional metadata about how the links were created. Since the GESIS KG represents many-to-many (m:n) links between scientific resources and includes specific data about these links, additional classes for references and link metadata are incorporated into the data model.

Class Subclass of
gesiskg:DatasetReference gesiskg:Reference
gesiskg:PublicationReference gesiskg:Reference
gesiskg:VariableReference gesiskg:Reference
gesiskg:LinkMetadata gesiskg:ReferenceMetadata
gesiskg:DuplicateMetadata gesiskg:ReferenceMetadata

Link information in the metadata

Links between resources in the GESIS KG are either manually curated or automatically generated. Manual links are created by GESIS staff or are derived from research data bibliographies curated for specific research data programs. These manually linked resources are clearly marked as such and typically point to unique research datasets. The curation of these links is handled by the Manual Link Curation Pipeline. For automatically generated links between publications and research data, GESIS has developed the Dataset Citation Detection Pipeline. This pipeline, originating from the DFG-funded InFoLiS project, is designed to detect and disambiguate dataset citations. It identifies mentions of research data within full texts and automatically links them to the corresponding datasets. These links are marked as automatic and may not always refer to a single, unique dataset, but rather to potential datasets used in the publication. It is important to note that the automatically generated links have not yet been evaluated by domain experts; this is planned as future work. Another pipeline, the Variable Detection Pipeline, automatically identifies links between publications and variables. It was developed as part of the DFG-funded VADIS project. Additionally, the Publication Citation Detection Pipeline, developed under the DFG-funded Outcite project, automatically identifies citation links between publications and their referenced works.

Metadata name Description Property
Link context Text snippet or annotation marking the reason why a link has been detected gesiskg:linkContext
Link score Computed confidence score of the automatically generated link gesiskg:linkScore
Linking method Specifies whether a link is manually curated, automatically generated, or a search link gesiskg:linkingMethod
Link type Specifies whether a link is a citation or marks a methodological usage of a dataset gesiskg:linkType
Link source Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified gesiskg:linkSource

Provenance

The following table describes how provenance information is reflected in the GESIS KG. Separate properties capture the data source within GESIS Search from which a particular resource originates, the source of a dataset mentioned in a publication, the link detection pipeline or manual effort from which a link originates, and versioning information.

Metadata name Description Property
Source info Specifies information about the original data source of a scientific resource gesiskg:sourceInfo
Data source Specifies information about the source of a dataset mentioned in a publication gesiskg:dataSource
Link source Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified gesiskg:linkSource
Version Specifies the versioning information of a resource if available schema:version

Persistent identifiers

Resources and entities in the GESIS Knowledge Graph hold several identifiers. These include persistent identifiers such as DOIs assigned by PID authorities, identifiers assigned to resources by the authority of the data source, and a dereferenceable URI within the namespace of the GESIS Knowledge Graph, which has been assigned to every resource and entity in the graph. The table below gives an overview of all identifiers present in the GESIS Knowledge Graph.

Identifier Description
DOI Digital Object Identifier
URN Uniform Resource Name
ORCID Open Researcher and Contributor ID
ISSN International Standard Serial Number
ISBN International Standard Book Number
Internal GESIS ID Internal ID used within GESIS
GESIS Study Number Number used for research data archived at GESIS
GESIS KG URI Uniform Resource Identifier defined for the GESIS KG, reusing the Internal GESIS ID

URI paths
The following URI paths are used by the GESIS KG to locate, e.g., the schema elements and the resources within the KG.
  • Base URI: https://data.gesis.org/gesiskg/
  • Schema URI: https://data.gesis.org/gesiskg/schema/
  • Resource URI: https://data.gesis.org/gesiskg/resource/
  • GESIS KG metadata: https://data.gesis.org/gesiskg/id/1

The GESIS KG uses the same resource IDs as the URLs of the GESIS Search. A GESIS KG URI can therefore be constructed easily if the GESIS Search URL, or just the ID, of a scientific resource is known.

Examples
GESIS Search URL of a resource: https://search.gesis.org/research_data/ZA5280
GESIS KG URI of the same resource: https://data.gesis.org/gesiskg/resource/ZA5280
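
The mapping shown in the example can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that the resource ID is always the last path segment of the GESIS Search URL, as in the ZA5280 example.

```python
from urllib.parse import urlparse

# Resource URI prefix as listed under "URI paths".
RESOURCE_BASE = "https://data.gesis.org/gesiskg/resource/"


def search_url_to_kg_uri(search_url: str) -> str:
    """Map a GESIS Search URL to the corresponding GESIS KG resource URI.

    Assumption: the resource ID is the last path segment of the Search URL.
    """
    path = urlparse(search_url).path.rstrip("/")
    resource_id = path.rsplit("/", 1)[-1]
    if not resource_id:
        raise ValueError(f"no resource ID found in {search_url!r}")
    return RESOURCE_BASE + resource_id


print(search_url_to_kg_uri("https://search.gesis.org/research_data/ZA5280"))
# prints https://data.gesis.org/gesiskg/resource/ZA5280
```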


Dataset statistics

General statistics
Category Count
Total number of RDF triples 97,528,626
Types
Publications 583,079
Datasets 7,553
Instruments 532
Variables 1,407,148
Methods 22
Tutorials 13
Persons 466,306
Organizations 20,533
Locations 28,314
Keywords, Concepts, Topics 20,863
Tasks 20

Schema statistics
Category Count
Types
Reused Types 16
New Defined Types 9
Total 25
Object Properties
Reused Object Properties 10
New Defined Object Properties 15
Total 25
Data Properties
Reused Data Properties 35
New Defined Data Properties 81
Total 116

Link statistics
Link Automatic Manual Total
Survey Instruments to Research Datasets 0 156 156
Survey Instruments to Publications 0 5,916 5,916
Survey Variables to Research Datasets 0 1,398,890 1,398,890
Survey Variables to Survey Variables 0 288,652 288,652
Publications to Publications 676,241 0 676,241
citation 313,899 0 313,899
duplicates 362,342 0 362,342
Publications to Survey Variables 2,954 0 2,954
Publications to Research Datasets 78,733 49,509 145,382
Methods to Publications 0 9 9
Methods to Tutorials 0 2 2
Tutorials to Methods 0 2 2


Access & APIs

The GESIS Knowledge Graph is available through various access points: via public APIs, as a download, and integrated into the GESIS Search portal.

APIs

OAI-PMH API

We provide an OAI-PMH API, available at https://data.gesis.org/gesiskg/oai/, which allows you to access and harvest metadata provided by the GESIS KG in the DataCite and OpenAIRE formats.
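
As a rough sketch of what a harvesting request could look like, the Python snippet below builds OAI-PMH request URLs for this base URL. The verbs ListMetadataFormats and ListRecords are part of the OAI-PMH standard. The helper name and the metadata prefix "oai_datacite" are assumptions for illustration; the prefixes actually offered should be taken from the ListMetadataFormats response or the API documentation.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH API as given above.
OAI_BASE = "https://data.gesis.org/gesiskg/oai/"


def oai_request_url(verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb, **params}
    return OAI_BASE + "?" + urlencode(query)


# Ask the repository which metadata formats (and prefixes) it offers.
print(oai_request_url("ListMetadataFormats"))

# Harvest records. "oai_datacite" is a placeholder prefix that has not been
# confirmed for this endpoint; use one returned by ListMetadataFormats.
print(oai_request_url("ListRecords", metadataPrefix="oai_datacite"))
```

In OAI-PMH, large result lists are delivered in pages; each response may carry a resumptionToken that is passed to the next ListRecords request to continue harvesting.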

SPARQL endpoint

You can explore the data within the GESIS Knowledge Graph using SPARQL queries at the following SPARQL endpoint: https://data.gesis.org/gesiskg/sparql
Below you can find some example SPARQL queries.

The following query lists all publications which are included in the GESIS KG (up to a limit of 10000 resources). (Result)

PREFIX schema: <https://schema.org/>
SELECT ?id ?title
WHERE {?id a schema:ScholarlyArticle.
       ?id schema:name ?title.
} 
LIMIT 10000

To retrieve resources of a different type, replace schema:ScholarlyArticle (i.e., <https://schema.org/ScholarlyArticle>) with <https://schema.org/Dataset>, <http://rdf-vocabulary.ddialliance.org/lifecycle#Variable> or <http://rdf-vocabulary.ddialliance.org/lifecycle#Instrument>.
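
Queries like the one above can also be sent programmatically. The following Python sketch uses only the standard library and assumes that the endpoint accepts GET requests with a query parameter and returns JSON results, as defined by the SPARQL 1.1 Protocol; this has not been verified against the GESIS KG endpoint. The function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://data.gesis.org/gesiskg/sparql"

# Same pattern as the query above, restricted to datasets and a small limit.
QUERY = """
PREFIX schema: <https://schema.org/>
SELECT ?id ?title
WHERE { ?id a schema:Dataset .
        ?id schema:name ?title .
}
LIMIT 5
"""


def build_request(query: str) -> urllib.request.Request:
    """Prepare a SPARQL Protocol GET request that asks for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str) -> list:
    """Send the query and return the list of result bindings."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


# Building the request does not contact the server yet.
print(build_request(QUERY).full_url)
```

Calling run_query(QUERY) performs the actual request. Each returned binding maps a variable name to a dictionary, so a title is read as row["title"]["value"].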

The following query lists all information which is available for a particular resource in the GESIS KG. (Result)

SELECT *
WHERE {<https://data.gesis.org/gesiskg/resource/ZA5282> ?p ?o}

The following query lists up to 1000 pairs of datasets and the publications that cite them for the topic "Migration" (reflecting the user story of Maxine). (Result)

PREFIX schema: <https://schema.org/>
SELECT ?publication ?publication_title 
    ?dataset ?dataset_title
WHERE {
    ?publication schema:about ?topic.  
    ?topic schema:name "Migration"@en.
    ?publication a schema:ScholarlyArticle.
    ?publication schema:citation ?dataset. 
    ?dataset a schema:Dataset.
    ?publication schema:name ?publication_title.
    ?dataset schema:name ?dataset_title.
} LIMIT 1000

This query can easily be adjusted and explored by changing the string and language tag in the ?topic schema:name pattern from "Migration"@en to, e.g., "Germany"@en, "Gesundheit"@de, or "Politik"@de. Please note that the retrieved results depend on whether resources were originally indexed with German keywords, English keywords, or both.
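
When several topics are to be explored, the keyword and language tag can be filled into the query by a small helper. The Python sketch below is one possible way to do this; the template repeats the query above, while the function names and the escaping rules are illustrative choices.

```python
import re

QUERY_TEMPLATE = """PREFIX schema: <https://schema.org/>
SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {{
    ?publication schema:about ?topic .
    ?topic schema:name {literal} .
    ?publication a schema:ScholarlyArticle .
    ?publication schema:citation ?dataset .
    ?dataset a schema:Dataset .
    ?publication schema:name ?publication_title .
    ?dataset schema:name ?dataset_title .
}} LIMIT {limit}
"""


def topic_literal(keyword: str, lang: str) -> str:
    """Render a keyword as a language-tagged SPARQL string literal."""
    # Accept tags such as "en", "de" or "en-GB" only.
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    # Escape backslashes, double quotes and newlines inside the literal.
    escaped = (
        keyword.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def topic_query(keyword: str, lang: str = "en", limit: int = 1000) -> str:
    """Fill keyword, language tag and result limit into the template."""
    return QUERY_TEMPLATE.format(
        literal=topic_literal(keyword, lang), limit=int(limit)
    )


print(topic_query("Gesundheit", "de"))
```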

The following query retrieves a year-wise count of publications citing datasets focusing on the topic "Migration" (reflecting the user story of Will). (Result)

PREFIX schema: <https://schema.org/>
SELECT ?year (COUNT(?publication) AS ?count)
WHERE {
    ?publication a schema:ScholarlyArticle.
    ?publication schema:citation ?dataset.
    ?dataset a schema:Dataset. 
    ?dataset schema:about ?topic.  
    ?topic schema:name "Migration"@en.
    ?publication schema:datePublished ?year.
} GROUP BY ?year ORDER BY ?year
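
The query groups by the raw schema:datePublished value. In case some of these values are full dates rather than plain years (an assumption, not a documented property of the data), the counts can be merged per calendar year on the client side. The Python sketch below does this for bindings in the standard SPARQL JSON results format; the sample bindings are made up for illustration and are not actual GESIS KG results.

```python
from collections import Counter


def counts_per_year(bindings: list) -> dict:
    """Sum the ?count values per calendar year.

    Each binding follows the SPARQL JSON results format, e.g.
    {"year": {"value": "2019-05-01"}, "count": {"value": "3"}}.
    Assumption: every ?year value starts with a four-digit year.
    """
    totals = Counter()
    for row in bindings:
        year = row["year"]["value"][:4]
        totals[year] += int(row["count"]["value"])
    return dict(sorted(totals.items()))


# Made-up bindings for illustration only.
sample = [
    {"year": {"value": "2018"}, "count": {"value": "2"}},
    {"year": {"value": "2019-05-01"}, "count": {"value": "3"}},
    {"year": {"value": "2019"}, "count": {"value": "1"}},
]
print(counts_per_year(sample))  # prints {'2018': 2, '2019': 4}
```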

The following query retrieves publications which contain variable mentions from a specific study group. (Result)

PREFIX schema: <https://schema.org/>
PREFIX gesiskg: <https://data.gesis.org/gesiskg/schema/>
PREFIX ddi: <http://rdf-vocabulary.ddialliance.org/lifecycle#>
SELECT *
WHERE {
    ?publication a schema:ScholarlyArticle.
    ?publication schema:name ?publication_title .
    ?variable a ddi:Variable.
    ?publication schema:citation ?variable.
    ?variable schema:name ?variable_title .
    ?variable gesiskg:studyGroup "ALLBUS"@en
}

Download

You can download the current version of the GESIS Knowledge Graph as a full RDF dump (Turtle format) as well as its underlying GESIS KG ontology at: https://doi.org/10.7802/2969

Older versions:
v1.0.0: https://doi.org/10.7802/2878
v0.1.0-beta: https://doi.org/10.5281/zenodo.14229945


GESIS Search

The GESIS Knowledge Graph is integrated in the GESIS Search. Links between scientific resources are included in the result list and detailed views of search results.


Source Code

The source code of the GESIS KG construction pipeline is publicly available on GitHub. However, full reproducibility of the workflow is not possible, as the input data source required for the construction process, i.e., the Elasticsearch index of the GESIS Search, is not publicly accessible.


License

The GESIS Knowledge Graph is available for access, download, and reuse under a Creative Commons Attribution 4.0 license, since some of the input sources are licensed under CC-BY as well.
If you are using the GESIS Knowledge Graph or parts of it, please cite the GESIS KG as follows:

Biswas, D., Gupta, E., Yu, R., & Zapilko, B. (2025). GESIS Knowledge Graph (GESIS KG) (Version 2.0.0) [Data set]. GESIS, Cologne. https://doi.org/10.7802/2969.



Dissemination

Publications

  • Hienert, Daniel, Dagmar Kern, Katarina Boland, Benjamin Zapilko, and Peter Mutschke. 2019. "A digital library for research data and related information in the social sciences." In Proceedings of 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 148-157. Piscataway, NJ: IEEE. doi: https://doi.org/10.1109/JCDL.2019.00030.
  • Zapilko, Benjamin, Katarina Boland, and Dagmar Kern. 2018. "A LOD backend infrastructure for scientific search portals." In The Semantic Web. 15th Extended Semantic Web Conference (ESWC) - Proceedings, 729-744. Cham: Springer International Publishing. doi: https://doi.org/10.1007/978-3-319-93417-4_47.

Presentations

  • Zapilko, Benjamin, Debanjali Biswas, Yudong Zhang, and Peter Mutschke. 2025. "GESIS Knowledge Graph – A blueprint for connecting a domain-specific knowledge graph to international research infrastructures." RDA 25th Plenary (RDA P25) co-located with International Data Week 2025 (IDW 2025), 2025-10-13. doi: https://doi.org/10.5281/zenodo.17867453.
  • Zapilko, Benjamin, Debanjali Biswas, Yudong Zhang, and Peter Mutschke. 2025. "GESIS Knowledge Graph: Connecting a domain-specific KG to the NFDI and beyond." In 2nd Conference on Research Data Infrastructure (CoRDI), edited by York Sure-Vetter and Paul Groth. doi: https://doi.org/10.5281/zenodo.16736066.
  • Zapilko, Benjamin, and Debanjali Biswas. 2024. "How knowledge graphs can help you to share research data and information." Meet the Experts Season 6 - Knowledge technologies for the Social Science: Access to Social Science Data and Services, GESIS - Leibniz-Institut für Sozialwissenschaften, Köln, 2024-04-11. Slides


Team & Contact

Lead / Contact

Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/

Team

(in alphabetical order)

Debanjali Biswas, Heinrich Heine Universität Düsseldorf (HHU), https://www.cs.hhu.de/lehrstuehle-und-arbeitsgruppen/data-knowledge-engineering/unser-team
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/ and Heinrich Heine Universität Düsseldorf (HHU), https://www.cs.hhu.de/lehrstuehle-und-arbeitsgruppen/data-knowledge-engineering/unser-team
Endri Gupta, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Daniel Hienert, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Peter Mutschke, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Ran Yu, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Yudong Zhang, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/

More information

Knowledge Technologies for the Social Sciences: https://www.gesis.org/en/institute/about-us/departments/knowledge-technologies-for-the-social-sciences
AI-assisted Linking: https://www.gesis.org/en/research/research-area-computational-methods/ai-assisted-linking