GESIS Knowledge Graph (beta)

Overview

The GESIS Knowledge Graph (GESIS KG) represents the metadata of scientific resources available in the GESIS Search, together with their semantic relationships, in an integrated and consistent form and makes them accessible for reuse. Understanding relations and dependencies between scientific resources is crucial to capture provenance, ensure reproducibility of research, and facilitate informed search across resources. Hence, the GESIS KG contains links between different scientific resources, e.g., links between data, publications, and instruments, and links to entities like authors and social science concepts. The GESIS Knowledge Graph is geared towards interoperability and uses established W3C standards and vocabularies, such as schema.org, DDI, and the NFDIcore Ontology, among others, to increase the interoperability and reusability of data on the Web for both humans and machines, e.g., through APIs. At the instance level, we address interoperability by reusing PIDs from commonly used PID systems and by interlinking the GESIS KG with other KGs provided by GESIS as well as within the NFDI.


User stories

Use case-driven development is key for the GESIS Knowledge Graph. Here you can find a selection of user stories illustrating the challenges and needs of users and how the GESIS Knowledge Graph can support them.

Maxine

Maxine is a PhD student in sociology. She is searching for the most recent surveys covering questions on migration and for publications analysing these datasets. As a newbie, she is uncertain about which academic search portal to use for finding such interconnected information. The GESIS KG contains links between social science research data and publications. This data is integrated in the GESIS Search portal. Maxine can use the GESIS Search to find the information she needs.

You can find an example SPARQL query here.

Will

Will is a senior researcher in information science. In his research, he is investigating citation behaviour and usage of data in different scientific disciplines across time. To analyse data citation and usage behaviour in the social science domain, Will can query the necessary data from the GESIS KG via its provided SPARQL endpoint. Alternatively, he can download the KG as a dump file to analyse it offline with other preferred tools. The provided documentation of the underlying schema of the GESIS KG helps him to understand the structure of the data.

You can find an example SPARQL query here.

Nancy

Nancy is a research data engineer working in a research infrastructure organization. In her current project, she needs to integrate metadata of scientific resources from GESIS and other organizations into another search system. To do so, she needs to access and harvest metadata from the GESIS KG via an OAI-PMH API. The OAI-PMH API of the GESIS KG will allow Nancy to harvest the GESIS metadata she needs in a standardized format. The webpage of the API will also provide documentation on how to use the API and how to harvest the data.



Data

The GESIS Knowledge Graph contains content from the GESIS Search, which comprises information about social science research data, publications on research data, and open access publications. Detailed information about the content of the GESIS Search can be found here. GESIS Search aggregates information from different data collections of GESIS. Additionally, the GESIS KG comprises links between scientific resources, such as links between research data and publications, which are also integrated and available in the GESIS Search.

Detailed provenance information about the original data sources and the source of the links is reflected in the GESIS KG and is described in the section Provenance.

Resources and entities

The GESIS Knowledge Graph comprises several types of scientific resources, entities and relationships between them. The latest version of the GESIS Knowledge Graph ontology can be found in the Download section.

The figure above illustrates the three main scientific resources which are currently contained in the GESIS KG. We distinguish between scientific resources and entities. The main resources and entities of the GESIS Knowledge Graph are listed and described below:

Scientific resources

Resource name | Class
Dataset | schema:Dataset
Publication | schema:ScholarlyArticle
Instrument | disco:Instrument

Entities in the metadata

Entity name | Class
Person | schema:Person
Organization | schema:Organization
Location | schema:Place
Keyword, Concept, Topic | schema:DefinedTerm

Groups of resources

Scientific resources may occur as part of a group of resources. The following classes are used in the data model to reflect such groups.

Class
disco:StudyGroup
schema:CreativeWorkSeries
schema:Collection
schema:Periodical

Relationships / links between resources

This section describes the relationships between scientific resources in the GESIS Knowledge Graph and how they are represented in the data model.


Links between resources

Since the GESIS KG holds m:n links between scientific resources as well as specific data about these links, additional classes representing references and link metadata are included in the data model. To make querying easier, the next release of the data model will also add direct links between two resources that carry no details about the link.

Class | Subclass of
gesiskg:DatasetReference | gesiskg:Reference
gesiskg:PublicationReference | gesiskg:Reference
gesiskg:LinkMetadata | gesiskg:ReferenceMetadata
gesiskg:DuplicateMetadata | gesiskg:ReferenceMetadata

Link information in the metadata

The links between resources are either manually curated or automatically generated. Manual links are determined by GESIS employees or come from research data bibliographies that are created and curated for specific research data programs. Research data linked in this way are marked as manual and refer to unique research data sets. The automatically generated links between publications and research data are produced by an internally developed pipeline for dataset citation detection and disambiguation, which is a further development of the InfoLink tool. This pipeline identifies mentions of research data in full texts and automatically links them to the mentioned research data. Research data linked in this way are marked as automatic and do not necessarily refer to a unique research data set but to research data that may have been used to create the publication. Note that the automatically generated links have not yet been evaluated by domain experts.

Metadata name | Description | Property
Link context | Text snippet or annotation marking the reason why a link has been detected | gesiskg:linkContext
Link score | Computed confidence score of the automatically generated link | gesiskg:linkScore
Linking method | Specifies whether a link is manually curated, automatically generated, or a search link | gesiskg:linkingMethod
Link type | Specifies whether a link is a citation or marks a methodological usage of a dataset | gesiskg:linkType
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource
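
As a rough illustration of how the link metadata properties might be queried, the following Python sketch builds a SPARQL query that counts link metadata nodes per value of one property, e.g., per linking method. It rests on two assumptions that are not confirmed by this documentation: that the gesiskg: prefix expands to the Schema URI https://data.gesis.org/gesiskg/schema/, and that the property is attached directly to the link metadata node. The function name is made up for this example.

```python
# Assumed expansion of the gesiskg: prefix (taken from the Schema URI of the GESIS KG).
SCHEMA_NS = "https://data.gesis.org/gesiskg/schema/"


def build_link_property_count_query(property_name: str = "linkingMethod") -> str:
    """Return a SPARQL query counting link metadata nodes per value of a gesiskg property."""
    # Only allow simple local names so that the IRI stays well-formed.
    if not property_name.isidentifier():
        raise ValueError(f"unexpected property name: {property_name!r}")
    return (
        "SELECT ?value (COUNT(?m) AS ?links)\n"
        f"WHERE {{?m <{SCHEMA_NS}{property_name}> ?value.\n"
        "}\n"
        "GROUP BY ?value\n"
        "ORDER BY DESC(?links)"
    )


# Print the query text; it can be pasted into the SPARQL endpoint of the GESIS KG.
print(build_link_property_count_query("linkingMethod"))
```

Passing "linkType" or "linkSource" instead should yield the analogous breakdown for those properties, provided the assumptions above hold.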

Provenance

The following table describes how provenance information is reflected in the GESIS KG. Separate properties capture the data source within GESIS Search from which a particular resource originates, the source of a dataset mentioned in a publication, the link detection pipeline or manual effort that produced a link, and versioning information.

Metadata name | Description | Property
Source info | Specifies information about the original data source of a scientific resource | gesiskg:sourceInfo
Data source | Specifies information about the source of a dataset mentioned in a publication | gesiskg:dataSource
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource
Version | Specifies the versioning information of a resource if available | schema:version

Persistent identifiers

Resources and entities in the GESIS Knowledge Graph hold several identifiers. These include persistent identifiers like DOIs assigned by PID authorities as well as identifiers assigned to resources by the authority of the data source. In addition, a dereferenceable URI within the namespace of the GESIS Knowledge Graph has been assigned to every resource and entity that is part of the graph. The table below gives an overview of all identifiers which are present in the GESIS Knowledge Graph.

Identifier | Description
DOI | Digital Object Identifier
URN | Uniform Resource Name
ORCID | Open Researcher and Contributor ID
ISSN | International Standard Serial Number
ISBN | International Standard Book Number
Internal GESIS ID | Internal ID used within GESIS
GESIS Study Number | Number used for research data archived at GESIS
GESIS KG URI | Uniform Resource Identifier defined for the GESIS KG, reusing the Internal GESIS ID

URI paths
The following URI paths are used by the GESIS KG to locate, e.g., the schema elements and the resources within the KG.
  • Base URI: https://data.gesis.org/gesiskg/
  • Schema URI: https://data.gesis.org/gesiskg/schema/
  • Resource URI: https://data.gesis.org/gesiskg/resource/
  • GESISKG metadata: https://data.gesis.org/gesiskg/id/1

The GESIS KG uses the same IDs for its resources as the GESIS Search uses in its URLs, i.e., the URI of a resource in the GESIS KG can easily be constructed if the URL or the ID of that resource in the GESIS Search is known.

Examples
GESIS Search URL of a resource: https://search.gesis.org/research_data/ZA5280
GESIS KG URI of the same resource: https://data.gesis.org/gesiskg/resource/ZA5280
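
The mapping in the example above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the function name is made up here, and it assumes that the resource ID is always the last path segment of the GESIS Search URL, as in the example.

```python
from urllib.parse import urlparse

# Resource URI prefix as listed under "URI paths".
KG_RESOURCE_BASE = "https://data.gesis.org/gesiskg/resource/"


def search_url_to_kg_uri(search_url: str) -> str:
    """Map a GESIS Search URL to the corresponding GESIS KG resource URI."""
    # Assumption: the resource ID is the last path segment of the Search URL.
    path = urlparse(search_url).path.rstrip("/")
    resource_id = path.rsplit("/", 1)[-1]
    if not resource_id:
        raise ValueError(f"no resource ID found in {search_url!r}")
    return KG_RESOURCE_BASE + resource_id


print(search_url_to_kg_uri("https://search.gesis.org/research_data/ZA5280"))
# prints https://data.gesis.org/gesiskg/resource/ZA5280
```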

Dataset statistics

General statistics

Statistic | Count
Total number of RDF triples | 15003783
Total number of scientific resources | 474201
Publications | 467125
Datasets | 6733
Instruments | 343
Persons | 403178
Organizations | 9691
Locations | 30885
Keywords, Concepts, Topics | 12951

Schema statistics

Statistic | Count
Total number of classes | 31
Reused classes | 26
Newly defined classes | 5
Total number of object properties | 33
Reused object properties | 20
Newly defined object properties | 13
Total number of datatype properties | 94
Reused datatype properties | 28
Newly defined datatype properties | 66

Link statistics

Statistic | Count
Total number of links | 168362
Automatically generated links | 99227
Manually curated links | 47309
Links between publications and datasets | 162671
Links between publications and instruments | 5817
Links between datasets and instruments | 74


Access & APIs

The GESIS Knowledge Graph is available through various access points: via public APIs, as download, and integrated into the GESIS Search portal.

APIs

OAI-PMH API (tba)

We will provide an OAI-PMH API which will allow you to access and harvest metadata provided by the GESIS KG in the oai_datacite and berd formats. This API will come with a dedicated website and documentation.
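
Until the API is published, the following Python sketch only illustrates what a harvesting request could look like under the generic OAI-PMH protocol. The base URL is a placeholder, since the real address has not been announced, and the function name is hypothetical. The verb ListRecords and the arguments metadataPrefix and resumptionToken come from the OAI-PMH specification, not from GESIS documentation.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder only: the actual base URL of the GESIS KG OAI-PMH API is not yet published.
HYPOTHETICAL_BASE_URL = "https://example.org/oai"


def list_records_url(
    base_url: str = HYPOTHETICAL_BASE_URL,
    metadata_prefix: str = "oai_datacite",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # together with the verb only, to fetch the next page of a list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


print(list_records_url())
```

A harvester would typically request the first URL, read the resumptionToken from the response if one is present, and repeat the request with that token until no token is returned.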

SPARQL endpoint (beta)

You can explore the data within the GESIS Knowledge Graph using SPARQL queries at the following SPARQL endpoint: https://data.gesis.org/gesiskg/sparql
Below you can find some example SPARQL queries.

The following query lists all publications which are included in the GESIS KG (up to a limit of 10000 resources). (Result)

SELECT ?id ?title
WHERE {?id ?p <https://schema.org/ScholarlyArticle>.
       ?id <https://schema.org/name> ?title.
} 
LIMIT 10000

To retrieve resources of a different type, replace <https://schema.org/ScholarlyArticle> with <https://schema.org/Dataset> or <https://rdf-vocabulary.ddialliance.org/discovery.html%23dfn-disco-instrument>.
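
For programmatic access, the listing query can be generated for any class and sent to the endpoint over HTTP. The following Python sketch uses only the standard library. It assumes that the endpoint accepts GET requests with a query parameter, as defined by the SPARQL 1.1 Protocol, and that it returns JSON when asked for application/sparql-results+json; neither is confirmed in this documentation. All function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/gesiskg/sparql"


def build_listing_query(class_iri: str, limit: int = 10000) -> str:
    """Return the listing query from above for an arbitrary class IRI."""
    return (
        "SELECT ?id ?title\n"
        f"WHERE {{?id ?p <{class_iri}>.\n"
        "       ?id <https://schema.org/name> ?title.\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request(query: str) -> urllib.request.Request:
    """Wrap a query in a GET request that asks for JSON results (assumed to be supported)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str) -> list:
    """Send the query and return the list of result bindings."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)["results"]["bindings"]


# Building the query needs no network access; run_query() performs the actual call.
print(build_listing_query("https://schema.org/Dataset", limit=10))
```

Calling run_query(build_listing_query("https://schema.org/Dataset", limit=10)) would then return up to ten datasets, each as a dictionary with the keys id and title, if the endpoint behaves as assumed.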

The following query lists all information which is available for a particular resource in the GESIS KG. (Result)

SELECT *
WHERE {<https://data.gesis.org/gesiskg/resource/ZA5282> ?p ?o}

The following query lists all datasets and the publications that cite them for the topic "Migration" (reflecting the user story of Maxine). (Result)

SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {?publication <https://schema.org/about> ?topic.  
       ?topic <https://schema.org/name> "Migration"@en.
       ?publication ?p <https://schema.org/ScholarlyArticle>.  
       ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r.  
       ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference>. 
       ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m. 
       ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata>. 
       ?m <https://schema.org/mainEntity> ?dataset.
       ?dataset ?p3 <https://schema.org/Dataset>.
       ?publication <https://schema.org/name> ?publication_title.
       ?dataset <https://schema.org/name> ?dataset_title.
}

This query can easily be adjusted and explored by changing the string and language tag in line 3 from "Migration"@en to, e.g., "Germany"@en, "Gesundheit"@de, or "Politik"@de. Please note that the retrieved results depend on whether resources have been originally indexed with German or English keywords or in both languages.
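
If the keyword is changed often, the substitution can be automated. The Python sketch below wraps the query above in a template and fills in the keyword and language tag. The function name and the TOPIC_LITERAL placeholder are made up for this example, and the escaping covers only backslashes and double quotes.

```python
import re

# The query from above, with the topic literal replaced by a placeholder.
QUERY_TEMPLATE = """SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {?publication <https://schema.org/about> ?topic.
       ?topic <https://schema.org/name> TOPIC_LITERAL.
       ?publication ?p <https://schema.org/ScholarlyArticle>.
       ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r.
       ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference>.
       ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m.
       ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata>.
       ?m <https://schema.org/mainEntity> ?dataset.
       ?dataset ?p3 <https://schema.org/Dataset>.
       ?publication <https://schema.org/name> ?publication_title.
       ?dataset <https://schema.org/name> ?dataset_title.
}"""


def build_topic_query(keyword: str, lang: str = "en") -> str:
    """Return the topic query for a given keyword and language tag."""
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    # Escape backslashes and double quotes so the keyword stays a valid SPARQL string.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.replace("TOPIC_LITERAL", f'"{escaped}"@{lang}')


print(build_topic_query("Gesundheit", "de"))
```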

The following query retrieves a year-wise count of publications citing datasets focusing on the topic "Migration" (reflecting the user story of Will). (Result)

SELECT ?year (COUNT(?publication) AS ?count)
WHERE {?publication ?p <https://schema.org/ScholarlyArticle>.
       ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r.  
       ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference>. 
       ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m. 
       ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata>. 
       ?m <https://schema.org/mainEntity> ?dataset.
       ?dataset ?p3 <https://schema.org/Dataset>. 
       ?dataset <https://schema.org/about> ?topic.  
       ?topic <https://schema.org/name> "Migration"@en.
       ?publication <https://schema.org/datePublished> ?year .
} 
GROUP BY ?year
ORDER BY ?year
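
The result of such a query can be post-processed offline. The following Python sketch reads a result in the W3C SPARQL 1.1 Query Results JSON format and sums the counts per calendar year. Two caveats apply: the sample data is invented purely to show the structure, and the truncation to four characters assumes that schema:datePublished may hold either a plain year or a full ISO date, which this documentation does not specify.

```python
from collections import Counter


def counts_by_year(results: dict) -> dict:
    """Sum the ?count values per calendar year from a SPARQL JSON result."""
    totals = Counter()
    for row in results["results"]["bindings"]:
        # Assumption: the value starts with a four-digit year (e.g. "2019" or "2019-06-01").
        year = row["year"]["value"][:4]
        totals[year] += int(row["count"]["value"])
    return dict(sorted(totals.items()))


# Invented sample in the SPARQL 1.1 Query Results JSON format (not real GESIS KG data).
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
sample = {
    "head": {"vars": ["year", "count"]},
    "results": {
        "bindings": [
            {"year": {"type": "literal", "value": "2019"},
             "count": {"type": "literal", "datatype": XSD_INTEGER, "value": "12"}},
            {"year": {"type": "literal", "value": "2019-06-01"},
             "count": {"type": "literal", "datatype": XSD_INTEGER, "value": "3"}},
            {"year": {"type": "literal", "value": "2020"},
             "count": {"type": "literal", "datatype": XSD_INTEGER, "value": "7"}},
        ]
    },
}

print(counts_by_year(sample))
```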

Download

You can download the GESIS Knowledge Graph as a full RDF dump as well as its underlying ontology at Zenodo:
https://doi.org/10.5281/zenodo.14229945

GESIS Search

The GESIS Knowledge Graph is integrated in the GESIS Search. Links between scientific resources are included in the result list and detailed views of search results.

License

The GESIS Knowledge Graph is available for access, download, and reuse under a Creative Commons Attribution 4.0 license since the license of some input sources is CC-BY as well.
If you are using the GESIS Knowledge Graph or parts of it, please cite the GESIS KG as follows:

Biswas, D., & Zapilko, B. (2024). GESIS Knowledge Graph (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14229945

Versions and changelog

For versioning, we follow the Semantic Versioning specification. Given a version number MAJOR.MINOR.PATCH, we increment the
  • MAJOR version when the data model has been changed or updated
  • MINOR version when data sources have been changed or updated, new data sources have been integrated, or underlying information extraction pipelines have been updated or included
  • PATCH version when the data in the graph has been updated
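
The rules above can be expressed as a small Python sketch. The names Change and bump are made up for this illustration. Resetting the lower components to zero follows the Semantic Versioning specification and is an assumption about how GESIS KG releases will be numbered.

```python
from enum import Enum


class Change(Enum):
    """Kinds of change named in the versioning rules above."""
    DATA_MODEL = "data model changed or updated"
    DATA_SOURCES = "data sources or extraction pipelines changed, updated, or added"
    DATA_UPDATE = "data in the graph updated"


def bump(version: str, change: Change) -> str:
    """Return the next version number for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.DATA_MODEL:
        return f"{major + 1}.0.0"          # MAJOR: lower components reset to zero
    if change is Change.DATA_SOURCES:
        return f"{major}.{minor + 1}.0"    # MINOR: patch component reset to zero
    return f"{major}.{minor}.{patch + 1}"  # PATCH


print(bump("1.0.0", Change.DATA_UPDATE))   # prints 1.0.1
print(bump("1.0.1", Change.DATA_SOURCES))  # prints 1.1.0
print(bump("1.1.0", Change.DATA_MODEL))    # prints 2.0.0
```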

Changelog

This section documents all changes for each version of the GESIS Knowledge Graph.

v1.0.0 - 04.12.2024
  • Beta release of the GESIS Knowledge Graph


Dissemination

Publications

  • Hienert, Daniel, Dagmar Kern, Katarina Boland, Benjamin Zapilko, and Peter Mutschke. 2019. "A digital library for research data and related information in the social sciences." In Proceedings of 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 148-157. Piscataway, NJ: IEEE. doi: https://doi.org/10.1109/JCDL.2019.00030.
  • Zapilko, Benjamin, Katarina Boland, and Dagmar Kern. 2018. "A LOD backend infrastructure for scientific search portals." In The Semantic Web. 15th Extended Semantic Web Conference (ESWC) - Proceedings, 729-744. Cham: Springer International Publishing. doi: https://doi.org/10.1007/978-3-319-93417-4_47.

Presentations

  • Zapilko, Benjamin, and Debanjali Biswas. 2024. "How knowledge graphs can help you to share research data and information." Meet the Experts Season 6 - Knowledge technologies for the Social Science: Access to Social Science Data and Services, GESIS - Leibniz-Institut für Sozialwissenschaften, Köln, 2024-04-11. Slides


About Us

Lead / Contact

Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/

Team

Debanjali Biswas, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Daniel Hienert, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Yudong Zhang, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/

Knowledge Technologies for the Social Sciences: https://www.gesis.org/en/institute/about-us/departments/knowledge-technologies-for-the-social-sciences
GESIS Knowledge Graph Infrastructure: https://www.gesis.org/en/research/research-area-computational-methods/knowledge-graph-infrastructure