Maxine is a PhD student in sociology. She is searching for the most recent surveys covering questions on migration and for publications analysing these datasets. As a newcomer, she is uncertain which academic search portal to use for finding such interconnected information. The GESIS KG contains links between social science research data and publications. This data is integrated in the GESIS Search portal. Maxine can use the GESIS Search to find the information she needs.
An example SPARQL query for this user story is given with the example queries for the SPARQL endpoint.
Will is a senior researcher in information science. In his research, he is investigating citation behaviour and usage of data in different scientific disciplines across time. To analyse data citation and usage behaviour in the social science domain, Will can query the necessary data from the GESIS KG via its provided SPARQL endpoint. Alternatively, he can download the KG as a dump file to analyse it offline with other preferred tools. The provided documentation of the underlying schema of the GESIS KG helps him to understand the structure of the data.
An example SPARQL query for this user story is given with the example queries for the SPARQL endpoint.
Nancy is a research data engineer and works at a research infrastructure organization. In her current project, she needs to integrate metadata about scientific resources from GESIS and other organizations into another search system. To do so, she needs to access and harvest metadata from the GESIS KG via an OAI-PMH API. The OAI-PMH API of the GESIS KG will allow Nancy to harvest the metadata she needs from GESIS in a standardized format. The webpage of the API will also provide documentation on how to use the API and how to harvest the data.
The GESIS Knowledge Graph contains content from the GESIS Search, which comprises information about social science research data, publications on research data, and open access publications. Detailed information about the content of the GESIS Search can be found here. GESIS Search aggregates information from different data collections of GESIS. Additionally, the GESIS KG comprises links between scientific resources, such as links between research data and publications, which are also integrated and available in the GESIS Search.
Detailed provenance information about the original data sources and about the source of the links is reflected in the GESIS KG and is described in the section Provenance.
The GESIS Knowledge Graph comprises several types of scientific resources, entities and relationships between them. The latest version of the GESIS Knowledge Graph ontology can be found in the Download section.
The figure above illustrates the three main scientific resources that are currently contained in the GESIS KG. We distinguish between scientific resources and entities. The main resources and entities of the GESIS Knowledge Graph are listed and described below:
Resource name | Class |
---|---|
Dataset | schema:Dataset |
Publication | schema:ScholarlyArticle |
Instrument | disco:Instrument |
Entity name | Class |
---|---|
Person | schema:Person |
Organization | schema:Organization |
Location | schema:Place |
Keyword, Concept, Topic | schema:DefinedTerm |
Scientific resources may occur as part of a group of resources. The following classes are used in the data model to reflect such groups.
Class |
---|
disco:StudyGroup |
schema:CreativeWorkSeries |
schema:Collection |
schema:Periodical |
This section describes the relationships between scientific resources in the GESIS Knowledge Graph and how they are represented in the data model.
Since the GESIS KG holds m:n links between scientific resources as well as specific data about these links, additional classes for references and link metadata are included in the data model. To make querying easier, the next release of the data model will also add direct links between two resources that carry no further details about the link.
Class | Subclass of |
---|---|
gesiskg:DatasetReference | gesiskg:Reference |
gesiskg:PublicationReference | gesiskg:Reference |
gesiskg:LinkMetadata | gesiskg:ReferenceMetadata |
gesiskg:DuplicateMetadata | gesiskg:ReferenceMetadata |
The links between resources are either manually curated or automatically generated. Manual links are determined by GESIS employees or are provided via research data bibliographies that are created and curated for specific research data programs. Research data linked in this way are marked as manual and resolve to unique research datasets. For the automatically generated links between publications and research data, an internally developed pipeline for dataset citation detection and disambiguation is used, which is a further development of the InfoLink tool. This pipeline identifies mentions of research data in full texts and automatically links them to the mentioned research data. Research data linked in this way are marked as automatic and do not necessarily refer to a unique research dataset, but to candidate research data that may have been used to create the publication. Note that the automatically generated links have not yet been evaluated by domain experts.
Metadata name | Description | Property |
---|---|---|
Link context | Text snippet or annotation marking the reason why a link has been detected | gesiskg:linkContext |
Link score | Computed confidence score of the automatically generated link | gesiskg:linkScore |
Linking method | Specifies whether a link is manually curated, automatically generated, or a search link | gesiskg:linkingMethod |
Link type | Specifies whether a link is a citation or marks a methodological usage of a dataset | gesiskg:linkType |
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource |
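To illustrate how the link metadata above might be used downstream, here is a minimal Python sketch that keeps manually curated links and only those automatic links whose score reaches a threshold. The `Link` record, the method labels "manual" and "automatic", the threshold, and the sample values are assumptions made for illustration; the actual values stored in gesiskg:linkingMethod and the value range of gesiskg:linkScore should be checked against the data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical in-memory record for one publication-dataset link."""
    publication: str
    dataset: str
    linking_method: str                 # assumed labels: "manual" or "automatic"
    link_score: Optional[float] = None  # assumed to be set only for automatic links


def trusted_links(links: List[Link], min_score: float) -> List[Link]:
    """Keep manual links and automatic links scoring at least min_score."""
    kept = []
    for link in links:
        if link.linking_method == "manual":
            kept.append(link)
        elif link.link_score is not None and link.link_score >= min_score:
            kept.append(link)
    return kept


# Made-up sample data, not taken from the GESIS KG.
sample = [
    Link("pub-a", "data-x", "manual"),
    Link("pub-b", "data-y", "automatic", 0.9),
    Link("pub-c", "data-z", "automatic", 0.4),
]

for link in trusted_links(sample, min_score=0.8):
    print(link.publication, "->", link.dataset)
```

Because the automatically generated links have not yet been evaluated by domain experts, a suitable threshold would have to be chosen empirically.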
The following table describes how provenance information is reflected in the GESIS KG. Separate properties capture the data source within GESIS Search from which a resource originates, the source of a dataset mentioned in a publication, the link detection pipeline or manual effort that produced a link, and versioning information.
Metadata name | Description | Property |
---|---|---|
Source info | Specifies information about the original data source of a scientific resource | gesiskg:sourceInfo |
Data source | Specifies information about the source of a dataset mentioned in a publication | gesiskg:dataSource |
Link source | Specifies information about the source of a link, e.g., naming the pipeline by which a link has been generated or the project in which a manual link has been identified | gesiskg:linkSource |
Version | Specifies the versioning information of a resource if available | schema:version |
Resources and entities in the GESIS Knowledge Graph hold several kinds of identifiers: persistent identifiers such as DOIs assigned by PID authorities, identifiers assigned by the authority of the data source, and a dereferenceable URI within the namespace of the GESIS Knowledge Graph, which has been assigned to every resource and entity in the graph. The table below gives an overview of all identifiers present in the GESIS Knowledge Graph.
Identifier | Description |
---|---|
DOI | Digital Object Identifier |
URN | Uniform Resource Name |
ORCID | Open Researcher and Contributor ID |
ISSN | International Standard Serial Number |
ISBN | International Standard Book Number |
Internal GESIS ID | Internal ID used within GESIS |
GESIS Study Number | Number used for research data archived at GESIS |
GESIS KG URI | Uniform Resource Identifier defined for the GESIS KG, reusing the Internal GESIS ID |
The GESIS KG uses the same IDs for its resources as the URLs of the GESIS Search, i.e. the URI of a resource in the GESIS KG can easily be constructed if the URL or the ID of that resource in the GESIS Search is known.
Examples
GESIS Search URL of a resource: https://search.gesis.org/research_data/ZA5280
GESIS KG URI of the same resource: https://data.gesis.org/gesiskg/resource/ZA5280
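The mapping shown in the examples can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that the resource ID is always the last path segment of the GESIS Search URL, as it is in the example above.

```python
from urllib.parse import urlparse

# Base of GESIS KG resource URIs, taken from the example above.
KG_RESOURCE_BASE = "https://data.gesis.org/gesiskg/resource/"


def search_url_to_kg_uri(search_url: str) -> str:
    """Build the GESIS KG URI for a resource from its GESIS Search URL."""
    path = urlparse(search_url).path
    # Assumption: the resource ID is the last non-empty path segment.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no resource ID found in {search_url!r}")
    return KG_RESOURCE_BASE + segments[-1]


print(search_url_to_kg_uri("https://search.gesis.org/research_data/ZA5280"))
# prints https://data.gesis.org/gesiskg/resource/ZA5280
```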
Statistic | Count |
---|---|
Total number of RDF triples | 15003783 |
Total number of scientific resources | 474201 |
Publications | 467125 |
Datasets | 6733 |
Instruments | 343 |
Persons | 403178 |
Organizations | 9691 |
Locations | 30885 |
Keywords, Concepts, Topics | 12951 |
Total number of classes | 31 |
Reused classes | 26 |
Newly defined classes | 5 |
Total number of object properties | 33 |
Reused object properties | 20 |
Newly defined object properties | 13 |
Total number of datatype properties | 94 |
Reused datatype properties | 28 |
Newly defined datatype properties | 66 |
Total number of links | 168362 |
Automatically generated links | 99227 |
Manually curated links | 47309 |
Links between publications and datasets | 162671 |
Links between publications and instruments | 5817 |
Links between datasets and instruments | 74 |
The GESIS Knowledge Graph is available through various access points: via public APIs, as download, and integrated into the GESIS Search portal.
We will provide an OAI-PMH API that will allow you to access and harvest metadata provided by the GESIS KG in the oai_datacite and berd formats. This API will come with a dedicated website and documentation.
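Since the API is not yet available, the following Python sketch only shows what a harvesting request could look like under the OAI-PMH 2.0 protocol. The base URL is a placeholder, not the real endpoint, and the function name is hypothetical. The sketch also assumes that the metadata prefix for the DataCite format is the string oai_datacite named above; the exact prefix for the berd format should be taken from the API documentation once it is published.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder only: the real base URL of the GESIS KG OAI-PMH API is not known here.
OAI_BASE_URL = "https://example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_datacite",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument:
        # follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(OAI_BASE_URL))
```

A harvester would request the first URL, read the resumptionToken element from the response, and repeat the request with that token until the repository returns no further token.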
You can explore the data within the GESIS Knowledge Graph using SPARQL queries at the following SPARQL endpoint: https://data.gesis.org/gesiskg/sparql
Below you can find some example SPARQL queries.
The following query lists all publications that are included in the GESIS KG (up to a limit of 10000 resources).
SELECT ?id ?title
WHERE {
  # ?p is left unbound, so any predicate linking ?id to the class matches
  ?id ?p <https://schema.org/ScholarlyArticle> .
  ?id <https://schema.org/name> ?title .
}
LIMIT 10000
To retrieve resources of a different type, replace <https://schema.org/ScholarlyArticle> with <https://schema.org/Dataset> or <https://rdf-vocabulary.ddialliance.org/discovery.html%23dfn-disco-instrument>.
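Switching the resource type can also be done programmatically. The Python sketch below builds the listing query for one of the three class IRIs mentioned above and wraps it in a GET request URL for the SPARQL endpoint. The function names and the dictionary keys are hypothetical, and the sketch assumes that the endpoint accepts queries in the `query` parameter of a GET request, as described in the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://data.gesis.org/gesiskg/sparql"

# Class IRIs as used in the example query and the note above; the keys are illustrative.
RESOURCE_CLASSES = {
    "publication": "https://schema.org/ScholarlyArticle",
    "dataset": "https://schema.org/Dataset",
    "instrument": "https://rdf-vocabulary.ddialliance.org/discovery.html%23dfn-disco-instrument",
}


def build_listing_query(resource_type: str, limit: int = 10000) -> str:
    """Return a query listing IDs and titles of all resources of one type."""
    class_iri = RESOURCE_CLASSES[resource_type]
    return (
        "SELECT ?id ?title\n"
        f"WHERE {{ ?id ?p <{class_iri}> .\n"
        "        ?id <https://schema.org/name> ?title .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """URL-encode the query into a GET request URL (assumed to be supported)."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


print(build_listing_query("dataset", limit=10))
```

The resulting URL can be fetched with any HTTP client; an Accept header of application/sparql-results+json is the usual way to request JSON results.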
The following query lists all information that is available for a particular resource in the GESIS KG.
SELECT *
WHERE {
  <https://data.gesis.org/gesiskg/resource/ZA5282> ?p ?o .
}
The following query lists all datasets and the publications that cite them for the topic "Migration" (reflecting the user story of Maxine).
SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {
  ?publication <https://schema.org/about> ?topic .
  ?topic <https://schema.org/name> "Migration"@en .
  ?publication ?p <https://schema.org/ScholarlyArticle> .
  # Follow the reference and its link metadata to the cited dataset
  ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r .
  ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference> .
  ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m .
  ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata> .
  ?m <https://schema.org/mainEntity> ?dataset .
  ?dataset ?p3 <https://schema.org/Dataset> .
  ?publication <https://schema.org/name> ?publication_title .
  ?dataset <https://schema.org/name> ?dataset_title .
}
This query can easily be adjusted by changing the string and language tag "Migration"@en in the query to, e.g., "Germany"@en, "Gesundheit"@de, or "Politik"@de. Please note that the retrieved results depend on whether resources were originally indexed with German keywords, English keywords, or both.
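The adjustment described above can be scripted. The following Python sketch builds the topic query for an arbitrary keyword and language tag. The function names and the `__TOPIC__` marker are hypothetical helpers for this illustration, and the query text mirrors the example above.

```python
import re

# Query text mirrors the example above; __TOPIC__ is a marker replaced at build time.
QUERY_TEMPLATE = """SELECT ?publication ?publication_title ?dataset ?dataset_title
WHERE {
  ?publication <https://schema.org/about> ?topic .
  ?topic <https://schema.org/name> __TOPIC__ .
  ?publication ?p <https://schema.org/ScholarlyArticle> .
  ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r .
  ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference> .
  ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m .
  ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata> .
  ?m <https://schema.org/mainEntity> ?dataset .
  ?dataset ?p3 <https://schema.org/Dataset> .
  ?publication <https://schema.org/name> ?publication_title .
  ?dataset <https://schema.org/name> ?dataset_title .
}"""


def topic_literal(keyword: str, lang: str) -> str:
    """Return a language-tagged SPARQL string literal such as "Migration"@en."""
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"invalid language tag: {lang!r}")
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_topic_query(keyword: str, lang: str = "en") -> str:
    """Insert the topic literal into the query template."""
    return QUERY_TEMPLATE.replace("__TOPIC__", topic_literal(keyword, lang))


print(build_topic_query("Gesundheit", "de"))
```

Running the query once per language tag and merging the results is one way to cover resources indexed in only one of the two languages.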
The following query retrieves a year-wise count of publications citing datasets focusing on the topic "Migration" (reflecting the user story of Will).
# DISTINCT counts each publication once per year, even if it cites several matching datasets
SELECT ?year (COUNT(DISTINCT ?publication) AS ?count)
WHERE {
  ?publication ?p <https://schema.org/ScholarlyArticle> .
  ?publication <https://data.gesis.org/gesiskg/schema/reference> ?r .
  ?r ?p1 <https://data.gesis.org/gesiskg/schema/DatasetReference> .
  ?r <https://data.gesis.org/gesiskg/schema/referenceMetadata> ?m .
  ?m ?p2 <https://data.gesis.org/gesiskg/schema/LinkMetadata> .
  ?m <https://schema.org/mainEntity> ?dataset .
  ?dataset ?p3 <https://schema.org/Dataset> .
  ?dataset <https://schema.org/about> ?topic .
  ?topic <https://schema.org/name> "Migration"@en .
  ?publication <https://schema.org/datePublished> ?year .
}
GROUP BY ?year
ORDER BY ?year
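For offline analysis, the result of this query can be turned into a simple mapping from year to count. The Python sketch below parses a response in the standard SPARQL 1.1 Query Results JSON format. The function name is hypothetical and the sample response is made up for illustration, not actual GESIS KG output. The sketch also assumes that the values of ?year start with a four-digit year, so it truncates them to four characters and sums rows that fall into the same year.

```python
import json
from typing import Dict


def counts_by_year(results_json: str) -> Dict[str, int]:
    """Aggregate ?year / ?count bindings into a {year: count} mapping."""
    data = json.loads(results_json)
    counts: Dict[str, int] = {}
    for row in data["results"]["bindings"]:
        # Assumption: datePublished starts with the year, e.g. "2019" or "2019-05-01".
        year = row["year"]["value"][:4]
        counts[year] = counts.get(year, 0) + int(row["count"]["value"])
    return counts


# Made-up sample in SPARQL 1.1 Query Results JSON format.
sample_response = json.dumps({
    "head": {"vars": ["year", "count"]},
    "results": {"bindings": [
        {"year": {"type": "literal", "value": "2018"},
         "count": {"type": "literal", "value": "3"}},
        {"year": {"type": "literal", "value": "2019-05-01"},
         "count": {"type": "literal", "value": "2"}},
        {"year": {"type": "literal", "value": "2019"},
         "count": {"type": "literal", "value": "4"}},
    ]},
})

print(counts_by_year(sample_response))
```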
You can download the GESIS Knowledge Graph as a full RDF dump as well as its underlying ontology at Zenodo:
https://doi.org/10.5281/zenodo.14229945
The GESIS Knowledge Graph is integrated in the GESIS Search. Links between scientific resources are included in the result list and detailed views of search results.
The GESIS Knowledge Graph is available for access, download, and reuse under a Creative Commons Attribution 4.0 (CC BY 4.0) license, since some of the input sources are licensed under CC BY as well.
If you are using the GESIS Knowledge Graph or parts of it, please cite the GESIS KG as follows:
Biswas, D., & Zapilko, B. (2024). GESIS Knowledge Graph (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14229945
This section documents all changes for each version of the GESIS Knowledge Graph.
v1.0.0 - 04.12.2024
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Debanjali Biswas, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Daniel Hienert, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Yudong Zhang, GESIS - Leibniz Institute for the Social Sciences, https://www.gesis.org/
Knowledge Technologies for the Social Sciences: https://www.gesis.org/en/institute/about-us/departments/knowledge-technologies-for-the-social-sciences
GESIS Knowledge Graph Infrastructure: https://www.gesis.org/en/research/research-area-computational-methods/knowledge-graph-infrastructure