
gesisDataSearchKG

gesisDataSearchKG: Semantic search over Social Sciences Datasets


Description

gesisDataSearchKG is an initiative to extend the current GESIS Data Search to semantic search over a knowledge graph (KG). This version models a sample of 11,965 datasets available in GESIS Data Search, which are represented in the DDI standard, and lifts them to the RDF/S model. The comprehensive data model comprises dataset properties common to all domains as well as properties specific to the social sciences, which mitigates the challenge of integrating different DDI versions whose elements vary. We reuse established vocabularies such as schema.org, DISCO, PROV-O, and DCAT, following the best practice of ontology reuse.

gesisDataSearchKG consists of multilingual datasets belonging to 11 study groups, with publication dates ranging from 1966 to 2023. We model 33 semantic properties of a dataset, yielding roughly 530k triples.

Data model

In the following figure, the data model of gesisDataSearchKG is illustrated.

Vocabularies

| Name | Vocabulary and Type | Domain | Range |
|---|---|---|---|
| StudyGroup | disco:StudyGroup, prov-o:Entity | - | - |
| DataCatalog | schema:DataCatalog, prov-o:Entity | - | - |
| CatalogRecord | dcat:Class:Catalog_Record, prov-o:Entity | - | - |
| Dataset | schema:Dataset, prov-o:Entity | - | - |
| Organization | schema:Organization | - | - |
| dataset | schema:dataset | DataCatalog | Dataset |
| record | dcat:record | DataCatalog | CatalogRecord |
| wasGeneratedBy | prov-o:wasGeneratedBy | CatalogRecord | CatalogRecordOrganization |
| inGroup | disco:inGroup | Dataset | StudyGroup |
| name | schema:name | StudyGroup, CatalogRecordOrganization, Organization, Dataset, Place, Language, Grant_Organization | Text |
| identifier | schema:identifier | Dataset | Text |
| alternateName | schema:alternateName | Dataset, Language | Text |
| creator | schema:creator | Dataset | Organization, Person |
| publisher | schema:publisher | Dataset | Organization, Person |
| contributor | schema:contributor | Dataset | Organization, Person |
| abstract | schema:datePublished | Dataset | Text |
| citation | schema:citation | Dataset | Text |
| copyrightNotice | schema:copyrightNotice | Dataset | Text |
| temporalCoverage | schema:temporalCoverage | Dataset | Text |
| about | schema:about | Dataset | Text |
| spatialCoverage | schema:spatialCoverage | Dataset | Place |
| collectionMode | disco:collectionMode | Dataset | CollectionMode |
| endDate | schema:endDate | CollectionMode | Text |
| startDate | schema:startDate | CollectionMode | Text |
| description | schema:description | CollectionMode | Text |
| funding | schema:funding | Dataset | Grant |
| funder | schema:funder | Grant | Grant_Organization |
| version | schema:version | Dataset | Text |
| inLanguage | schema:inLanguage | Dataset | Language |
| conditionsOfAccess | schema:conditionsOfAccess | Dataset | Text |
| comment | schema:comment | Dataset | Text |
| sameAs | prov-o:sameAs | Place | URI |
| hasEmbargoDuration | disco:hasEmbargoDuration | Dataset | Text |
| analysisUnit | disco:analysisUnit | Dataset | Text |
| kindOfData | disco:kindOfData | Dataset | Text |

Availability of Features

The table below lists all properties associated with a social science study and their distribution. Availability in Percentage is the percentage of all datasets (11,965) that include the property.
| Properties | Availability in Percentage | Description |
|---|---|---|
| study group | 100 | Study grouping information. |
| Catalogrecord creator | 100 | Metadata provider details. |
| version | 100 | Version number. |
| title | 100 | Study title. |
| copyright notice | 78.7 | Copyright text. |
| published date | 100 | Publication date. |
| language | 100 | Study language. |
| research identifier | 100 | DOI. |
| abstract | 85.34 | Study summary. |
| comment | 100 | Comments section. |
| citation | 50.35 | Study references. |
| creator | 46.62 | Study creator. |
| publisher | 99.9 | Publisher details. |
| contributor | 63.01 | Contributor details. |
| topical coverage | 68.3 | Study topics. |
| spatial coverage | 70.6 | Study locations. |
| temporal coverage | 68.3 | Time period. |
| collection mode description | 15.5 | Data collection methods. |
| collection start date | 100 | Start date of data collection. |
| collection end date | 66.21 | End date of data collection. |
| funding information | 0.008 | Funding details. |
| analysis unit | 17.18 | Analysis scope. |
| universe | 60.12 | Target population. |
| kind of data | 100 | Data type (text, recording, etc.). |
| sub title | 5.53 | Alternate title. |
| embargo date | 0.36 | Embargo details. |
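
The percentages above can be turned into approximate dataset counts by multiplying them by the total of 11,965 datasets. The sketch below is only a convenience calculation: the function name `approx_count` is illustrative, and because the published percentages are rounded, the resulting counts are estimates rather than exact figures.

```python
TOTAL_DATASETS = 11965  # total number of datasets stated in the text


def approx_count(availability_percent: float, total: int = TOTAL_DATASETS) -> int:
    """Estimate how many datasets carry a property, given its availability in percent."""
    return round(total * availability_percent / 100)


# A few rows copied from the availability table (property -> percent).
availability = {
    "copyright notice": 78.7,
    "abstract": 85.34,
    "creator": 46.62,
    "funding information": 0.008,
}

for prop, pct in availability.items():
    print(f"{prop}: about {approx_count(pct)} of {TOTAL_DATASETS} datasets")
```

Read this way, an availability of 0.008 percent corresponds to roughly a single dataset, which helps when judging how useful a property is as a search facet.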

Knowledge Graph Overview

| Statistic | Value |
|---|---|
| Total Number of Triples | 527819 |
| Total Number of Unique Entity Types | 9 |
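
Figures like these can in principle be recomputed with two aggregate queries. The sketch below holds such queries as strings and extracts the single number from a response in the standard SPARQL 1.1 JSON results format. The query text, the helper name `single_count`, and the hand-written sample response are illustrative assumptions, not the project's own tooling, and counting distinct `rdf:type` values may not match exactly how the entity types above were counted.

```python
import json

# Illustrative aggregate queries (assumed, not taken from the project).
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
COUNT_TYPES = "SELECT (COUNT(DISTINCT ?type) AS ?n) WHERE { ?s a ?type }"


def single_count(results_json: str, var: str = "n") -> int:
    """Return the integer bound to `var` in the first row of a SPARQL JSON result."""
    doc = json.loads(results_json)
    bindings = doc["results"]["bindings"]
    if not bindings:
        raise ValueError("result set is empty")
    return int(bindings[0][var]["value"])


# Hand-written sample response in the SPARQL 1.1 JSON results format.
sample = json.dumps({
    "head": {"vars": ["n"]},
    "results": {"bindings": [{"n": {
        "type": "literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
        "value": "527819",
    }}]},
})
print(single_count(sample))
```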

Dataset

To cite this resource, please use the following DOI: 10.5281/zenodo.11070841.

SPARQL endpoint

A SPARQL endpoint is available to send SPARQL queries and retrieve results from gesisDataSearchKG.

https://data.gesis.org/gesisdatasearchkg/sparql
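
A minimal sketch of how a client might call this endpoint over HTTP, assuming it follows the SPARQL 1.1 Protocol (a `query` parameter on a GET request, with JSON results requested through the `Accept` header). The helper names `build_request` and `run_query` are hypothetical, and nothing here has been verified against the live service.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/gesisdatasearchkg/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request carrying the query and asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


# Building the request does not contact the server.
req = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)
```

Calling `run_query` with one of the example queries would return one dictionary per result row, keyed by variable name, provided the endpoint honours the requested result format.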

SPARQL Queries

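Example 1: Number of studies published per year. (Result)

    # Terms follow the Vocabularies table above. The schema.org namespace IRI
    # (https vs. http) is an assumption; adjust it to match the graph.
    PREFIX schema: <https://schema.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    SELECT ?year (COUNT(?survey) AS ?numberOfStudies)
    WHERE {
      ?survey a schema:Dataset .
      ?survey schema:datePublished ?date .
    }
    GROUP BY (YEAR(xsd:dateTime(?date)) AS ?year)
    ORDER BY ?year
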

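Example 2: Number of studies belonging to the 'ALLBUS' study group per year. (Result)

    # Terms follow the Vocabularies table above; the namespace IRIs and the
    # use of schema:about for ?subject are assumptions.
    PREFIX schema: <https://schema.org/>
    PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>

    SELECT ?date (COUNT(DISTINCT ?survey) AS ?numberOfStudies)
    WHERE {
      ?survey a schema:Dataset .
      ?survey disco:inGroup ?studyGroup .
      ?studyGroup schema:name ?groupName .
      ?survey schema:about ?subject .
      ?survey schema:datePublished ?date .
      FILTER(REGEX(STR(?groupName), "ALLBUS"))
    }
    GROUP BY ?date
    ORDER BY ?date
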

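Example 3: Studies related to national identity in Germany whose analysis unit is the individual. (Result)

    # Terms follow the Vocabularies table above; the namespace IRIs are assumptions.
    PREFIX schema: <https://schema.org/>
    PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>

    SELECT ?survey ?title ?date ?country ?subject ?analysisUnit
    WHERE {
      ?survey a schema:Dataset .
      ?survey schema:name ?title .
      ?survey schema:about ?subject .
      ?survey schema:spatialCoverage ?country .
      ?survey schema:datePublished ?date .
      ?survey disco:analysisUnit ?analysisUnit .
      # The "i" flag makes both matches case-insensitive.
      FILTER(
        REGEX(STR(?subject), "national identity|nationale identität", "i") &&
        REGEX(STR(?analysisUnit), "individual|individuell", "i")
      )
    }
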

The URLs of the evaluated organization names can be downloaded via Link.

A dump of the knowledge graph in N-Triples format is available via Link.
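
Once downloaded, the dump can be inspected without any RDF library, because N-Triples stores one triple per line and neither the subject nor the predicate contains whitespace. The sketch below counts triples per predicate under that assumption. The function name `predicate_counts` and the sample lines (including the example.org IRIs) are made up for illustration, and a full RDF parser would be the safer choice for anything beyond quick statistics.

```python
from collections import Counter
from typing import Iterable


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count triples per predicate IRI in N-Triples text, skipping blanks and comments."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate are the first two whitespace-separated tokens.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1].strip("<>")] += 1
    return counts


# Invented sample lines; the real dump would be read from the downloaded file.
sample = [
    '<http://example.org/ds/1> <https://schema.org/name> "Study A"@en .',
    '<http://example.org/ds/1> <https://schema.org/version> "1.0.0" .',
    "# a comment line",
    '<http://example.org/ds/2> <https://schema.org/name> "Study B"@en .',
]
for predicate, n in predicate_counts(sample).most_common():
    print(n, predicate)
```

Because the function accepts any iterable of lines, an open file handle for the downloaded dump can be passed in directly, so the file does not have to be loaded into memory at once.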

License

The dataset is published under a Creative Commons Attribution 4.0 license.

Contact

Please send feedback and comments by email to gesisdatasearchkg (at) gesis (dot) org.
