gesisDataSearchKG: Semantic Search over Social Science Datasets
Description
gesisDataSearchKG is an initiative to extend the current Gesis Data Search with semantic search over a knowledge graph (KG). This version models a sample of 11,965 datasets available in Gesis Data Search, which are represented in the DDI standard, and lifts them to the RDF/S model. The data model combines dataset properties common to all domains with properties specific to the social sciences, which mitigates the challenge of integrating different DDI versions whose elements vary. Following the best practice of ontology reuse, we rely on established vocabularies such as schema.org, DISCO, PROV-O, and DCAT.
gesisDataSearchKG contains multilingual datasets belonging to 11 study groups, with publication dates ranging from 1966 to 2023. We model 33 semantic properties of the datasets in about 530k triples.
Data model
The following figure illustrates the data model of gesisDataSearchKG.
Vocabularies
Name | Vocabulary and Type | Domain | Range |
---|---|---|---|
StudyGroup | disco:StudyGroup, prov-o:Entity | - | - |
DataCatalog | schema:DataCatalog, prov-o:Entity | - | - |
CatalogRecord | dcat:CatalogRecord, prov-o:Entity | - | - |
Dataset | schema:Dataset, prov-o:Entity | - | - |
Organization | schema:Organization | - | - |
dataset | schema:dataset | DataCatalog | Dataset |
record | dcat:record | DataCatalog | CatalogRecord |
wasGeneratedBy | prov-o:wasGeneratedBy | CatalogRecord | CatalogRecordOrganization |
inGroup | disco:inGroup | Dataset | StudyGroup |
name | schema:name | StudyGroup, CatalogRecordOrganization, Organization, Dataset, Place, Language, Grant_Organization | Text |
identifier | schema:identifier | Dataset | Text |
alternateName | schema:alternateName | Dataset, Language | Text |
creator | schema:creator | Dataset | Organization, Person |
publisher | schema:publisher | Dataset | Organization, Person |
contributor | schema:contributor | Dataset | Organization, Person |
abstract | schema:abstract | Dataset | Text |
citation | schema:citation | Dataset | Text |
copyrightNotice | schema:copyrightNotice | Dataset | Text |
temporalCoverage | schema:temporalCoverage | Dataset | Text |
about | schema:about | Dataset | Text |
spatialCoverage | schema:spatialCoverage | Dataset | Place |
collectionMode | disco:collectionMode | Dataset | CollectionMode |
endDate | schema:endDate | CollectionMode | Text |
startDate | schema:startDate | CollectionMode | Text |
description | schema:description | CollectionMode | Text |
funding | schema:funding | Dataset | Grant |
funder | schema:funder | Grant | Grant_Organization |
version | schema:version | Dataset | Text |
inLanguage | schema:inLanguage | Dataset | Language |
conditionsOfAccess | schema:conditionsOfAccess | Dataset | Text |
comment | schema:comment | Dataset | Text |
sameAs | prov-o:sameAs | Place | URI |
hasEmbargoDuration | disco:hasEmbargoDuration | Dataset | Text |
analysisUnit | disco:analysisUnit | Dataset | Text |
kindOfData | disco:kindOfData | Dataset | Text |
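To make the table more concrete, the sketch below serialises one made-up dataset and its study group as N-Triples, using a few of the classes and properties listed above. The namespace IRIs, the `example.org` identifiers, and all literal values are assumptions for illustration only; the actual graph may use different IRIs.

```python
# Namespace IRIs are assumptions; check them against the actual graph.
SCHEMA = "https://schema.org/"
DISCO = "http://rdf-vocabulary.ddialliance.org/discovery#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/"  # hypothetical base IRI, not part of the KG


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset():
    """Return N-Triples lines for one made-up dataset and its study group."""
    dataset = iri(EX + "dataset/1")
    group = iri(EX + "studygroup/1")
    triples = [
        (dataset, iri(RDF_TYPE), iri(SCHEMA + "Dataset")),
        (dataset, iri(SCHEMA + "name"), literal("Example survey 2020")),
        (dataset, iri(SCHEMA + "version"), literal("1.0.0")),
        (dataset, iri(DISCO + "inGroup"), group),
        (group, iri(RDF_TYPE), iri(DISCO + "StudyGroup")),
        (group, iri(SCHEMA + "name"), literal("Example study group")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


print("\n".join(describe_dataset()))
```

Each row of the table with a Domain and Range corresponds to one predicate in such a triple; rows without them are classes used as the object of `rdf:type`.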
Availability of Features
Property | Availability (%) | Description |
---|---|---|
study group | 100 | Study grouping information. |
catalog record creator | 100 | Metadata provider details. |
version | 100 | Version number. |
title | 100 | Study title. |
copyright notice | 78.7 | Copyright text. |
published date | 100 | Publication date. |
language | 100 | Study language. |
research identifier | 100 | DOI. |
abstract | 85.34 | Study summary. |
comment | 100 | Comments section. |
citation | 50.35 | Study references. |
creator | 46.62 | Study creator. |
publisher | 99.9 | Publisher details. |
contributor | 63.01 | Contributor details. |
topical coverage | 68.3 | Study topics. |
spatial coverage | 70.6 | Study locations. |
temporal coverage | 68.3 | Time period. |
collection mode description | 15.5 | Data collection methods. |
collection start date | 100 | Start date of data collection. |
collection end date | 66.21 | End date of data collection. |
funding information | 0.008 | Funding details. |
analysis unit | 17.18 | Analysis scope. |
universe | 60.12 | Target population. |
kind of data | 100 | Data type (text, recording, etc.). |
subtitle | 5.53 | Alternate title. |
embargo date | 0.36 | Embargo details. |
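The percentages are easier to judge as absolute numbers. The following sketch converts a few rows of the table into approximate dataset counts, assuming the percentages refer to the 11,965 datasets mentioned in the Description. Because the published percentages are rounded, the counts are estimates rather than exact figures.

```python
TOTAL_DATASETS = 11_965  # sample size stated in the Description


def estimated_count(availability_percent, total=TOTAL_DATASETS):
    """Convert an availability percentage into an approximate dataset count."""
    return round(total * availability_percent / 100)


# A few rows copied from the availability table.
availability = {
    "citation": 50.35,
    "creator": 46.62,
    "embargo date": 0.36,
    "funding information": 0.008,
}

for prop, percent in availability.items():
    print(f"{prop}: about {estimated_count(percent)} of {TOTAL_DATASETS} datasets")
```

Under this assumption, a value such as 0.008 % corresponds to roughly a single dataset, which is worth keeping in mind before filtering on sparsely populated properties.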
Knowledge Graph Overview
Statistic | Value |
---|---|
Total Number of Triples | 527819 |
Total Number of Unique Entity Types | 9 |
Dataset
To cite this resource, please use the following DOI: 10.5281/zenodo.11070841.
SPARQL endpoint
A SPARQL endpoint is available to send SPARQL queries and retrieve results from gesisDataSearchKG.
https://data.gesis.org/gesisdatasearchkg/sparql
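A minimal sketch of querying the endpoint programmatically, using only the Python standard library. It assumes that the endpoint follows the SPARQL 1.1 Protocol, that is, it accepts a `query` parameter via GET and can return `application/sparql-results+json`; this has not been verified against the service. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/gesisdatasearchkg/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON response (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `run_query("SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }")` would ask for the total number of triples, which can be compared with the Knowledge Graph Overview table.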
SPARQL Queries
Example 1: Number of studies published per year. (Result)
PREFIX schema: <https://schema.org/>  # namespace IRI assumed; terms follow the Vocabularies table
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?year (COUNT(?survey) AS ?numberOfStudies)
WHERE {
  ?survey a schema:Dataset .
  ?survey schema:datePublished ?date .
}
GROUP BY (YEAR(xsd:dateTime(?date)) AS ?year)
ORDER BY ?year
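Example 2: Number of studies belonging to the 'ALLBUS' study group, by publication date. (Result)
PREFIX schema: <https://schema.org/>  # namespace IRI assumed; terms follow the Vocabularies table
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>  # namespace IRI assumed
SELECT ?date (COUNT(DISTINCT ?survey) AS ?numberOfStudies)
WHERE {
  ?survey a schema:Dataset .
  ?survey disco:inGroup ?Group .
  ?Group schema:name ?group .
  ?survey schema:about ?subject .
  ?survey schema:datePublished ?date .
  FILTER(REGEX(?group, "ALLBUS"))
}
GROUP BY ?date
ORDER BY ?date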
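Example 3: Studies related to national identity in Germany whose analysis unit is the individual. (Result)
PREFIX schema: <https://schema.org/>  # namespace IRI assumed; terms follow the Vocabularies table
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>  # namespace IRI assumed
SELECT ?survey ?title ?Date ?country ?subject ?analysisunit
WHERE {
  ?survey a schema:Dataset .
  ?survey schema:name ?title .
  ?survey schema:about ?subject .
  ?survey schema:spatialCoverage ?country .
  ?survey schema:datePublished ?Date .
  ?survey disco:analysisUnit ?analysisunit .
  FILTER(
    REGEX(STR(?subject), "national identity|nationale identität", "i") &&
    REGEX(STR(?analysisunit), "individual|individuell", "i")
  )
}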
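However the queries are sent, an endpoint that returns the standard SPARQL JSON results format produces a nested structure that is awkward to work with directly. The sketch below flattens such a response into plain rows. The sample response is made up to match the variables of Example 1; its numbers are not real results.

```python
import json


def bindings_to_rows(results):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding, so default to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Made-up response in the shape Example 1 would return; values are not real.
SAMPLE = json.loads("""
{
  "head": {"vars": ["year", "numberOfStudies"]},
  "results": {"bindings": [
    {"year": {"type": "literal", "value": "2001"},
     "numberOfStudies": {"type": "literal", "value": "3"}},
    {"year": {"type": "literal", "value": "2002"}}
  ]}
}
""")

rows = bindings_to_rows(SAMPLE)
print(rows)
```

All values arrive as strings; converting counts or years to integers is left to the caller.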
URLs for the evaluated organization names can be downloaded via Link.
A dump in N-Triples format is available via Link.
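Once the dump is downloaded, a quick sanity check is to count triples per predicate. The sketch below does this with the standard library only, relying on the fact that subject and predicate in N-Triples never contain whitespace. The sample lines and their IRIs are made up; they stand in for the real file.

```python
from collections import Counter


def predicate_counts(lines):
    """Count triples per predicate in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Splitting on the first two whitespace runs isolates the predicate,
        # even when the object is a literal that contains spaces.
        _subject, predicate, _rest = line.split(None, 2)
        counts[predicate] += 1
    return counts


# Made-up triples standing in for the real dump.
SAMPLE = [
    '<https://example.org/d1> <https://schema.org/name> "Example survey" .',
    '<https://example.org/d1> <https://schema.org/version> "1.0.0" .',
    '<https://example.org/d2> <https://schema.org/name> "Another survey" .',
]

counts = predicate_counts(SAMPLE)
print(counts.most_common())
```

With the real dump, an open file object can be passed instead of `SAMPLE`, and `sum(counts.values())` can be compared with the total number of triples reported in the Knowledge Graph Overview.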
License
The dataset is published under a Creative Commons Attribution 4.0 license.
Contact
Please send feedback and comments by email to gesisdatasearchkg (at) gesis (dot) org.