A Knowledge Graph of Annotated Survey Questions

Description

Social science survey data comprises attitudes, behaviours and factual information about specific population groups.
The instrument of choice for collecting this data is traditionally a questionnaire, used in personal interviews, in phone interviews or as a self-completion questionnaire.
Since collecting this data is tedious, researchers are highly interested in secondary analysis.

A dataset stores the respondents’ answers as so-called variables. A variable is defined by the answer categories for a specific question and by its notation in the data.
For a social science researcher who strives to find evidence to support or refute a thesis, the question text is the most meaningful piece of information, although additional content-related metadata can help as well.
Keywords and topic classifications are the most common examples. Currently they are mostly annotated by hand, but annotation is increasingly performed automatically.

We complement existing automatic keyword and topic extraction approaches with dimensions that we call question features. They are specifically tailored to survey questions.
However, they can be adapted to (at least some) other kinds of text.

In the following, we present a sample knowledge graph of GESIS survey questions annotated with the first question feature, the Information type.

Knowledge Graph Statistics

Item                                      Amount
Unique Questions                          4024
Question-Item pairs                       6236
Studies                                   250
Variables                                 49999
Information types (type and subtype)      12402
Statements total                          373215


Question Features

This is an overview of the features present in our model. Each feature is explained in more detail under Detailed Question Features.

Question Feature        Description
Information type        The information type of a question characterizes which type of information the respondent is asked to state about the question object.
Focus                   This feature characterizes the focus of the question object: whether it is directed towards the respondent or another person, or whether it is wide, as in a general question.
Time reference          Time reference characterizes the question's time reference with respect to past, present and future.
Periodicity             Periodicity characterizes the duration and periodicity of the time the question refers to.
Information intimacy    Information intimacy characterizes the sensitivity of the requested information with respect to personal life.
Relative location       The relative location states whether a location is mentioned that is described not by its geographic name but by its meaning for the respondent.
Geographic location     The name of a geographic location, if mentioned.
Knowledge specificity   Describes the specificity of the knowledge that is required to answer the question.
Quantification          This feature captures the quantification of the answer. In contrast to Information type, it is more concrete and closer to a physical quantity.
Language tone           Language tone characterizes the degree of formality or tone that is applied in the question.
Language complexity     Language complexity characterizes the complexity of the phrasing applied in the question.


We arranged the question features in a data model. The following figure shows the arrangement and introduces a grouping for easier orientation.

Question Feature Data Model


[Figure: Data model]

Knowledge Graph

The following figure shows the structure of our sample graph. The graph contains the information necessary to run example queries that make use of the Information type question feature.

As we currently regard our data model as a concept, we do not yet reuse existing vocabularies for it.
For the survey documentation part (Study, Variable, Question) we applied the established DDI-RDF Discovery Vocabulary (Disco), whereas we used our own vocabulary (qf) for the question features.

Instance model


[Figure: Instance model]

Namespaces

disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
skos:  <http://www.w3/2004/02/skos/core#>
qf:    <http://data.gesis.org/questionfeatures#>
dct:   <http://purl.org/dc/terms/>
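
The prefixes above can be kept in one place and reused when building queries or reading results. The following Python sketch is an illustration rather than part of the published tooling: the names NAMESPACES, expand and sparql_prologue are made up for this example, and the namespace IRIs are copied verbatim from the list above.

```python
# Prefix-to-namespace mapping, copied verbatim from the Namespaces list.
NAMESPACES = {
    "disco": "http://rdf-vocabulary.ddialliance.org/discovery#",
    "skos": "http://www.w3/2004/02/skos/core#",
    "qf": "http://data.gesis.org/questionfeatures#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'qf:informationType' to a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


def sparql_prologue() -> str:
    """Render the mapping as PREFIX declarations for the top of a query."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in NAMESPACES.items())


print(expand("qf:InformationType_Fact"))
```

expand turns the compact names used in this document into the full IRIs that appear in query results, and sparql_prologue produces the PREFIX header that every example query starts with.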

In addition to the instance model, the following table shows the encoding of the Information type values. (See also Information type)

Information type    RDF type
Evaluation          qf:InformationType_Evaluation
Willingness         qf:InformationType_Willingness
Preference          qf:InformationType_Preference
Acceptance          qf:InformationType_Acceptance
Prediction          qf:InformationType_Prediction
Assessment          qf:InformationType_Assessment
Explanation         qf:InformationType_Explanation
Self-Assessment     qf:InformationType_SelfAssessment
Judgement           qf:InformationType_Judgement
Fact                qf:InformationType_Fact
Demography          qf:InformationType_Demography
Participation       qf:InformationType_Participation
Activity            qf:InformationType_Activity
Decision            qf:InformationType_Decision
Use                 qf:InformationType_Use
Interaction         qf:InformationType_Interaction
Behaviour           qf:InformationType_Behaviour
(Life-)Event        qf:InformationType_LifeEvents
Cognition           qf:InformationType_Kognition
Emotion             qf:InformationType_Emotion
Knowledge           qf:InformationType_Knowledge
Perception          qf:InformationType_Perception
Interest            qf:InformationType_Interest
Motivation          qf:InformationType_Motivation
Believes            qf:InformationType_Believes
Understanding       qf:InformationType_Understanding


SPARQL endpoint

A SPARQL endpoint is available for sending SPARQL queries to the knowledge graph and retrieving the results.

https://data.gesis.org/questionfeaturessample/sparql
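
As a minimal sketch of how the endpoint might be called from a script, the following Python code builds a GET request as described by the SPARQL 1.1 Protocol, using only the standard library. That this endpoint accepts GET requests with a query parameter and honours the JSON Accept header is an assumption, and the helper names build_request and run_query are made up for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/questionfeaturessample/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request (query passed as ?query=...)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format; whether this endpoint
    # honours the header is an assumption.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network; only run_query does.
request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the encoded URL before contacting the server.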

Example queries

Get Information types for the questions in Study ZA2493.

PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?qt ?inf_type WHERE
{
    ?study a disco:Study .
    ?study skos:prefLabel "ZA2493" .
    ?study disco:variable ?var .
    ?var disco:question ?quest .
    ?quest disco:questionText ?qt .
    ?quest qf:conceptualQuestion [
        qf:problem [
            qf:informationType ?inf_type
        ]
    ]
}
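
If the endpoint returns the standard SPARQL 1.1 JSON results format (an assumption), the bindings of ?qt and ?inf_type from the query above could be flattened into plain rows as sketched below. The function name rows is made up for this example, and the sample response is invented for illustration only; it is not actual endpoint output.

```python
def rows(results: dict) -> list[dict]:
    """Flatten a SPARQL JSON results document into one plain dict per row."""
    variables = results["head"]["vars"]
    return [
        # Unbound variables are simply left out of the row.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


# Made-up response for illustration only; NOT actual endpoint output.
sample = {
    "head": {"vars": ["qt", "inf_type"]},
    "results": {
        "bindings": [
            {
                "qt": {"type": "literal", "value": "<question text>"},
                "inf_type": {
                    "type": "uri",
                    "value": "http://data.gesis.org/questionfeatures#InformationType_Fact",
                },
            }
        ]
    },
}

for row in rows(sample):
    # Keep only the local name after '#' for display.
    print(row["qt"], "->", row["inf_type"].rsplit("#", 1)[-1])
```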

Get all demography questions from study ZA3811.

PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?qt WHERE
{
    ?study a disco:Study .
    ?study skos:prefLabel "ZA3811" .
    ?study disco:variable ?var .
    ?var disco:question ?quest .
    ?quest disco:questionText ?qt .
    ?quest qf:conceptualQuestion [
        qf:problem [
            qf:informationType qf:InformationType_Demography
        ]
    ]
}

Find all Information types from study ZA2493.

PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?inf_type WHERE
{
    ?study a disco:Study .
    ?study skos:prefLabel "ZA2493" .
    ?study disco:variable ?var .
    ?var disco:question ?quest .
    ?quest qf:conceptualQuestion [
        qf:problem [
            qf:informationType ?inf_type
        ]
    ]
}

Get all answer categories for fact questions (or for questions of any other Information type).

PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT DISTINCT ?answer_text WHERE
{
    ?quest qf:conceptualQuestion [
        qf:problem [
            qf:informationType qf:InformationType_Fact
        ]
    ] .
    ?quest disco:responseDomain ?rd .
    ?ans skos:inScheme ?rd .
    ?ans skos:prefLabel ?answer_text .
}
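
Because the query above differs between Information types only in the qf:InformationType_… value, it can be generated from a template. The following Python sketch shows one possible way to do this; the function name answer_categories_query is made up for this example, and the argument is expected to be the suffix of an RDF type from the Information type encoding table (for example Fact or Demography).

```python
import re

PROLOGUE = """\
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
"""


def answer_categories_query(information_type: str) -> str:
    """Return the answer-category query for e.g. 'Fact' or 'Demography'."""
    # Accept only plain identifiers so the input cannot alter the query.
    if not re.fullmatch(r"[A-Za-z]+", information_type):
        raise ValueError(f"not a valid Information type: {information_type!r}")
    return PROLOGUE + f"""
SELECT DISTINCT ?answer_text WHERE
{{
    ?quest qf:conceptualQuestion [
        qf:problem [
            qf:informationType qf:InformationType_{information_type}
        ]
    ] .
    ?quest disco:responseDomain ?rd .
    ?ans skos:inScheme ?rd .
    ?ans skos:prefLabel ?answer_text .
}}
"""


print(answer_categories_query("Demography"))
```

Validating the argument before inserting it into the template is a deliberate choice: the value ends up inside the query text, so free-form input is rejected.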

Detailed Question Features

Information Type

The information type of a question characterizes which type of information the respondent is asked to state about the question object.

Category    Subcategory     Description
Evaluation  Willingness     The willingness to do something, e.g. invest time, invest money, help.
            Preference      Priority, order, preference. Sympathy with a group / institution / company / value.
            Acceptance      Tolerance, legitimacy, permission, grant, agreement, acknowledgement.
            Prediction      Prediction, prediction of a future development, prediction of future states, prediction of assumable progress.
            Assessment      Judgemental opinion, judgement, evaluation, attitude and self-assessment, e.g. “Do you think it’s positive / negative …”
            Explanation     “Why”-question, argument for/against, statement of a reason.
Fact        Demography      Sex, gender, nationality, age, marital status/partnership, socio-economic status (education, salary, occupation, income), size of household.
            Participation   Engagement / participation, e.g. in a labor union, in a sports club, political,…
            Activity        E.g. purchase of a car, free time, travel, sports…
            Decision        A decision the respondent has made (in the past).
            Use             E.g. use of media, resources, mobility…
            Interaction     Inter-human interaction, communication, conflict, advice…
            Behaviour       Reaction, avoidance behaviour, well-being, e.g. “Do you change to the other side when you encounter a stranger on the street?”
            Life Events     An event in the life cycle of a person, e.g. birth, marriage etc.
Cognition   Emotion         Anger, fear, shame, pride…
            Knowledge       Knowledge that can be verified by a neutral entity, a check of the respondent's knowledge, state of knowledge.
            Perception      Non-judgemental perception of a situation or a sensation; the realization, intake or registration of a perception (rather objective).
            Interest        Tendency, preference, a thing a person likes, values or that is of use.
            Motivation      Inherent incentive for something. Collection of reasons and influences that cause a decision, action or the like.
            Believes        In the religious sense: emotional certainty, conviction that is not determined by evidence or facts.
            Understanding   The respondent's understanding of a context.
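
Reading the table above as three categories with their subcategories, the hierarchy can be written down as a small lookup structure. The Python sketch below is only an illustration of that reading; the names INFORMATION_TYPES, PARENT and category_of are made up for this example.

```python
# Category -> subcategories, transcribed from the Information Type table.
INFORMATION_TYPES = {
    "Evaluation": [
        "Willingness", "Preference", "Acceptance",
        "Prediction", "Assessment", "Explanation",
    ],
    "Fact": [
        "Demography", "Participation", "Activity", "Decision",
        "Use", "Interaction", "Behaviour", "Life Events",
    ],
    "Cognition": [
        "Emotion", "Knowledge", "Perception", "Interest",
        "Motivation", "Believes", "Understanding",
    ],
}

# Reverse index so a subcategory can be resolved to its parent category.
PARENT = {sub: cat for cat, subs in INFORMATION_TYPES.items() for sub in subs}


def category_of(subcategory: str) -> str:
    """Return the top-level category a subcategory belongs to."""
    try:
        return PARENT[subcategory]
    except KeyError:
        raise ValueError(f"unknown subcategory: {subcategory!r}") from None


print(category_of("Demography"))  # Fact
```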


Focus

This feature characterizes the focus of the question object: whether it is directed towards the respondent or another person, or whether it is wide, as in a general question.

Category                 Subcategory              Description
Self focus                                        The respondent is the object of the question. When she is asked about her opinion towards another person, institution, etc., external focus or generic focus applies.
External focus           Family/Member of family  The respondent is asked about relatives or in-laws.
                         Acquaintance             Persons whom the respondent knows personally but is not related to, either by blood or by marriage. Also, the respondent and the person must not be in a professional relationship.
                         Affiliate                Colleagues, supervisors, business relations.
                         Public Person            A person in public life whom the respondent knows of (or could know of) but does not know personally.
                         Institution              Organisation, e.g. EU, State, ministry, club, company,…
Object focus/item focus                           Animals, things, paragraphs, laws… Not values or convictions that are inherent in the respondent (self focus).
Event focus                                       The question is about a relevant event (9/11, Fukushima, Trump is elected, Deepwater Horizon, Fire of Notre Dame).
Generic/universal focus                           The question asks generically, e.g. “Do you think one should be…”, “there should be…”.
Self+external focus                               The respondent and an additional entity from external focus are in the focus, e.g. “You and your partner”, “You and your family”.


Time reference

Time reference characterizes the question's time reference with respect to past, present and future.

Category                Description
Past                    Refers to a past experience, fact etc. of the respondent.
Present                 Refers to a present experience, fact etc. of the respondent.
Future                  Refers to a future scenario of the respondent.
Hypothetical - past     Refers to a hypothetical scenario that is set in the past.
Hypothetical - present  Refers to a hypothetical scenario that is set in the present.
Hypothetical - future   Refers to a hypothetical scenario that is set in the future.


Periodicity

Periodicity characterizes the duration and periodicity of the time the question refers to.

Category                Description
Point in time           The question mentions a point in time, i.e. a period shorter than or equal to a day.
Time span               The question mentions a period longer than a day.
Periodic point in time  The question mentions a recurring point in time.
Unspecific              None of the above.


Information intimacy

Information intimacy characterizes the sensitivity of the requested information with respect to personal life.

Category  Description
Private   The question asks for information about the personal life of the respondent. Information is considered personal if it cannot be discussed with the general public (given the circumstances the respondent lives in), e.g. “Which method of contraception does your partner use?”
Public    The question asks for information about the public life of the respondent. Information is public if it can be discussed with the general public.


Relative location

The relative location states if a location is mentioned which is not described by geographic name but by its meaning for the respondent.

Category              Description
Without               The question does not mention a relative location for the respondent.
Apartment/Flat        The question mentions the apartment or flat where the respondent lives.
Neighborhood/Street   The question mentions the neighborhood (street/block/veedel) where the respondent lives.
Municipality/City     The question mentions the municipality or city the respondent lives in.
Region                The question mentions the region the respondent lives in.
Country               The question mentions the country the respondent lives in.
Continent             The question mentions the continent the respondent lives in.
World                 The question mentions the world.
Place of work         The question mentions the respondent's place of work, e.g. office, construction site, school, alternating places (salesmen), apartment (housewife),…
Journey               The question mentions a journey, short trip or similar. The longest stop is shorter than 2 weeks.
Stays abroad          The question mentions a longer stay abroad. The stay is at least 2 weeks long; use Journey otherwise.


Geographic location

The name of a geographic location, if mentioned.
A category in angle brackets, such as <location>, indicates that the location name from the GeoNames database is to be used.

Category                Description
<Continent>             The question refers to people, states, events, etc. on a specific continent.
<Countries>             The question refers to people, events etc. in a specific country. The name of the country is to be used as it was at the time the question refers to, e.g. DDR, even though it no longer exists. This does not reflect where the respondent lives.
<Region>                The question refers to people, events etc. in a specific region. This does not reflect where the respondent lives. This can also be used for regions in other countries. Use this if the mention does not refer to a continent, country or German federal state.
<German federal state>  The question refers to people, events etc. in a federal state of Germany.
Others                  The question refers to people, events etc. in a geographic region that is not covered by the other categories, e.g. Benelux countries, Iberian Peninsula, etc. This does not reflect where the respondent lives.
Without                 No geographic location is mentioned.
Unspecific              An unspecific location is mentioned, e.g. “At some places in Germany,…”
Mixed/Multiple          Multiple locations are mentioned.


Knowledge specificity

Describes the specificity of the knowledge that is required to answer the question.

Category            Description
School              The knowledge to answer the question is usually acquired formally at an educational institution, e.g. at school. It is available to most of the population, including school children.
Daily life          The knowledge to answer the question is usually acquired during daily life.
Special knowledge   The knowledge to answer the question is specialized and available only to specific groups of persons, e.g. parents or biologists.


Quantification

This feature captures the quantification of the answer. In contrast to Information type, it is more concrete and closer to a physical quantity.

Category                Description
Frequency               The question asks for a rate or frequency, e.g. “How often do you…”
Date time               The question asks for a point in time.
Time dimension          The question asks for a duration or a temporal distance, e.g. “How long do you sleep?”, “How long since you…”
Spatial expansion       The question asks for a distance, range, height, depth, diameter,…
Mass                    The question asks for the weight of something.
Amount                  The question asks for the amount of something, e.g. “How many cars do you own?”.
Level of agreement      The question asks for the extent of agreement with a certain matter.
Boolean                 The question is a yes/no question.
Rating                  The question asks the respondent to perform a form of rating. Only used if Level of agreement does not apply.
Naming / Denomination   The respondent is asked to name one or more items from a list, or to come up with her own item (open question).
Order                   The respondent is asked to put items in a specific order.
Comparative             The respondent is asked to compare two events, things etc.


Language tone

Language tone characterizes the degree of formality or tone that is applied in the question.

Category                    Description
Colloquial language         Language tone as used in daily conversations.
Formal language             Language tone as in official letters or TV news.
Jargon/technical language   Technical, precise, emotionless tone.


Language complexity

Language complexity characterizes the complexity of phrasing applied in the question.

Category                  Description
Simple language           Language that is especially easy to understand e.g. by children or by language learners.
Moderate language level   Language with a complexity that can be understood by most people.
Raised language level     Language with a complexity that is above average.


License

The dataset is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

Publications

Bensmann F., Papenmeier A., Kern D., Zapilko B., Dietze S. (2020) Semantic Annotation, Representation and Linking of Survey Data. In: Blomqvist E. et al. (eds) Semantic Systems. In the Era of Knowledge Graphs. SEMANTICS 2020. Lecture Notes in Computer Science, vol 12378. Springer, Cham. https://doi.org/10.1007/978-3-030-59833-4_4

Contact

Please provide your feedback and any comments by sending an email to felix (dot) bensmann (at) gesis (dot) org

About Us

Felix Bensmann, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Andrea Papenmeier, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/