Social science survey data comprises data on the attitudes, behaviours and factual circumstances of specific population groups.
The instrument of choice for collecting this data is traditionally a questionnaire. It is used in personal interviews and phone interviews, or as a self-completion questionnaire.
Since collecting this data is tedious, researchers are highly interested in secondary analysis.
A dataset stores the respondents’ answers as so-called variables. A variable is defined by the answer categories for a specific question and their notation in the data.
For a social science researcher who strives to find evidence to support or refute a thesis, the question text is the most meaningful information, although additional content-related metadata can help as well.
Keywords and topic classifications are the most common examples. Currently they are mostly annotated by hand, although annotation is increasingly performed automatically.
We complement existing automatic keyword and topic extraction approaches with dimensions that we call question features. They are specifically tailored to survey questions.
However, they can be adapted to (at least some) other texts.
In the following we present a sample knowledge graph of GESIS survey questions that is annotated with the first question feature, the Information type.
Knowledge Graph Statistics
Item | Amount |
---|---|
Unique Questions | 4024 |
Question-Item pairs | 6236 |
Studies | 250 |
Variables | 49999 |
Information types (type and subtype) | 12402 |
Statements total | 373215 |
This is an overview of the features present in our model. A more detailed explanation of each feature follows further below.
Question Feature | Description |
---|---|
Information type | The information type of a question characterizes which type of information the respondent is asked to state about the question object. |
Focus | This feature characterizes the focus of the question object: whether it is directed at the respondent or at another person, or whether it is broad, as in a general question. |
Time reference | Time reference characterizes the question's time reference with respect to past, present and future. |
Periodicity | Periodicity characterizes the duration and periodicity of the time the question refers to. |
Information intimacy | Information intimacy characterizes the sensitivity of the requested information with respect to personal life. |
Relative location | The relative location states if a location is mentioned which is not described by geographic name but by its meaning for the respondent. |
Geographic location | The name of a geographic location if mentioned. |
Knowledge specificity | Describes the specificity of the knowledge that is required to answer the question. |
Quantification | This feature captures the quantification of the answer. As opposed to Information type it is more concrete and close to physical quantity. |
Language tone | Language tone characterizes the degree of formality or tone that is applied in the question. |
Language complexity | Language complexity characterizes the complexity of phrasing applied in the question. |
We arranged the question features in a data model. The following figure displays the concrete arrangement and introduces a grouping for improved orientation.
Question Feature Data Model
The following figure displays the structure of our sample graph. It contains the information necessary to execute exemplary queries that leverage the Information type question feature.
As we view our data model as a concept, we are not reusing available vocabularies for it for now.
For the survey documentation part (Study, Variable, Question) we applied the well-known DDI RDF Discovery Vocabulary (Disco), whereas we used our own vocabulary (qf) for the question features.
Instance model
Namespaces
disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
skos: <http://www.w3.org/2004/02/skos/core#>
qf: <http://data.gesis.org/questionfeatures#>
dct: <http://purl.org/dc/terms/>
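The instance model can also be read as a chain of properties that leads from a study to an Information type. The following Python sketch is an illustration only: the property chain is inferred from the example queries on this page, while the node names (`ex:study1`, `_:cq1`, ...), the study number `ZA0000` and the question text are invented placeholders, not data from the graph.

```python
# Triples as (subject, predicate, object). All node names and literal values
# are invented placeholders; only the predicates come from the example queries.
TRIPLES = [
    ("ex:study1", "rdf:type", "disco:Study"),
    ("ex:study1", "skos:prefLabel", "ZA0000"),
    ("ex:study1", "disco:variable", "ex:var1"),
    ("ex:var1", "disco:question", "ex:quest1"),
    ("ex:quest1", "disco:questionText", "Placeholder question text"),
    ("ex:quest1", "qf:conceptualQuestion", "_:cq1"),
    ("_:cq1", "qf:problem", "_:p1"),
    ("_:p1", "qf:informationType", "qf:InformationType_Fact"),
]

# Property chain from a study to the Information types of its questions.
PATH_TO_INFORMATION_TYPE = [
    "disco:variable",
    "disco:question",
    "qf:conceptualQuestion",
    "qf:problem",
    "qf:informationType",
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start, path):
    """Follow a chain of predicates from a start node and return the end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(node, predicate)]
    return nodes
```

Calling `follow("ex:study1", PATH_TO_INFORMATION_TYPE)` walks the same chain that the example queries express with nested blank nodes.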
In addition to the instance model, the following table shows the encoding of the Information type values. (See also Information type)
Information type | RDF type |
---|---|
Evaluation | qf:InformationType_Evaluation |
Willingness | qf:InformationType_Willingness |
Preference | qf:InformationType_Preference |
Acceptance | qf:InformationType_Acceptance |
Prediction | qf:InformationType_Prediction |
Assessment | qf:InformationType_Assessment |
Explanation | qf:InformationType_Explanation |
Self-Assessment | qf:InformationType_SelfAssessment |
Judgement | qf:InformationType_Judgement |
Fact | qf:InformationType_Fact |
Demography | qf:InformationType_Demography |
Participation | qf:InformationType_Participation |
Activity | qf:InformationType_Activity |
Decision | qf:InformationType_Decision |
Use | qf:InformationType_Use |
Interaction | qf:InformationType_Interaction |
Behaviour | qf:InformationType_Behaviour |
(Life-)Event | qf:InformationType_LifeEvents |
Cognition | qf:InformationType_Kognition |
Emotion | qf:InformationType_Emotion |
Knowledge | qf:InformationType_Knowledge |
Perception | qf:InformationType_Perception |
Interest | qf:InformationType_Interest |
Motivation | qf:InformationType_Motivation |
Beliefs | qf:InformationType_Believes |
Understanding | qf:InformationType_Understanding |
A SPARQL endpoint is available for sending SPARQL queries to the knowledge graph and retrieving the results.
https://data.gesis.org/questionfeaturessample/sparql
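As a minimal sketch, the endpoint could be queried from Python with the standard library alone. It assumes that the endpoint follows the SPARQL 1.1 Protocol (query passed as the `query` parameter of a GET request) and can return results as `application/sparql-results+json`; neither is stated on this page. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://data.gesis.org/questionfeaturessample/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the query and asks for JSON results.

    Assumption: the endpoint implements the SPARQL 1.1 Protocol.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Any of the queries on this page can be passed to `run_query` as a string; each returned binding maps a variable name to a dictionary with a `value` key.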
Get the Information types for the questions in study ZA2493.
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?qt ?inf_type WHERE
{
  ?study a disco:Study .
  ?study skos:prefLabel "ZA2493" .
  ?study disco:variable ?var .
  ?var disco:question ?quest .
  ?quest disco:questionText ?qt .
  ?quest qf:conceptualQuestion [
    qf:problem [
      qf:informationType ?inf_type
    ]
  ]
}
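The query above yields one row per question text and Information type. The sketch below groups such rows by Information type. It assumes that the rows arrive as bindings in the W3C SPARQL JSON results format; the function name is hypothetical and the sample rows are invented placeholders, not results from the graph.

```python
from collections import defaultdict

QF = "http://data.gesis.org/questionfeatures#"

# Invented placeholder rows in the shape of SPARQL JSON result bindings.
SAMPLE_BINDINGS = [
    {"qt": {"type": "literal", "value": "Placeholder question A"},
     "inf_type": {"type": "uri", "value": QF + "InformationType_Fact"}},
    {"qt": {"type": "literal", "value": "Placeholder question B"},
     "inf_type": {"type": "uri", "value": QF + "InformationType_Demography"}},
    {"qt": {"type": "literal", "value": "Placeholder question C"},
     "inf_type": {"type": "uri", "value": QF + "InformationType_Fact"}},
]


def group_by_information_type(bindings):
    """Map each Information type (local name) to the question texts annotated with it."""
    grouped = defaultdict(list)
    for row in bindings:
        type_iri = row["inf_type"]["value"]
        # Strip the qf namespace so that the key matches the encoding table.
        key = type_iri[len(QF):] if type_iri.startswith(QF) else type_iri
        grouped[key].append(row["qt"]["value"])
    return dict(grouped)
```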
Get all demography questions from study ZA3811.
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?qt WHERE
{
  ?study a disco:Study .
  ?study skos:prefLabel "ZA3811" .
  ?study disco:variable ?var .
  ?var disco:question ?quest .
  ?quest disco:questionText ?qt .
  ?quest qf:conceptualQuestion [
    qf:problem [
      qf:informationType qf:InformationType_Demography
    ]
  ]
}
Find all Information types that occur in study ZA2493.
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?inf_type WHERE
{
  ?study a disco:Study .
  ?study skos:prefLabel "ZA2493" .
  ?study disco:variable ?var .
  ?var disco:question ?quest .
  ?quest qf:conceptualQuestion [
    qf:problem [
      qf:informationType ?inf_type
    ]
  ]
}
Get all answer categories for fact questions (or questions of any other Information type).
PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qf: <http://data.gesis.org/questionfeatures#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?answer_text WHERE
{
  ?quest qf:conceptualQuestion [
    qf:problem [
      qf:informationType qf:InformationType_Fact
    ]
  ] .
  ?quest disco:responseDomain ?rd .
  ?ans skos:inScheme ?rd .
  ?ans skos:prefLabel ?answer_text .
}
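Because the last query only differs by the Information type it names, it can be generated for any type. The following sketch shows one way to do that in Python; the function name is hypothetical, and the standard SKOS namespace is an assumption that should be adjusted if the graph uses a different IRI.

```python
import re

PREFIXES = (
    "PREFIX disco: <http://rdf-vocabulary.ddialliance.org/discovery#>\n"
    # Assumption: the graph uses the standard SKOS namespace.
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
    "PREFIX qf: <http://data.gesis.org/questionfeatures#>\n"
)


def answer_categories_query(information_type):
    """Build the answer-category query for one Information type.

    `information_type` is the part after `qf:InformationType_` in the
    encoding table, e.g. "Fact", "Demography" or "LifeEvents".
    """
    # Only plain letters are accepted so that nothing else can be
    # injected into the query text.
    if not re.fullmatch(r"[A-Za-z]+", information_type):
        raise ValueError(f"not a valid Information type name: {information_type!r}")
    return PREFIXES + f"""SELECT DISTINCT ?answer_text WHERE
{{
  ?quest qf:conceptualQuestion [
    qf:problem [
      qf:informationType qf:InformationType_{information_type}
    ]
  ] .
  ?quest disco:responseDomain ?rd .
  ?ans skos:inScheme ?rd .
  ?ans skos:prefLabel ?answer_text .
}}
"""
```

For example, `answer_categories_query("Emotion")` produces the same query as above with `qf:InformationType_Emotion` in place of `qf:InformationType_Fact`.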
Information type
The information type of a question characterizes which type of information the respondent is asked to state about the question object.
Category | Subcategory | Description |
---|---|---|
Evaluation | Willingness | The willingness to do something, e.g. invest time, invest money, help. |
Evaluation | Preference | Priority, order, preference. Sympathy with a group / institution / company / value |
Evaluation | Acceptance | Tolerance, legitimacy, permission, grant, agreement, acknowledgement |
Evaluation | Prediction | Prediction, prediction of a future development, prediction of future states, prediction of assumable progress |
Evaluation | Assessment | Judgemental opinion, judgement, evaluation, attitude and self-assessment, “Do you think it’s positive / negative …” |
Evaluation | Explanation | “Why”-question, argument for/against, statement of a reason |
Fact | Demography | Sex, gender, nationality, age, marital status/partnership, socio-economic status (education, salary, occupation, income), size of household |
Fact | Participation | Engagement / participation, e.g. in a labor union, in a sports club, political,… |
Fact | Activity | E.g. purchase of a car, free time, travel, sports… |
Fact | Decision | A decision made by the respondent (in the past) |
Fact | Use | E.g. use of media, resources, mobility… |
Fact | Interaction | Inter-human interaction, communication, conflict, advice… |
Fact | Behaviour | Reaction, avoidance behaviour, well-being, e.g. “Do you change sides when you encounter a stranger on the street?” |
Fact | Life Events | An event in the life cycle of a person, e.g. birth, marriage etc. |
Cognition | Emotion | Anger, fear, shame, pride… |
Cognition | Knowledge | Knowledge that can be verified by a neutral entity, check of the knowledge of a respondent, state of knowledge |
Cognition | Perception | Non-judgemental perception of a situation or a sensation, the realization, intake, registry of a perception (rather objective). |
Cognition | Interest | Tendency, preference, a thing a person likes, values or that is of use. |
Cognition | Motivation | Inherent incentive for something. Collection of reasons and influences that cause a decision, action or the like. |
Cognition | Beliefs | In the sense of religion: emotional certainty, conviction that is not determined by evidence or facts. |
Cognition | Understanding | Understanding of a context by the respondent |
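The two-level structure of this table can be written down as a small lookup. The sketch below is an illustration, not part of the published vocabulary: it uses the local names from the RDF encoding table (hence `Kognition`, `Believes` and `LifeEvents`), and it leaves out `SelfAssessment` and `Judgement`, which appear in the encoding table but not in the table above. The function names are hypothetical.

```python
# Top-level Information types and their subtypes, named as in the RDF
# encoding table (the part after "qf:InformationType_").
SUBTYPES = {
    "Evaluation": ["Willingness", "Preference", "Acceptance",
                   "Prediction", "Assessment", "Explanation"],
    "Fact": ["Demography", "Participation", "Activity", "Decision",
             "Use", "Interaction", "Behaviour", "LifeEvents"],
    "Kognition": ["Emotion", "Knowledge", "Perception", "Interest",
                  "Motivation", "Believes", "Understanding"],
}


def parent_type(local_name):
    """Return the top-level type of a subtype, or None if there is none."""
    for parent, children in SUBTYPES.items():
        if local_name in children:
            return parent
    return None


def rdf_type(local_name):
    """Return the prefixed RDF type for a local name."""
    return "qf:InformationType_" + local_name
```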
Focus
This feature characterizes the focus of the question object: whether it is directed at the respondent or at another person, or whether it is broad, as in a general question.
Category | Subcategory | Description |
---|---|---|
Self focus | | The respondent is the object of the question. When she is asked about her opinion towards another person, institution, etc., external focus or generic focus applies. |
External focus | Family/Member of family | The respondent is asked about relatives or in-laws. |
External focus | Acquaintance | Persons whom the respondent knows personally but who are neither relatives nor in-laws. In addition, the respondent and the person must not be in a professional relationship. |
External focus | Affiliate | Colleagues, supervisors, business relations |
External focus | Public Person | A person in public life whom the respondent knows of (or could know of) but does not know personally. |
External focus | Institution | Organisation, e.g. EU, State, ministry, club, company,… |
Object focus/item focus | | Animals, things, paragraphs, laws… Not values or convictions that are inherent in the respondent (self focus) |
Event focus | | The question is about a relevant event (9/11, Fukushima, the election of Trump, Deepwater Horizon, the Notre-Dame fire) |
Generic/universal focus | | Asking generically (Do you think “one should be..”, “there should be…”) |
Self+external focus | | The respondent and an additional entity from external focus are in the focus. “You and your partner”, “You and your family” |
Time reference
Time reference characterizes the question's time reference with respect to past, present and future.
Category | Description |
---|---|
Past | Refers to a past experience, fact etc. of the respondent |
Present | Refers to a present experience, fact etc. of the respondent |
Future | Refers to a future scenario of the respondent |
Hypothetical - past | Refers to a hypothetical scenario that is set in the past |
Hypothetical - present | Refers to a hypothetical scenario that is set in the present |
Hypothetical - future | Refers to a hypothetical scenario that is set in the future |
Periodicity
Periodicity characterizes the duration and periodicity of the time the question refers to.
Category | Description |
---|---|
Point in time | The question mentions a point in time, that is, a period shorter than or equal to one day. |
Time span | The question mentions a period longer than one day. |
Periodic point in time | The question mentions a recurring point in time. |
Unspecific | None of the above. |
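The one-day boundary between the categories can be expressed as a small decision function. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and treating a recurring period of more than one day as a Time span is an assumption, since the table does not cover that case.

```python
from datetime import timedelta
from typing import Optional

ONE_DAY = timedelta(days=1)


def periodicity(duration: Optional[timedelta], recurring: bool = False) -> str:
    """Pick the Periodicity category for the time a question refers to.

    `duration` is None when the question mentions no time at all.
    """
    if duration is None:
        return "Unspecific"
    if duration > ONE_DAY:
        # Assumption: recurring periods longer than a day count as time spans.
        return "Time span"
    # A period of at most one day counts as a point in time.
    return "Periodic point in time" if recurring else "Point in time"
```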
Information intimacy
Information intimacy characterizes the sensitivity of the requested information with respect to personal life.
Category | Description |
---|---|
Private | The question asks for information about the personal life of the respondent. Information is considered personal if it cannot be discussed with the general public (given the circumstances the respondent lives in), e.g. “Which contraceptive method does your partner use?” |
Public | The question asks for information about the public life of the respondent. Information is public if it can be discussed with the general public. |
Relative location
The relative location states if a location is mentioned which is not described by geographic name but by its meaning for the respondent.
Category | Description |
---|---|
Without | A question does not mention a relative location for the respondent. |
Apartment/Flat | A question mentions the apartment or flat where the respondent lives. |
Neighborhood/Street | A question mentions the neighborhood (street/block/quarter) where the respondent lives. |
Municipality/City | A question mentions the municipality or city the respondent lives in. |
Region | A question mentions the region the respondent lives in. |
Country | A question mentions the country the respondent lives in. |
Continent | A question mentions the continent the respondent lives in. |
World | A question mentions the world. |
Place of work | A question mentions the respondent's place of work, e.g. office, construction site, school, alternating places (salespeople), apartment (housewife),… |
Journey | A question mentions a journey, short trip or similar. The longest stop is shorter than 2 weeks. |
Stays abroad | A question mentions a longer stay abroad. The stay is at least 2 weeks long; use Journey otherwise. |
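The two-week boundary between Journey and Stays abroad can be sketched as follows. The function name and its parameter are hypothetical, and the sketch only covers stays abroad, because the table does not say how a long stay within the respondent's own country should be annotated.

```python
TWO_WEEKS_IN_DAYS = 14


def abroad_stay_category(stay_days: int) -> str:
    """Pick the Relative location category for a mentioned stay abroad."""
    if stay_days < 0:
        raise ValueError("stay_days must not be negative")
    # At least two weeks counts as a stay abroad, anything shorter as a journey.
    return "Stays abroad" if stay_days >= TWO_WEEKS_IN_DAYS else "Journey"
```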
Geographic location
The name of a geographic location, if one is mentioned.
<location> indicates that the location name from the GeoNames database is to be used.
Category | Description |
---|---|
<Continent> | The question refers to people, states, events, etc. on a specific continent. |
<Countries> | The question refers to people, events etc. in a specific country. The name of the country is to be used as it was at the time the question refers to, e.g. DDR, even though it no longer exists. This does not reflect where the respondent lives. |
<Region> | The question refers to people, events etc. in a specific region. This does not reflect where the respondent lives. This can also be used for regions in other countries. Use this if the mention does not refer to a continent, country or German federal state. |
<German federal state> | The question refers to people, events etc. in a federal state of Germany. |
Others | The question refers to people, events etc. in a geographic region that is not covered by the other categories, e.g. the Benelux countries, the Iberian Peninsula, etc. This does not reflect where the respondent lives. |
Without | No geographic location is mentioned. |
Unspecific | A location is mentioned without being specified, e.g. “At some places in Germany, …” |
Mixed/Multiple | When multiple locations are mentioned. |
Knowledge specificity
Describes the specificity of the knowledge that is required to answer the question.
Category | Description |
---|---|
School | The knowledge to answer the question is usually acquired formally at an educational institution, e.g. at school. It is available to most of the population, including school children. |
Daily life | The knowledge to answer the question is usually acquired during daily life. |
Special knowledge | The knowledge to answer the question is specialized and only available to specific groups of persons, e.g. parents or biologists. |
Quantification
This feature captures the quantification of the answer. As opposed to Information type, it is more concrete and closer to a physical quantity.
Category | Description |
---|---|
Frequency | The question asks for a rate or frequency, e.g. “How often do you…” |
Date time | The question asks for a point in time. |
Time dimension | The question asks for a duration or a distance in time, e.g. “How long do you sleep?”, “How long since you…” |
Spatial expansion | The question asks for a distance, range, height, depth, diameter,… |
Mass | The question asks for the weight of something. |
Amount | The question asks for the amount of something, e.g. “How many cars do you own?”. |
Level of agreement | The question asks for the extent of agreement with a certain matter. |
Boolean | The question is a yes/no question. |
Rating | The question asks the respondent to perform a form of rating. Only used if Level of agreement does not apply. |
Naming / Denomination | The respondent is asked to name one or more items from a list, or to come up with her own item (open question). |
Order | The respondent is asked to put items in a specific order. |
Comparative | The respondent is asked to compare two events, things etc. |
Language tone
Language tone characterizes the degree of formality or tone that is applied in the question.
Category | Description |
---|---|
Colloquial language | Language tone as used in daily conversations. |
Formal language | Language tone as in official letters or TV news. |
Jargon/technical language | Technical, precise, emotionless tone. |
Language complexity
Language complexity characterizes the complexity of phrasing applied in the question.
Category | Description |
---|---|
Simple language | Language that is especially easy to understand e.g. by children or by language learners. |
Moderate language level | Language with a complexity that can be understood by most people. |
Raised language level | Language with a complexity that is above average. |
The dataset is published under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
Bensmann F., Papenmeier A., Kern D., Zapilko B., Dietze S. (2020) Semantic Annotation, Representation and Linking of Survey Data. In: Blomqvist E. et al. (eds) Semantic Systems. In the Era of Knowledge Graphs. SEMANTICS 2020. Lecture Notes in Computer Science, vol 12378. Springer, Cham. https://doi.org/10.1007/978-3-030-59833-4_4
Please provide your feedback and any comments by sending an email to felix (dot) bensmann (at) gesis (dot) org
Felix Bensmann, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Andrea Papenmeier, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Dagmar Kern, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Benjamin Zapilko, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences (Germany), https://www.gesis.org/