Graph-based Analysis of Datasets in the LOD Cloud


RDF Graph Measures for the Analysis of RDF Graphs

Introduction

This website presents a browsable version of an analysis of graph measures conducted on RDF datasets that were part of the last LOD Cloud of 2017 (22nd August 2017). It is a case study for a software framework that can acquire, efficiently prepare, and perform a graph-based analysis on large-scale RDF graphs.

Publication

Both collections, i.e., all 280 datasets analyzed and the results for 56 graph measures, are part of a resource published with a paper at ESWC 2019. The paper won the best student paper award.

A Software Framework and Datasets for the Analysis of Graph Measures on RDF Graphs

Zloch, Matthäus and Acosta, Maribel and Hienert, Daniel and Dietze, Stefan and Conrad, Stefan. In The Semantic Web, 16th Extended Semantic Web Conference, Portoroz, Slovenia, 2019.

Further information

The results are presented per dataset. To the left you can see the domains introduced by the LOD Cloud. Per dataset, you can download (a) the original metadata package acquired and (b) a serialized binary object that represents the graph structure at the time of analysis. The main benefit of this collection is that each RDF dataset is already prepared. This makes it possible to reproduce the results and to perform further analyses of graph measures on the graphs without any additional preparation.
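To make the notion of a graph measure concrete, here is a minimal sketch. It is not the framework's own code. It assumes the common convention that each RDF triple (subject, predicate, object) becomes one directed edge from subject to object; the function name `basic_measures`, the selection of measures, and the toy triples are all illustrative assumptions.

```python
from collections import Counter


def basic_measures(triples):
    """Compute a few simple measures of the directed multigraph induced by triples.

    Assumption: every (subject, predicate, object) triple is one directed edge
    subject -> object; parallel edges are counted separately.
    """
    vertices = set()
    out_degree = Counter()
    in_degree = Counter()
    for s, _p, o in triples:
        vertices.update((s, o))
        out_degree[s] += 1
        in_degree[o] += 1

    n = len(vertices)
    m = sum(out_degree.values())
    return {
        "vertices": n,
        "edges": m,
        "mean_degree": 2 * m / n if n else 0.0,  # mean total degree (in + out)
        "max_in_degree": max(in_degree.values(), default=0),
        "max_out_degree": max(out_degree.values(), default=0),
    }


# Toy triples, made up purely for illustration.
toy = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:name", '"C"'),
]
print(basic_measures(toy))
```

For graphs at the scale reported on this site, a dedicated graph library working on the prepared graph objects would be the practical choice; the sketch only shows what such measures compute.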

The framework is available for reuse. The source code is maintained on GitHub.


Downloads

You can download a CSV export of all the results from our GitHub repository. There you will find:

Results for all datasets: analysis_results.csv.
Descriptive statistics over all datasets, grouped by domain: analysis_statistics.csv.
Correlation analysis of the measures used in this study: correlation_analysis.csv.
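A first look at one of these files could be sketched as follows. The column layout is not documented here, so the helper (a hypothetical name, `summarize_csv`) only reports the header and the row count instead of assuming any column names. It further assumes a comma-delimited, UTF-8 encoded file that has been downloaded into the working directory.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])           # first line is taken as the header
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count


# Assumed local path of the downloaded export; adjust as needed.
results_path = Path("analysis_results.csv")
if results_path.exists():
    columns, rows = summarize_csv(results_path)
    print(f"{rows} rows, {len(columns)} columns")
    print(columns)
```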

Statistics

Below are some basic descriptive statistics about all of the analyzed datasets.

Domain	Max. # of Vertices	Max. # of Edges	Avg. # of Vertices	Avg. # of Edges
Cross Domain	`614,448,283`	`2,656,226,986`	`57,827,358`	`218,930,066`
Geography	`47,541,174`	`340,880,391`	`9,763,721`	`61,049,429`
Government	`131,634,287`	`1,489,689,235`	`7,491,531`	`71,263,878`
Life Sciences	`356,837,444`	`722,889,087`	`25,550,646`	`85,262,882`
Linguistics	`120,683,397`	`291,314,466`	`1,260,455`	`3,347,268`
Media	`48,318,259`	`161,749,815`	`9,504,622`	`31,100,859`
Publications	`218,757,266`	`720,668,819`	`9,036,204`	`28,017,502`
Social Networking	`331,647`	`1,600,499`	`237,003`	`1,062,986`
User Generated	`2,961,628`	`4,932,352`	`967,798`	`1,992,069`
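One derived quantity that can be read off the table is the number of edges per vertex in each domain. The sketch below copies the two average columns from the table and divides them; the names `AVERAGES` and `edges_per_vertex` are illustrative. Note that this is a ratio of averages, which is not the same as the average of the per-dataset ratios.

```python
# Average vertex and edge counts per domain, copied from the table above.
AVERAGES = {
    "Cross Domain": (57_827_358, 218_930_066),
    "Geography": (9_763_721, 61_049_429),
    "Government": (7_491_531, 71_263_878),
    "Life Sciences": (25_550_646, 85_262_882),
    "Linguistics": (1_260_455, 3_347_268),
    "Media": (9_504_622, 31_100_859),
    "Publications": (9_036_204, 28_017_502),
    "Social Networking": (237_003, 1_062_986),
    "User Generated": (967_798, 1_992_069),
}


def edges_per_vertex(averages):
    """Map each domain to (average # of edges) / (average # of vertices)."""
    return {domain: edges / vertices for domain, (vertices, edges) in averages.items()}


ratios = edges_per_vertex(AVERAGES)
# Print domains from the highest to the lowest ratio.
for domain, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain:<18} {ratio:5.2f}")
```

With these values, Government has the highest ratio and User Generated the lowest.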


Domains of the LOD Cloud

Find results for datasets by domain

Domain	# of Datasets
Cross Domain	15
Geography	11
Government	37
Life Sciences	32
Linguistics	122
Media	6
Publications	50
Social Networking	3
User Generated	4