Extracting Linked data from statistic spreadsheets

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Statistical data is an important sub-category of open data. It is of interest for many applications, including but not limited to data journalism, because such data is typically of high quality and reflects, in aggregated form, important aspects of a society's life such as births, immigration, and the economy. However, such open data is often not published as Linked Open Data (LOD), which limits its usability. We provide a conceptual model for the open data contained in statistics published by INSEE, the French national institute for economic and societal statistics. We then describe a novel method for extracting RDF LOD to populate an instance of this model. We used our method to produce RDF data from 20k+ Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.

Original language: English
Title of host publication: Proceedings of the International Workshop on Semantic Big Data, SBD 2017 - In conjunction with the 2017 ACM SIGMOD/PODS Conference
Editors: Le Gruenwald, Sven Groppe
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450349871
Publication status: Published - 19 May 2017
Event: International Workshop on Semantic Big Data, SBD 2017 - Chicago, United States
Duration: 19 May 2017 → …

Publication series

Name: Proceedings of the International Workshop on Semantic Big Data, SBD 2017 - In conjunction with the 2017 ACM SIGMOD/PODS Conference

Conference

Conference: International Workshop on Semantic Big Data, SBD 2017
Country/Territory: United States
City: Chicago
Period: 19/05/17 → …

Keywords

  • Information extraction
  • Linked data
  • RDF
