YAWN: A Semantically Annotated Wikipedia XML Corpus

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.

Original language: English
Title of host publication: Datenbanksysteme in Business, Technologie und Web, BTW 2007, 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", DBIS 2007, Proceedings
Editors: Alfons Kemper, Harald Schöning, Thomas Rose, Matthias Jarke, Thomas Seidl, Christoph Brochhaus
Publisher: Gesellschaft für Informatik (GI)
Pages: 277-291
Number of pages: 15
ISBN (Electronic): 9783885791973
Publication status: Published - 1 Jan 2007
Externally published: Yes
Event: Datenbanksysteme in Business, Technologie und Web, BTW 2007, 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", DBIS 2007 - Database Systems for Business, Technology and Web, BTW 2007, 12th Conference of the GI Division "Databases and Information Systems", DBIS 2007 - Aachen, Germany
Duration: 7 Mar 2007 to 9 Mar 2007

Publication series

Name: Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
Volume: P-103
ISSN (Print): 1617-5468
ISSN (Electronic): 2944-7682

Conference

Conference: Datenbanksysteme in Business, Technologie und Web, BTW 2007, 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", DBIS 2007 - Database Systems for Business, Technology and Web, BTW 2007, 12th Conference of the GI Division "Databases and Information Systems", DBIS 2007
Country/Territory: Germany
City: Aachen
Period: 7/03/07 to 9/03/07
