Combining linguistic and statistical analysis to extract relations from web documents

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The World Wide Web provides a nearly endless source of knowledge, which is mostly given in natural language. A first step towards exploiting this data automatically could be to extract pairs of a given semantic relation from text documents - for example all pairs of a person and her birth-date. One strategy for this task is to find text patterns that express the semantic relation, to generalize these patterns, and to apply them to a corpus to find new pairs. In this paper, we show that this approach profits significantly when deep linguistic structures are used instead of surface text patterns. We demonstrate how linguistic structures can be represented for machine learning, and we provide a theoretical analysis of the pattern matching approach. We show the benefits of our approach by extensive experiments with our prototype system LEILA.

Original language: English
Title of host publication: KDD 2006
Subtitle of host publication: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery (ACM)
Pages: 712-717
Number of pages: 6
ISBN (Print): 1595933395, 9781595933393
Publication status: Published - 1 Jan 2006
Externally published: Yes
Event: KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Philadelphia, PA, United States
Duration: 20 Aug 2006 to 23 Aug 2006

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2006

Conference

Conference: KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: United States
City: Philadelphia, PA
Period: 20/08/06 to 23/08/06

Keywords

  • Machine Learning
  • Pattern Matching
  • Relation Extraction
