Canonicalizing open knowledge bases

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Open information extraction approaches have led to the creation of large knowledge bases from the Web. The problem with such methods is that their entities and relations are not canonicalized, leading to redundant and ambiguous facts. For example, they may store (Barack Obama, was born in, Honolulu) and (Obama, place of birth, Honolulu). In this paper, we present an approach based on machine learning methods that can canonicalize such Open IE triples, by clustering synonymous names and phrases. We also provide a detailed discussion about the different signals, features and design choices that influence the quality of synonym resolution for noun phrases in Open IE KBs, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
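As a rough illustration of what "clustering synonymous names" can mean in practice, the following minimal sketch groups subject noun phrases by token overlap and rewrites each triple to use one representative name per cluster. It is not the method from the paper: the Jaccard similarity, the 0.5 threshold, the single-link merging, and the helper names `jaccard`, `cluster_phrases`, and `canonicalize` are all assumptions made for this example. Relation phrases are left untouched.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two phrases (assumed signal)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.5):
    """Single-link clustering: merge any two phrases at or above the threshold."""
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())


def canonicalize(triples, threshold=0.5):
    """Replace each subject with the longest phrase in its cluster."""
    subjects = sorted({s for s, _, _ in triples})
    canonical = {}
    for cluster in cluster_phrases(subjects, threshold):
        representative = max(cluster, key=len)  # hypothetical choice of canonical name
        for phrase in cluster:
            canonical[phrase] = representative
    return [(canonical[s], r, o) for s, r, o in triples]


triples = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "place of birth", "Honolulu"),
]
print(canonicalize(triples))
```

On the abstract's own example, "Barack Obama" and "Obama" share one of two distinct tokens, so they fall into the same cluster under the assumed threshold and both triples end up with the subject "Barack Obama". Token overlap alone would also merge unrelated people who share a surname, which is one reason the choice of signals and features matters.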

Original language: English
Title of host publication: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1679-1688
Number of pages: 10
ISBN (Electronic): 9781450325981
Publication status: Published - 3 Nov 2014
Event: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China
Duration: 3 Nov 2014 to 7 Nov 2014

Publication series

Name: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Conference

Conference: 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Country/Territory: China
City: Shanghai
Period: 3/11/14 to 7/11/14
