TY - GEN
T1 - Named entity recognition and identification for finding the owner of a home page
AU - Plachouras, Vassilis
AU - Rivière, Matthieu
AU - Vazirgiannis, Michalis
PY - 2012/5/29
Y1 - 2012/5/29
AB - Entity-based applications, such as expert search or online social networks where users search for persons, require high-quality datasets of named entity references. Obtaining such high-quality datasets can be achieved by automatically extracting metadata from Web pages. In this work, we focus on the identification of the named entity that corresponds to the owner of a particular Web page, for example, a home page or an organizational staff Web page. More specifically, from a set of named entities that have already been extracted from a Web page, we identify the one which corresponds to the owner of the home page. First, we develop a set of features which are combined in a scoring function to select the named entity of the Web page owner. Second, we formulate the problem as a classification problem in which a pair of a Web page and named entity is classified as being associated or not. We evaluate the proposed approaches on a set of Web pages in which we have previously identified named entities. Our experimental results show that we can identify the named entity corresponding to the owner of a home page with accuracy over 90%.
KW - entity selection
KW - named entity recognition
UR - https://www.scopus.com/pages/publications/84861445293
U2 - 10.1007/978-3-642-30217-6_46
DO - 10.1007/978-3-642-30217-6_46
M3 - Conference contribution
AN - SCOPUS:84861445293
SN - 9783642302169
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 554
EP - 565
BT - Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings
T2 - 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012
Y2 - 29 May 2012 through 1 June 2012
ER -