DUTH does Probabilities of Relevance at the Legal Track

Research output: Contribution to journal › Conference article › peer-review

Abstract

We participated in the Learning Task of the TREC 2010 Legal Track, focusing solely on estimating probabilities of relevance. We submitted three automated runs based on the same tf.idf ranking, produced from the topic narratives and positive-only feedback of the training data in equal contributions. The runs differ in how the probabilities of relevance are estimated: (1) DUTHsdtA employed the Truncated Normal-Exponential model to turn scores into probabilities. (2) DUTHsdeA did not assume any specific component score distributions but estimated them from the scores of the training data via Kernel Density Estimation (KDE) methods. (3) DUTHlrgA used Logistic Regression with the coefficients estimated on the scores of the training data. We found that DUTHsdeA and DUTHlrgA are greatly affected by biases in the training set, since they assume that the input score data are uniformly sampled. KDE was also found to be very sensitive to its parameters, which greatly influence the probability estimates. In these respects, DUTHsdtA proved to be the most robust method.
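As a rough illustration of the two data-driven approaches named in the abstract, the sketch below turns retrieval scores into probabilities of relevance in two ways: by applying Bayes' rule to KDE-estimated score densities, and by fitting a logistic regression on the scores. It is not the authors' implementation. The toy scores, the prior, the bandwidth, and all function names are hypothetical, and the Truncated Normal-Exponential model is not shown.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

def kde_posterior(rel_scores, non_scores, prior_rel, bandwidth):
    """P(relevant | score) via Bayes' rule over two KDE-estimated score densities."""
    f_rel = gaussian_kde(rel_scores, bandwidth)
    f_non = gaussian_kde(non_scores, bandwidth)
    def posterior(score):
        num = prior_rel * f_rel(score)
        den = num + (1.0 - prior_rel) * f_non(score)
        # Fall back to the prior where both densities underflow to zero.
        return num / den if den > 0.0 else prior_rel
    return posterior

def fit_logistic(scores, labels, lr=0.1, steps=5000):
    """Fit P(rel | s) = 1 / (1 + exp(-(a + b*s))) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            grad_a += y - p
            grad_b += (y - p) * s
        a += lr * grad_a / n
        b += lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a + b * s)))

# Hypothetical training scores (assumption: judged relevant vs. non-relevant documents).
rel_scores = [1.7, 2.1, 2.6, 3.0, 3.4]
non_scores = [0.4, 0.9, 1.2, 1.5, 1.9, 2.3]

# Assumed prior probability of relevance and kernel bandwidth.
kde_prob = kde_posterior(rel_scores, non_scores, prior_rel=0.3, bandwidth=0.4)

lr_prob = fit_logistic(rel_scores + non_scores,
                       [1] * len(rel_scores) + [0] * len(non_scores))

for s in (0.8, 2.0, 3.2):
    print(f"score={s:.1f}  KDE={kde_prob(s):.3f}  logistic={lr_prob(s):.3f}")
```

Changing `bandwidth` or `prior_rel` in this sketch shifts the KDE estimates noticeably, in line with the sensitivity the abstract reports. Both estimators also take the sampled scores at face value, so a biased training sample carries straight through to the probabilities.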

Original language: English
Journal: NIST Special Publication
Publication status: Published - 1 Jan 2010
Externally published: Yes
Event: 19th Text REtrieval Conference, TREC 2010 - Gaithersburg, MD, United States
Duration: 16 Nov 2010 to 19 Nov 2010
