TY - GEN
T1 - Concentric Mixtures of Mallows Models for Top-k Rankings
T2 - 38th International Conference on Machine Learning, ICML 2021
AU - Collas, Fabien
AU - Irurozki, Ekhine
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021/1/1
Y1 - 2021/1/1
N2 - In this paper, we study mixtures of two Mallows models for top-k rankings with equal location parameters but with different scale parameters (a mixture of concentric Mallows models). These models arise when we have a heterogeneous population of voters formed by two populations, one of which is a subpopulation of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results are based upon, first, bounding the sample complexity for the Borda algorithm with top-k rankings. Second, we characterize the distances between rankings, showing that an off-the-shelf clustering algorithm separates the rankings by components with high probability -provided the scales are well-separated.As a by-product, we include an efficient sampling algorithm for Mallows top-k rankings. Finally, since the rank aggregation will suffer from a large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to be able to recover the ground truth consensus ranking which is especially consistent with the expert rankings.
AB - In this paper, we study mixtures of two Mallows models for top-k rankings with equal location parameters but with different scale parameters (a mixture of concentric Mallows models). These models arise when we have a heterogeneous population of voters formed by two populations, one of which is a subpopulation of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results are based upon, first, bounding the sample complexity for the Borda algorithm with top-k rankings. Second, we characterize the distances between rankings, showing that an off-the-shelf clustering algorithm separates the rankings by components with high probability -provided the scales are well-separated.As a by-product, we include an efficient sampling algorithm for Mallows top-k rankings. Finally, since the rank aggregation will suffer from a large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to be able to recover the ground truth consensus ranking which is especially consistent with the expert rankings.
M3 - Conference contribution
AN - SCOPUS:85161348418
T3 - Proceedings of Machine Learning Research
SP - 2079
EP - 2088
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
Y2 - 18 July 2021 through 24 July 2021
ER -