TY - GEN
T1 - Assessing the Multi-labelness of Multi-label Data
AU - Park, Laurence A.F.
AU - Guo, Yi
AU - Read, Jesse
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020/1/1
Y1 - 2020/1/1
AB - Before constructing a classifier, we should examine the data to gain an understanding of the relationships between the variables, to assist with the design of the classifier. Using multi-label data requires us to examine the association between labels: its multi-labelness. We cannot directly measure association between two labels, since the labels’ relationships are confounded with the set of observation variables. A better approach is to fit an analytical model to a label with respect to the observations and remaining labels, but this might present false relationships due to the problem of multicollinearity between the observations and labels. In this article, we examine the utility of regularised logistic regression and a new form of split logistic regression for assessing the multi-labelness of data. We find that a split analytical model using regularisation is able to provide fewer label relationships when no relationships exist, or if the labels can be partitioned. We also find that if label relationships do exist, logistic regression with l1 regularisation provides the better measurement of multi-labelness.
U2 - 10.1007/978-3-030-46147-8_10
DO - 10.1007/978-3-030-46147-8_10
M3 - Conference contribution
AN - SCOPUS:85084816411
SN - 9783030461461
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 164
EP - 179
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
A2 - Brefeld, Ulf
A2 - Fromont, Elisa
A2 - Hotho, Andreas
A2 - Knobbe, Arno
A2 - Maathuis, Marloes
A2 - Robardet, Céline
PB - Springer
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Y2 - 16 September 2019 through 20 September 2019
ER -