TY - JOUR
T1 - Reject inference methods in credit scoring
AU - Ehrhardt, Adrien
AU - Biernacki, Christophe
AU - Vandewalle, Vincent
AU - Heinrich, Philippe
AU - Beben, Sébastien
N1 - Publisher Copyright:
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - The granting process is based on the probability that the applicant will repay his/her loan given his/her characteristics. This probability, also called the score, is learnt from a dataset from which rejected applicants are excluded. Thus, the population on which the score is used differs from the learning population. Many “reject inference” methods try to exploit the data available from the rejected applicants in the learning process. However, most of these methods are empirical and lack a formalization of their assumptions and of their expected theoretical properties. We formalize such hidden assumptions in a general missing data setting for some of the most common reject inference methods. This reveals that the underlying modelling is mostly incomplete, which prevents comparing existing methods within a general model selection mechanism (except by financing “non-fundable” applicants). We therefore assess the performance of the methods on both simulated data and real data (from CACF, a major European loan issuer). Unsurprisingly, no method appears uniformly dominant. These theoretical and empirical results not only reinforce the idea that classical reject inference methods should be used with caution, but also motivate future research into designing model-based reject inference methods (without financing “non-fundable” applicants).
AB - The granting process is based on the probability that the applicant will repay his/her loan given his/her characteristics. This probability, also called the score, is learnt from a dataset from which rejected applicants are excluded. Thus, the population on which the score is used differs from the learning population. Many “reject inference” methods try to exploit the data available from the rejected applicants in the learning process. However, most of these methods are empirical and lack a formalization of their assumptions and of their expected theoretical properties. We formalize such hidden assumptions in a general missing data setting for some of the most common reject inference methods. This reveals that the underlying modelling is mostly incomplete, which prevents comparing existing methods within a general model selection mechanism (except by financing “non-fundable” applicants). We therefore assess the performance of the methods on both simulated data and real data (from CACF, a major European loan issuer). Unsurprisingly, no method appears uniformly dominant. These theoretical and empirical results not only reinforce the idea that classical reject inference methods should be used with caution, but also motivate future research into designing model-based reject inference methods (without financing “non-fundable” applicants).
KW - Reject inference
KW - credit risk
KW - data augmentation
KW - scorecard
KW - scoring
KW - semi-supervised learning
U2 - 10.1080/02664763.2021.1929090
DO - 10.1080/02664763.2021.1929090
M3 - Article
AN - SCOPUS:85106272126
SN - 0266-4763
VL - 48
SP - 2734
EP - 2754
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
IS - 13-15
ER -