
Issues using logistic regression with class imbalance, with a case study from credit risk modelling
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
The class imbalance problem arises in two-class classification problems when the less frequent (minority) class is observed much less often than the majority class. This characteristic is endemic in many problems, such as modeling default or fraud detection. Recent work by Owen [
References:
[1] E. I. Altman and G. Sabato, Modelling credit risk for SMEs: Evidence from the US market, Abacus, 43 (2007), 332-357. doi: 10.1111/j.1467-6281.2007.00234.x.
[2] G. E. Batista, R. C. Prati and M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, 6 (2004), 20-29. doi: 10.1145/1007730.1007735.
[3] C. Bravo, L. C. Thomas and R. Weber, Improving credit scoring by differentiating defaulter behaviour, Journal of the Operational Research Society, 66 (2015), 771-781. doi: 10.1057/jors.2014.50.
[4] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16 (2002), 321-357. doi: 10.1613/jair.953.
[5] T. M. Clauretie, A note on mortgage risk: Default vs. loss rates, Real Estate Economics, 18 (1990), 202-206. doi: 10.1111/1540-6229.00517.
[6] Cornell Law School, Definition of default, date of default, and requirement of notice of default, URL https://www.law.cornell.edu/cfr/text/24/203.467.
[7] E. R. DeLong and D. L. Clarke-Pearson, Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach, Biometrics, 44 (1988), 837-845. doi: 10.2307/2531595.
[8] B. Efron and T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Institute of Mathematical Statistics (IMS) Monographs, 5. Cambridge University Press, New York, 2016. doi: 10.1017/CBO9781316576533.
[9] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, 27 (2006), 861-874. doi: 10.1016/j.patrec.2005.10.010.
[10] D. J. Hand, Reject inference in credit operations, Credit Risk Modeling: Design and Application, 181-190.
[11] A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, 12 (1970), 55-67.
[12] G. King and L. Zeng, Logistic regression in rare events data, Political Analysis, 9 (2001), 137-163.
[13] G. Krempl and V. Hofer, Classification in presence of drift and latency, in Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, IEEE, 2011, 596-603. doi: 10.1109/ICDMW.2011.47.
[14] J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, Artificial Intelligence in Medicine, 2101 (2001), 63-66. doi: 10.1007/3-540-48229-6_9.
[15] X.-Y. Liu, J. Wu and Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39 (2009), 539-550.
[16] F. J. Massey Jr, The Kolmogorov-Smirnov test for goodness of fit, Journal of the American Statistical Association, 46 (1951), 68-78.
[17] F. Murtagh and P. Contreras, Algorithms for hierarchical clustering: An overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2 (2012), 86-97.
[18] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, 87. Kluwer Academic Publishers, Boston, MA, 2004. doi: 10.1007/978-1-4419-8853-9.
[19] A. B. Owen, Infinitely imbalanced logistic regression, Journal of Machine Learning Research, 8 (2007), 761-773.
[20] O. Pons, Bootstrap of means under stratified sampling, Electronic Journal of Statistics, 1 (2007), 381-391. doi: 10.1214/07-EJS033.
[21] R. Rockafellar, Convex Analysis, Princeton University Press, Princeton, N.J., 1970.
[22] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse and A. Napolitano, Resampling or reweighting: A comparison of boosting implementations, in 2008 20th IEEE International Conference on Tools with Artificial Intelligence, IEEE, 1 (2008), 445-451. doi: 10.1109/ICTAI.2008.59.
[23] M. J. Silvapulle, On the existence of maximum likelihood estimators for the binomial response models, Journal of the Royal Statistical Society. Series B (Methodological), 43 (1981), 310-313. doi: 10.1111/j.2517-6161.1981.tb01676.x.
[24] Student, The probable error of a mean, Biometrika, 6 (1908), 1-25.
[25] L. C. Thomas, Consumer Credit Models: Pricing, Profit and Portfolios, Oxford, 2009.
[26] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 58 (1996), 267-288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
[27] R. Tibshirani, The lasso problem and uniqueness, Electronic Journal of Statistics, 7 (2013), 1456-1490. doi: 10.1214/13-EJS815.
[28] H. Wang, Q. Xu and L. Zhou, Large unbalanced credit scoring using lasso-logistic regression ensemble, PLoS ONE, 10 (2015), e0117844. doi: 10.1371/journal.pone.0117844.
[29] W. N. van Wieringen, Lecture notes on ridge regression, arXiv preprint, arXiv:1509.09169.
[30] G. Zeng, On the existence of maximum likelihood estimates for weighted logistic regression, Communications in Statistics-Theory and Methods, 46 (2017), 11194-11203. doi: 10.1080/03610926.2016.1260742.
[31] M. Zhu, W. Su and H. A. Chipman, LAGO: A computationally efficient approach for statistical detection, Technometrics, 48 (2006), 193-205. doi: 10.1198/004017005000000643.
Logistic Regression | Ridge Penalized Logistic Regression | ||||||
100 | 1.1215 | 41.7805 | -0.5247 | 0.5917 | 59.1750 | 0.6879 | 1.9896 |
1000 | 0.5656 | 65.3495 | -2.4591 | 0.0855 | 85.5127 | 0.2454 | 1.2782 |
10000 | 0.5013 | 68.3830 | -4.6289 | 0.0098 | 97.6581 | 0.0450 | 1.0460 |
100000 | 0.5007 | 68.6940 | -6.9102 | 0.0010 | 99.7516 | 0.0049 | 1.0050 |
1000000 | 0.5001 | 68.7254 | -9.2106 | 0.0001 | 99.9750 | 0.0005 | 1.0005 |
Logistic Regression | Ridge Penalized Logistic Regression | ||||||
100 | 2.2347 | 16.2756 | -1.0602 | 0.3464 | 34.6374 | 1.2598 | 3.5246 |
1000 | 3.2033 | 8.4214 | -3.4516 | 0.0317 | 31.6947 | 1.6478 | 5.1958 |
10000 | 4.6591 | 2.8035 | -4.9902 | 0.0068 | 68.0441 | 0.7112 | 2.0364 |
100000 | 6.3475 | 0.7238 | -6.9521 | 0.0010 | 95.6659 | 0.0878 | 1.0918 |
1000000 | 8.1866 | 0.1524 | -9.2148 | 0.0001 | 99.5517 | 0.0090 | 1.0090 |
Fixture | Logistic Regression | Ridge | Lasso |
certain value, |
n | n | |
certain value, |
0 | 0 |
0.0190 | 0 | 0 | 0 | 0 | 0 |
0.0168 | 0.1650 | 0 | 0 | 0 | 0 |
0.0153 | 0.3106 | 0.1148 | 0 | 0 | 0 |
0.0139 | 0.4388 | 0.2416 | 0.0377 | 0 | 0 |
0.0116 | 0.6435 | 0.4445 | 0.2392 | 0.0471 | 0 |
0.0087 | 0.8621 | 0.6581 | 0.4525 | 0.2547 | 0.0516 |
Logistic Regression | Multinomial Logistic Regression | |||||
Coefficients | Estimate | Cluster | Coefficients | Estimate | ||
Intercept | -5.7705 | Intercept | -7.6828 | |||
1.1384 | Intercept | -7.6828 | ||||
1.1287 | 0.0818 | 0.6045 | ||||
2.2775 | ||||||
2.3532 | ||||||
0.0953 | 0.5412 |
Variable | Type | Description |
Default | Categorical | Dependent variable: 1 if the borrower is more than 180 days past due on monthly installments; 0 otherwise. |
Score | Continuous | A number, prepared by third parties, summarizing the borrower's creditworthiness, which may be indicative of the likelihood that the borrower will timely repay future obligations. |
DTI | Continuous | Original Debt-To-Income Ratio. |
UPB | Continuous | Unpaid Principal Balance. |
LTV | Continuous | Original Loan-To-Value. |
OIR | Continuous | Original Interest Rate. |
Number of Borrowers | Categorical | The number of borrower(s) who are obligated to repay the mortgage note secured by the mortgaged property. 1 = one borrower; 2 = more than one borrower. |
Seller | Categorical | The entity acting in its capacity as a seller of mortgages to Freddie Mac at the time of acquisition. |
Servicer | Categorical | The entity acting in its capacity as the servicer of mortgages to Freddie Mac as of the last period for which loan activity is reported in the Dataset. |
First Time Homebuyer | Categorical | Y = Yes; N = No. |
Number of Units | Categorical | Denotes whether the mortgage is a one-, two-, three-, or four-unit property. |
Occupancy Status | Categorical | O = Owner Occupied; I = Investment Property; S = Second Home; Space = Unknown. |
Channel | Categorical | R = Retail; B = Broker; C = Correspondent; T = TPO Not Specified; Space = Unknown. |
PPM | Categorical | Denotes whether the mortgage is a Prepayment Penalty Mortgage. Y = PPM; N = Not PPM. |
Property Type | Categorical | CO = Condo; LH = Leasehold; PU = PUD; MH = Manufactured Housing; SF = 1-4 Fee Simple; CP = Co-op; Space = Unknown. |
Loan Purpose | Categorical | P = Purchase; C = Cash-out Refinance; N = No Cash-out Refinance; Space = Unknown. |
Training set year | 2000 | 2001 | |
Default collection year | 2001 2002 | 2002 2003 | |
Testing set year | 2003 | 2004 |
Time | With Relabelling: AUC | DeLong | Bootstrap | Stratified | Without Relabelling: AUC | DeLong | Bootstrap | Stratified |
2003 Q1 | 0.879 | 0.033 | 0.035 | 0.032 | 0.873 | 0.032 | 0.028 | 0.033 |
2003 Q2 | 0.880 | 0.025 | 0.024 | 0.024 | 0.878 | 0.026 | 0.026 | 0.025 |
2003 Q3 | 0.839 | 0.035 | 0.033 | 0.031 | 0.824 | 0.039 | 0.037 | 0.038 |
2003 Q4 | 0.872 | 0.025 | 0.025 | 0.025 | 0.872 | 0.026 | 0.028 | 0.026 |
2004 Q1 | 0.808 | 0.042 | 0.041 | 0.041 | 0.804 | 0.043 | 0.041 | 0.039 |
2004 Q2 | 0.804 | 0.053 | 0.056 | 0.053 | 0.795 | 0.052 | 0.046 | 0.050 |
2004 Q3 | 0.636 | 0.067 | 0.063 | 0.067 | 0.634 | 0.075 | 0.067 | 0.073 |
2004 Q4 | 0.806 | 0.046 | 0.045 | 0.046 | 0.796 | 0.054 | 0.056 | 0.051 |
2005 Q1 | 0.865 | 0.025 | 0.024 | 0.027 | 0.805 | 0.042 | 0.045 | 0.043 |
2005 Q2 | 0.841 | 0.026 | 0.025 | 0.026 | 0.758 | 0.038 | 0.037 | 0.036 |
2005 Q3 | 0.849 | 0.021 | 0.020 | 0.022 | 0.799 | 0.033 | 0.032 | 0.033 |
2005 Q4 | 0.814 | 0.022 | 0.022 | 0.021 | 0.776 | 0.027 | 0.028 | 0.029 |
2006 Q1 | 0.817 | 0.017 | 0.016 | 0.016 | 0.797 | 0.020 | 0.021 | 0.019 |
2006 Q2 | 0.803 | 0.015 | 0.016 | 0.016 | 0.795 | 0.017 | 0.017 | 0.017 |
2006 Q3 | 0.789 | 0.016 | 0.015 | 0.015 | 0.776 | 0.018 | 0.018 | 0.018 |
2006 Q4 | 0.776 | 0.012 | 0.012 | 0.012 | 0.769 | 0.013 | 0.013 | 0.013 |
2007 Q1 | 0.697 | 0.013 | 0.013 | 0.014 | 0.713 | 0.013 | 0.012 | 0.012 |
2007 Q2 | 0.704 | 0.010 | 0.010 | 0.010 | 0.720 | 0.009 | 0.009 | 0.009 |
2007 Q3 | 0.725 | 0.008 | 0.008 | 0.008 | 0.727 | 0.008 | 0.008 | 0.008 |
2007 Q4 | 0.720 | 0.006 | 0.006 | 0.007 | 0.738 | 0.006 | 0.006 | 0.005 |
2008 Q1 | 0.837 | 0.004 | 0.004 | 0.004 | 0.838 | 0.004 | 0.005 | 0.005 |
2008 Q2 | 0.832 | 0.005 | 0.005 | 0.005 | 0.833 | 0.005 | 0.006 | 0.005 |
2008 Q3 | 0.830 | 0.006 | 0.006 | 0.007 | 0.831 | 0.006 | 0.006 | 0.007 |
2008 Q4 | 0.857 | 0.008 | 0.008 | 0.008 | 0.856 | 0.008 | 0.008 | 0.008 |
2009 Q1 | 0.804 | 0.024 | 0.023 | 0.022 | 0.805 | 0.024 | 0.023 | 0.023 |
2009 Q2 | 0.811 | 0.018 | 0.019 | 0.017 | 0.807 | 0.018 | 0.017 | 0.018 |
2009 Q3 | 0.757 | 0.013 | 0.013 | 0.013 | 0.758 | 0.013 | 0.012 | 0.013 |
2009 Q4 | 0.738 | 0.023 | 0.025 | 0.022 | 0.742 | 0.023 | 0.022 | 0.023 |
2010 Q1 | 0.825 | 0.033 | 0.034 | 0.032 | 0.829 | 0.032 | 0.029 | 0.031 |
2010 Q2 | 0.793 | 0.038 | 0.039 | 0.037 | 0.798 | 0.037 | 0.034 | 0.039 |
2010 Q3 | 0.826 | 0.034 | 0.031 | 0.034 | 0.830 | 0.033 | 0.029 | 0.033 |
2010 Q4 | 0.769 | 0.036 | 0.038 | 0.034 | 0.779 | 0.037 | 0.035 | 0.037 |
2011 Q1 | 0.789 | 0.039 | 0.037 | 0.035 | 0.780 | 0.039 | 0.043 | 0.039 |
2011 Q2 | 0.780 | 0.042 | 0.041 | 0.039 | 0.773 | 0.043 | 0.041 | 0.042 |
2011 Q3 | 0.740 | 0.048 | 0.048 | 0.044 | 0.733 | 0.049 | 0.048 | 0.046 |
2011 Q4 | 0.782 | 0.050 | 0.043 | 0.047 | 0.783 | 0.049 | 0.050 | 0.046 |
2012 Q1 | 0.861 | 0.034 | 0.032 | 0.033 | 0.868 | 0.031 | 0.031 | 0.031 |
2012 Q2 | 0.776 | 0.043 | 0.045 | 0.038 | 0.778 | 0.042 | 0.046 | 0.039 |
2012 Q3 | 0.771 | 0.045 | 0.043 | 0.045 | 0.784 | 0.045 | 0.046 | 0.043 |
2012 Q4 | 0.771 | 0.038 | 0.036 | 0.034 | 0.766 | 0.039 | 0.038 | 0.040 |
2013 Q1 | 0.769 | 0.039 | 0.037 | 0.039 | 0.772 | 0.040 | 0.039 | 0.041 |
2013 Q2 | 0.738 | 0.029 | 0.028 | 0.029 | 0.739 | 0.030 | 0.028 | 0.026 |
2013 Q3 | 0.730 | 0.040 | 0.039 | 0.041 | 0.735 | 0.042 | 0.043 | 0.041 |
2013 Q4 | 0.754 | 0.033 | 0.031 | 0.032 | 0.750 | 0.033 | 0.032 | 0.032 |
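The DeLong, Bootstrap and Stratified columns appear to report variability estimates (presumably standard errors) for the AUC, although the table itself does not define them. As a minimal sketch of how a stratified bootstrap standard error for the AUC could be computed, the example below resamples defaulters and non-defaulters separately so that every resample keeps the original class counts. It assumes the model scores are held in two plain Python lists, one per class. The function names `auc` and `stratified_bootstrap_se` and the parameters `n_boot` and `seed` are illustrative and are not taken from the article.

```python
import random
from statistics import stdev


def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the AUC.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def stratified_bootstrap_se(scores_pos, scores_neg, n_boot=200, seed=0):
    """Standard deviation of the AUC over stratified bootstrap resamples.

    Each class is resampled with replacement on its own, so the number of
    positives and negatives is the same in every resample. This matters
    under heavy class imbalance, where a plain bootstrap could draw very
    few minority cases.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_pos = rng.choices(scores_pos, k=len(scores_pos))
        boot_neg = rng.choices(scores_neg, k=len(scores_neg))
        estimates.append(auc(boot_pos, boot_neg))
    return stdev(estimates)


if __name__ == "__main__":
    # Hypothetical scores: three defaulters, five non-defaulters.
    pos = [0.9, 0.7, 0.4]
    neg = [0.6, 0.3, 0.2, 0.2, 0.1]
    print("AUC:", auc(pos, neg))
    print("Stratified bootstrap SE:", stratified_bootstrap_se(pos, neg))
```

The pairwise AUC loop is quadratic in the sample size and is only meant to keep the sketch readable; on portfolios of realistic size a rank-based computation would be the more practical choice.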
train year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
test year | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
without relabelling | 0.435 | 0.885 | 0.980 | 1.000 | 1.000 | 0.800 |
with relabelling | 0.420 | 0.679 | 0.398 | 0.842 | 0.855 | 0.289 |
train year | 2006 | 2007 | 2008 | 2009 | 2010 | |
test year | 2009 | 2010 | 2011 | 2012 | 2013 | |
without relabelling | 0.993 | 0.900 | 0.985 | 0.890 | 0.990 | |
with relabelling | 0.930 | 0.930 | 0.983 | 0.827 | 0.795 |
Train year | Training Default rate | Test year | Test Default rate | AUC difference |
2000 | 0.41% | 2003 | 0.06% | 0.0057 |
2001 | 0.20% | 2004 | 0.07% | 0.0063 |
2002 | 0.10% | 2005 | 0.18% | 0.0578 |
2003 | 0.06% | 2006 | 0.89% | 0.0119 |
2004 | 0.07% | 2007 | 4.26% | -0.0133 |
2005 | 0.18% | 2008 | 3.15% | -0.0005 |
2006 | 0.89% | 2009 | 0.30% | -0.0003 |
2007 | 4.26% | 2010 | 0.09% | -0.0055 |
2008 | 3.15% | 2011 | 0.08% | 0.0055 |
2009 | 0.30% | 2012 | 0.06% | -0.0041 |
2010 | 0.09% | 2013 | 0.10% | -0.0011 |