Recently, the Synthetic Minority Over-sampling Technique (SMOTE) has been widely used to handle imbalanced classification. To address the shortcomings of existing benchmark methods, we propose a novel SMOTE scheme based on K-means clustering and intuitionistic fuzzy set theory, which assigns proper weights to the existing minority points and generates new synthetic points from them. In addition, we adopt the state-of-the-art kernel-free fuzzy quadratic surface support vector machine (QSSVM) to perform the classification. Finally, numerical experiments on various artificial and real data sets demonstrate the validity and applicability of the proposed method, especially in the presence of mislabeled information.
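Since the proposed scheme builds on standard SMOTE, a minimal sketch of the original SMOTE interpolation step of Chawla et al. [2] is given below for reference; it is *not* the weighted K-means/intuitionistic-fuzzy variant proposed in this paper, and the function name `smote_oversample` and its parameters are purely illustrative.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal sketch of the standard SMOTE interpolation step:
    each synthetic point lies on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # pick a minority sample uniformly
        j = nn[i, rng.integers(k)]              # pick one of its neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```

As described in the abstract, the proposed WSMOTE differs from this baseline in that the existing minority points are first weighted (via K-means and intuitionistic fuzzy sets) before synthetic points are generated from them.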
Figure 1. (a) The original distribution of the data set. (b) The synthetic examples generated by SMOTE (k = 5). (c) The synthetic examples generated by K-means. (d) The data set with mislabeled information (red dot). (e) The synthetic examples generated by SMOTE (k = 5). (f) The synthetic examples generated by K-means.
Table 1. Confusion matrix for a two-class problem
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
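As a reminder of the standard definitions behind the $G$-mean and $AUC$ columns in the tables below (these are the usual formulas based on the entries of Table 1, not notation specific to this paper):

$$ \mathrm{Sensitivity}=\frac{TP}{TP+FN},\qquad \mathrm{Specificity}=\frac{TN}{TN+FP},\qquad G\text{-mean}=\sqrt{\mathrm{Sensitivity}\cdot\mathrm{Specificity}}, $$

and $AUC$ denotes the area under the ROC curve.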
Table 2. Results of RUS-SVM, ROS-SVM, SMOTE-SVM, borderline-SMOTE1-SVM, borderline-SMOTE2-SVM, MWMOTE-SVM and WSMOTE-QSSVM on quadratic artificial data sets with the RBF kernel
| Data set | Method | $G$-mean | std of $G$-mean | $AUC$ | std of $AUC$ | Time (s) |
|---|---|---|---|---|---|---|
| AR1: 800$\times$3, IR 1:3 | RUS-SVM | 0.5414 | 0.3175 | 0.3838 | 0.3258 | $\textbf{2.7148}$ |
| | ROS-SVM | 0.8886 | 0.0446 | 0.7914 | 0.0783 | 4.9408 |
| | SMOTE-SVM | 0.6453 | 0.1181 | 0.429 | 0.1421 | 8.752 |
| | borderline-SMOTE1-SVM | 0.846 | 0.0652 | 0.7195 | 0.1095 | 5.7484 |
| | borderline-SMOTE2-SVM | 0.595 | 0.1774 | 0.3823 | 0.2126 | 6.378 |
| | MWMOTE-SVM | 0.6982 | 0.0526 | 0.49 | 0.072 | 10.9427 |
| | WSMOTE-QSSVM | $\textbf{0.9051}$ | $\textbf{0.0363}$ | $\textbf{0.8203}$ | $\textbf{0.0642}$ | 4.0263 |
| AR1: 2200$\times$3, IR 1:10 | RUS-SVM | 0.516 | 0.2291 | 0.3135 | 0.203 | $\textbf{13.6355}$ |
| | ROS-SVM | 0.8508 | 0.0604 | 0.7271 | 0.0985 | 52.8877 |
| | SMOTE-SVM | 0.4311 | 0.1005 | 0.1949 | 0.0874 | 127.6243 |
| | borderline-SMOTE1-SVM | 0.7897 | 0.0479 | 0.6257 | 0.0764 | 104.4459 |
| | borderline-SMOTE2-SVM | 0.5014 | 0.1418 | 0.2695 | 0.1347 | 86.6192 |
| | MWMOTE-SVM | 0.5971 | 0.0421 | 0.3581 | $\textbf{0.0491}$ | 93.0159 |
| | WSMOTE-QSSVM | $\textbf{0.8917}$ | $\textbf{0.0301}$ | $\textbf{0.796}$ | 0.054 | 23.2509 |
| AR2: 800$\times$3, IR 1:3 | RUS-SVM | 0.7449 | 0.2905 | 0.6309 | 0.3688 | $\textbf{3.5008}$ |
| | ROS-SVM | 0.9295 | $\textbf{0.0337}$ | 0.865 | $\textbf{0.0621}$ | 6.7627 |
| | SMOTE-SVM | 0.6407 | 0.1499 | 0.486 | 0.4307 | 10.6513 |
| | borderline-SMOTE1-SVM | 0.8778 | 0.0671 | 0.8062 | 0.7745 | 8.3038 |
| | borderline-SMOTE2-SVM | 0.6218 | 0.1317 | 0.5273 | 0.4022 | 8.1676 |
| | MWMOTE-SVM | 0.7123 | 0.0414 | 0.5626 | 0.5089 | 11.0469 |
| | WSMOTE-QSSVM | $\textbf{0.9337}$ | 0.0345 | $\textbf{0.8729}$ | 0.0638 | 4.1132 |
| AR2: 2200$\times$3, IR 1:10 | RUS-SVM | 0.4802 | 0.2185 | 0.5771 | 0.2159 | $\textbf{13.4868}$ |
| | ROS-SVM | 0.8855 | 0.0222 | 0.8799 | 0.039 | 53.3882 |
| | SMOTE-SVM | 0.4396 | 0.1028 | 0.47996 | 0.0845 | 130.0079 |
| | borderline-SMOTE1-SVM | 0.871 | $\textbf{0.0185}$ | 0.8628 | $\textbf{0.0324}$ | 101.8846 |
| | borderline-SMOTE2-SVM | 0.5126 | 0.1133 | 0.5578 | 0.1097 | 108.3143 |
| | MWMOTE-SVM | 0.6314 | 0.0463 | 0.6176 | 0.0587 | 90.1415 |
| | WSMOTE-QSSVM | $\textbf{0.9005}$ | 0.0355 | $\textbf{0.8960}$ | 0.0639 | 23.926 |
| AR3: 800$\times$3, IR 1:3 | RUS-SVM | 0.5859 | 0.3457 | 0.4509 | 0.3265 | $\textbf{3.3678}$ |
| | ROS-SVM | 0.8978 | $\textbf{0.024}$ | 0.8065 | $\textbf{0.0431}$ | 6.8082 |
| | SMOTE-SVM | 0.6942 | 0.0527 | 0.4844 | 0.0716 | 10.9369 |
| | borderline-SMOTE1-SVM | 0.8735 | 0.061 | 0.7663 | 0.1038 | 8.8757 |
| | borderline-SMOTE2-SVM | 0.6095 | 0.1267 | 0.3859 | 0.164 | 8.0711 |
| | MWMOTE-SVM | 0.6893 | 0.0478 | 0.4772 | 0.0664 | 11.2833 |
| | WSMOTE-QSSVM | $\textbf{0.9148}$ | 0.041 | $\textbf{0.8384}$ | 0.0744 | 3.7026 |
| AR3: 2200$\times$3, IR 1:10 | RUS-SVM | 0.5809 | 0.2743 | 0.4052 | 0.2942 | $\textbf{12.913}$ |
| | ROS-SVM | 0.8788 | 0.055 | 0.775 | 0.0965 | 50.4581 |
| | SMOTE-SVM | 0.4874 | 0.0875 | 0.2445 | 0.0847 | 125.7192 |
| | borderline-SMOTE1-SVM | 0.8586 | $\textbf{0.0284}$ | 0.7379 | 0.0485 | 113.3755 |
| | borderline-SMOTE2-SVM | 0.5892 | 0.0387 | 0.3485 | $\textbf{0.046}$ | 82.4204 |
| | MWMOTE-SVM | 0.6045 | 0.0525 | 0.3679 | 0.0621 | 90.1298 |
| | WSMOTE-QSSVM | $\textbf{0.8825}$ | 0.051 | $\textbf{0.7811}$ | 0.0896 | 22.8422 |
Table 3. Average $AUC$ and $G$-mean of the seven methods under different mislabeled levels
| Mislabeled level (%) | Indicator | RUS-SVM | ROS-SVM | SMOTE-SVM | borderline-SMOTE1-SVM | borderline-SMOTE2-SVM | MWMOTE-SVM | WSMOTE-QSSVM |
|---|---|---|---|---|---|---|---|---|
| 10 | $AUC$ | 0.8552 | 0.8325 | 0.573 | 0.8716 | 0.602 | 0.652 | $\textbf{0.8729}$ |
| | $G$-mean | 0.9245 | 0.912 | 0.7457 | 0.9332 | 0.767 | 0.803 | $\textbf{0.9342}$ |
| 15 | $AUC$ | $\textbf{0.898}$ | 0.8343 | 0.6416 | 0.8272 | 0.4717 | 0.6065 | 0.8489 |
| | $G$-mean | $\textbf{0.9474}$ | 0.9127 | 0.8005 | 0.9091 | 0.6707 | 0.7781 | 0.921 |
| 20 | $AUC$ | 0.4509 | 0.8065 | 0.4844 | 0.7663 | 0.3859 | 0.4772 | $\textbf{0.8384}$ |
| | $G$-mean | 0.5859 | 0.8978 | 0.6942 | 0.8735 | 0.6095 | 0.6893 | $\textbf{0.9148}$ |
| 25 | $AUC$ | 0.2005 | 0.7544 | 0.3944 | 0.6254 | 0.3238 | 0.4363 | $\textbf{0.7892}$ |
| | $G$-mean | 0.2835 | 0.8681 | 0.6213 | 0.7874 | 0.5615 | 0.6595 | $\textbf{0.8876}$ |
| 30 | $AUC$ | 0.0314 | 0.7754 | 0.4056 | 0.5259 | 0.1855 | 0.3721 | $\textbf{0.7758}$ |
| | $G$-mean | 0.1076 | 0.8787 | 0.6346 | 0.7229 | 0.4216 | 0.6067 | $\textbf{0.8804}$ |
Table 4. Detailed information about KEEL data sets
| Data set | Data size | Features | Imbalance ratio |
|---|---|---|---|
| Pima | 214 | 9 | 1.82 |
| Glass4 | 214 | 10 | 1.82 |
| Glass5 | 214 | 10 | 1.82 |
| Haberman | 306 | 4 | 2.78 |
| Vehicle1 | 846 | 19 | 2.9 |
| Glass0123vs456 | 214 | 10 | 3.2 |
| Abalone21vs8 | 581 | 9 | 40.5 |
| Vowel0 | 988 | 14 | 9.98 |
| Shuttlec0vsc4 | 1829 | 10 | 13.87 |
| Pageblocks13vs4 | 472 | 11 | 15.86 |
| Glass016vs5 | 184 | 10 | 19.44 |
| Abalone918 | 731 | 9 | 16.4 |
| Newthyroid2 | 215 | 6 | 5.14 |
| Yeast4 | 1484 | 9 | 28.1 |
| Yeast6 | 1484 | 9 | 41.4 |
Table 5.
| Data set | RUS-SVM | ROS-SVM | SMOTE-SVM | borderline-SMOTE1-SVM | borderline-SMOTE2-SVM | MWMOTE-SVM | WSMOTE-QSSVM |
|---|---|---|---|---|---|---|---|
| Pima (mean/std/rank) | 0.2726/0.1996/7 | 0.6292/0.0446/2 | 0.5802/0.0802/4 | 0.6211/0.0628/3 | 0.3459/0.1442/6 | 0.5681/0.0529/5 | 0.6521/0.0505/1 |
| Pima (time, s) | 16.3335 | 8.2276 | 16.6804 | 18.3543 | 18.381 | 18.414 | 11.0034 |
| Vowel0 (mean/std/rank) | 0.4675/0.3726/7 | 0.918/0.0845/2 | 0.8054/0.042/4 | 0.8536/0.0759/3 | 0.7976/0.0671/5 | 0.7594/0.0529/6 | 0.9199/0.0583/1 |
| Vowel0 (time, s) | 14.1906 | 4.3252 | 33.0654 | 31.7508 | 31.5846 | 25.7455 | 25.7865 |
| Glass0123vs456 (mean/std/rank) | 0.7099/0.1468/6 | 0.7874/0.0861/4 | 0.8037/0.0786/2 | 0.8291/0.0837/1 | 0.7418/0.1339/5 | 0.7048/0.1415/7 | 0.7921/0.0892/3 |
| Glass0123vs456 (time, s) | 0.6951 | 0.3932 | 1.0297 | 1.3301 | 1.2478 | 1.1803 | 3.056 |
| Haberman (mean/std/rank) | 0.3932/0.0794/5 | 0.2286/0.1846/7 | 0.4457/0.0944/4 | 0.4687/0.1048/3 | 0.3468/0.1334/6 | 0.5168/0.0544/2 | 0.6394/0.1057/1 |
| Haberman (time, s) | 1.0481 | 0.5925 | 78.2541 | 1.7105 | 22.0941 | 2.8829 | 1.2026 |
| Vehicle1 (mean/std/rank) | 0.2926/0.2722/6 | 0.6873/0.1027/2 | 0.38/0.1016/5 | 0.61/0.0604/3 | 0.2131/0.144/7 | 0.479/0.0467/4 | 0.7254/0.041/1 |
| Vehicle1 (time, s) | 14.1348 | 7.3443 | 23.71 | 23.79 | 24.2277 | 20.6886 | 47.0032 |
| Shuttlec0vsc4 (mean/std/rank) | 0.7955/0.1095/6 | 0.9939/0.0098/1 | 0.7633/0.0287/7 | 0.8513/0.045/5 | 0.8527/0.0454/4 | 0.9238/0.0273/3 | 0.9819/0.0131/2 |
| Shuttlec0vsc4 (time, s) | 15.0486 | 75.75541 | 150.7936 | 145.2165 | 147.0763 | 78.9499 | 125.5474 |
| Pageblocks13vs4 (mean/std/rank) | 0.6375/0.1509/3 | 0.5887/0.2618/5 | 0.608/0.1809/4 | 0.8848/0.0649/2 | 0.5607/0.2308/7 | 0.5771/0.1332/6 | 0.9081/0.0699/1 |
| Pageblocks13vs4 (time, s) | 1.8912 | 5.4061 | 11.0931 | 9.9519 | 10.4541 | 20.2431 | 22.9759 |
| Abalone21vs8 (mean/std/rank) | 0.3995/0.2119/3 | 0.3773/0.4158/4 | 0.1151/0.247/7 | 0.5252/0.3741/2 | 0.2321/0.2448/5 | 0.1937/0.2501/6 | 0.6873/0.2725/1 |
| Abalone21vs8 (time, s) | 1.0082 | 3.846 | 9.8971 | 8.5385 | 8.5024 | 8.7712 | 6.9418 |
| Yeast6 (mean/std/rank) | 0.3806/0.2218/5 | 0.3719/0.2875/6 | 0.4615/0.1965/4 | 0.5946/0.1959/3 | 0.3188/0.2381/7 | 0.6378/0.1015/2 | 0.7146/0.1159/1 |
| Yeast6 (time, s) | 4.6745 | 27.0141 | 60.5981 | 60.6621 | 62.1266 | 40.8791 | 20.2801 |
| Abalone918 (mean/std/rank) | 0.3443/0.2325/3 | 0.1556/0.2053/6 | 0.2063/0.2299/5 | 0.3263/0.252/4 | 0.0348/0.1102/7 | 0.3947/0.1589/2 | 0.6533/0.0978/1 |
| Abalone918 (time, s) | 1.6203 | 6.1747 | 14.5602 | 10.559 | 5.9857 | 13.8744 | 8.594 |
| Newthyroid2 (mean/std/rank) | 0.6419/0.1922/5 | 0.7352/0.1259/4 | 0.5555/0.2352/6 | 0.8474/0.0705/2 | 0.555/0.2077/7 | 0.7676/0.1135/3 | 0.8774/0.096/1 |
| Newthyroid2 (time, s) | 0.3228 | 0.6046 | 1.1284 | 0.8065 | 0.7775 | 0.8403 | 1.4356 |
| Glass016vs5 (mean/std/rank) | 0.5164/0.2742/3 | 0.2746/0.3546/7 | 0.5129/0.2897/4 | 0.6011/0.3381/2 | 0.4386/0.389/5 | 0.3389/0.3664/6 | 0.7132/0.1589/1 |
| Glass016vs5 (time, s) | 0.1869 | 0.5241 | 1.175 | 1.1612 | 1.1598 | 0.5781 | 3.3859 |
| Glass5 (mean/std/rank) | 0.5388/0.2548/2 | 0.4234/0.3644/7 | 0.4735/0.3416/6 | 0.5215/0.3736/3 | 0.4737/0.3272/5 | 0.4896/0.35/4 | 0.8439/0.1328/1 |
| Glass5 (time, s) | 0.2077 | 0.6566 | 1.5191 | 1.3504 | 1.355 | 0.778 | 4.0237 |
| Yeast4 (mean/std/rank) | 0.1843/0.0748/6 | 0.2065/0.2272/5 | 0.3751/0.2332/4 | 0.5555/0.0802/3 | 0.0627/0.1323/7 | 0.6059/0.1153/1 | 0.5941/0.1145/2 |
| Yeast4 (time, s) | 9.0832 | 45.4406 | 161.9439 | 78.493 | 81.4676 | 78.5899 | 35.4481 |
| Glass4 (mean/std/rank) | 0.431/0.2546/3 | 0.1147/0.2419/7 | 0.4076/0.3698/4 | 0.6286/0.2719/2 | 0.2663/0.3521/6 | 0.378/0.2713/5 | 0.7867/0.1263/1 |
| Glass4 (time, s) | 0.2045 | 0.7664 | 1.4998 | 1.3387 | 1.3477 | 1.4321 | 3.4769 |
| Average Rank | 4.7 | 4.6 | 4.7 | 2.7 | 5.9 | 4.1 | 1.3 |
| Final Rank | 6 | 4 | 5 | 2 | 7 | 3 | 1 |
Table 6.
| Data set | RUS-SVM | ROS-SVM | SMOTE-SVM | borderline-SMOTE1-SVM | borderline-SMOTE2-SVM | MWMOTE-SVM | WSMOTE-QSSVM |
|---|---|---|---|---|---|---|---|
| Pima (mean/std/rank) | 0.1102/0.0913/7 | 0.3977/0.0534/2 | 0.3424/0.0928/4 | 0.3893/0.074/3 | 0.1384/0.0965/6 | 0.3252/0.0609/5 | $\textbf{0.4275/0.0646/1}$ |
| Pima (time, s) | 16.3335 | 8.2276 | 16.6804 | 18.3543 | 18.381 | 18.414 | $\textbf{11.0034}$ |
| Vowel0 (mean/std/rank) | 0.3435/0.3226/7 | 0.8491/0.1441/2 | 0.6502/0.0677/4 | 0.7338/0.1277/3 | 0.6402/0.1017/5 | 0.5792/0.0777/6 | $\textbf{0.8493/0.1051/1}$ |
| Vowel0 (time, s) | 14.1906 | $\textbf{4.3252}$ | 33.0654 | 31.7508 | 31.5846 | 25.7455 | 25.7865 |
| Glass0123vs456 (mean/std/rank) | 0.5233/0.204/6 | 0.6267/0.1316/4 | 0.6515/0.1289/2 | $\textbf{0.6936/0.139/1}$ | 0.5664/0.1878/5 | 0.5148/0.1721/7 | 0.6345/0.1353/3 |
| Glass0123vs456 (time, s) | 0.6951 | $\textbf{0.3932}$ | 1.0297 | 1.3301 | 1.2478 | 1.1803 | 3.056 |
| Haberman (mean/std/rank) | 0.1603/0.0604/5 | 0.0829/0.0931/7 | 0.2067/0.0798/4 | 0.2296/0.0854/3 | 0.1363/0.0655/6 | 0.2697/0.0573/2 | $\textbf{0.4189/0.1375/1}$ |
| Haberman (time, s) | 1.0481 | $\textbf{0.5925}$ | 78.2541 | 1.7105 | 22.0941 | 2.8829 | 1.2026 |
| Vehicle1 (mean/std/rank) | 0.1523/0.1936/6 | 0.4819/0.1337/2 | 0.1537/0.0884/5 | 0.3753/0.0741/3 | 0.0641/0.0603/7 | 0.2314/0.0439/4 | $\textbf{0.5276/0.0588/1}$ |
| Vehicle1 (time, s) | 14.1348 | $\textbf{7.3443}$ | 23.71 | 23.79 | 24.2277 | 20.6886 | 47.0032 |
| Shuttlec0vsc4 (mean/std/rank) | 0.6436/0.1837/6 | $\textbf{0.988/0.0193/1}$ | 0.5834/0.044/7 | 0.7265/0.0761/5 | 0.729/0.0775/4 | 0.8541/0.0516/3 | 0.9643/0.0257/2 |
| Shuttlec0vsc4 (time, s) | $\textbf{15.0486}$ | 75.75541 | 150.7936 | 145.2165 | 147.0763 | 78.9499 | 125.5474 |
| Pageblocks13vs4 (mean/std/rank) | 0.427/0.1861/3 | 0.4082/0.263/4 | 0.3991/0.2052/5 | 0.7867/0.1124/2 | 0.3624/0.1935/6 | 0.3491/0.155/7 | $\textbf{0.829/0.1209/1}$ |
| Pageblocks13vs4 (time, s) | $\textbf{1.8912}$ | 5.4061 | 11.0931 | 9.9519 | 10.4541 | 20.2431 | 22.9759 |
| Abalone21vs8 (mean/std/rank) | 0.2/0.1464/5 | 0.2979/0.3654/3 | 0.0681/0.1533/7 | 0.4018/0.3065/2 | 0.2526/0.2665/4 | 0.0938/0.1212/6 | $\textbf{0.5392/0.2631/1}$ |
| Abalone21vs8 (time, s) | $\textbf{1.0082}$ | 3.846 | 9.8971 | 8.5385 | 8.5024 | 8.7712 | 6.9418 |
| Yeast6 (mean/std/rank) | 0.1892/0.1707/6 | 0.2127/0.2034/5 | 0.2477/0.144/4 | 0.3881/0.232/3 | 0.1527/0.1385/7 | 0.4161/0.1298/2 | $\textbf{0.5228/0.1558/1}$ |
| Yeast6 (time, s) | $\textbf{4.6745}$ | 27.0141 | 60.5981 | 60.6621 | 62.1266 | 40.8791 | 20.2801 |
| Abalone918 (mean/std/rank) | 0.1672/0.1431/3 | 0.0621/0.0878/6 | 0.0901/0.1128/5 | 0.1637/0.1563/4 | 0.0121/0.0384/7 | 0.1785/0.09/2 | $\textbf{0.4354/0.1282/1}$ |
| Abalone918 (time, s) | $\textbf{1.6203}$ | 6.1747 | 14.5602 | 10.559 | 5.9857 | 13.8744 | 8.594 |
| Newthyroid2 (mean/std/rank) | 0.4452/0.2622/5 | 0.5548/0.1858/4 | 0.3583/0.2065/6 | 0.7226/0.1202/2 | 0.3468/0.1504/7 | 0.6008/0.1692/3 | $\textbf{0.7782/0.1661/1}$ |
| Newthyroid2 (time, s) | $\textbf{0.3228}$ | 0.6046 | 1.1284 | 0.8065 | 0.7775 | 0.8403 | 1.4356 |
| Glass016vs5 (mean/std/rank) | 0.3343/0.2469/4 | 0.1886/0.2437/7 | 0.3386/0.2297/3 | 0.4643/0.3076/2 | 0.3286/0.3228/5 | 0.2357/0.2752/6 | $\textbf{0.5314/0.2463/1}$ |
| Glass016vs5 (time, s) | $\textbf{0.1869}$ | 0.5241 | 1.175 | 1.1612 | 1.1598 | 0.5781 | 3.3859 |
| Glass5 (mean/std/rank) | 0.3488/0.2479/4 | 0.2988/0.2572/7 | 0.3293/0.2687/5 | 0.3976/0.3177/2 | 0.3207/0.2221/6 | 0.35/0.2774/3 | $\textbf{0.728/0.2101/1}$ |
| Glass5 (time, s) | $\textbf{0.2077}$ | 0.6566 | 1.5191 | 1.3504 | 1.355 | 0.778 | 4.0237 |
| Yeast4 (mean/std/rank) | 0.039/0.0305/6 | 0.0891/0.1087/5 | 0.1896/0.1526/4 | 0.3144/0.0918/3 | 0.0197/0.0415/7 | $\textbf{0.3791/0.1258/1}$ | 0.3647/0.1404/2 |
| Yeast4 (time, s) | $\textbf{9.0832}$ | 45.4406 | 161.9439 | 78.493 | 81.4676 | 78.5899 | 35.4481 |
| Glass4 (mean/std/rank) | 0.2442/0.2012/4 | 0.0658/0.1388/7 | 0.2892/0.2938/3 | 0.4617/0.2738/2 | 0.1825/0.2566/6 | 0.2092/0.1679/5 | $\textbf{0.6333/0.1856/1}$ |
| Glass4 (time, s) | $\textbf{0.2045}$ | 0.7664 | 1.4998 | 1.3387 | 1.3477 | 1.4321 | 3.4769 |
| $\textbf{Average Rank}$ | $\textbf{5.13}$ | $\textbf{4.4}$ | $\textbf{4.53}$ | $\textbf{2.67}$ | $\textbf{5.87}$ | $\textbf{4.13}$ | $\textbf{1.27}$ |
| $\textbf{Final Rank}$ | $\textbf{6}$ | $\textbf{4}$ | $\textbf{5}$ | $\textbf{2}$ | $\textbf{7}$ | $\textbf{3}$ | $\textbf{1}$ |
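The Average Rank and Final Rank rows of Tables 5 and 6 are consistent with ranking the seven methods on each data set (rank 1 for the best mean value) and averaging those ranks over all data sets. A minimal sketch of this computation is given below, assuming higher scores are better; the variable names are illustrative, and ties are broken simply by order of appearance, which may differ from the tie-breaking used in the tables.

```python
import numpy as np
from scipy.stats import rankdata

def average_ranks(scores):
    """scores: (n_datasets, n_methods) array of mean performance values,
    where larger is better. Returns the 'Average Rank' and 'Final Rank' rows."""
    scores = np.asarray(scores, dtype=float)
    # rank 1 = best method on each data set (negate so the largest value ranks first)
    per_dataset = np.vstack([rankdata(-row) for row in scores])
    avg = per_dataset.mean(axis=0)                       # "Average Rank" row
    final = rankdata(avg, method="ordinal").astype(int)  # "Final Rank" row
    return avg, final
```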
Table 7.
| Data set | RUS-SVM | ROS-SVM | SMOTE-SVM | borderline-SMOTE1-SVM | borderline-SMOTE2-SVM | MWMOTE-SVM | WSMOTE-QSSVM |
|---|---|---|---|---|---|---|---|
| Pima | 0.1918 | 0.5924 | 0.5728 | 0.6045 | 0.2423 | 0.5608 | $\textbf{0.6347}$ |
| Vowel0 | 0.4398 | 0.912 | 0.8038 | 0.8487 | 0.7864 | 0.7563 | $\textbf{0.9181}$ |
| Glass0123vs456 | 0.6668 | 0.7676 | 0.7948 | $\textbf{0.8203}$ | 0.713 | 0.6918 | 0.7789 |
| Haberman | 0.2898 | 0.1472 | 0.4129 | 0.4181 | 0.2635 | 0.5074 | $\textbf{0.6115}$ |
| Vehicle1 | 0.2354 | 0.6584 | 0.2784 | 0.5787 | 0.1197 | 0.4042 | 0.7231 |
| Shuttlec0vsc4 | 0.7749 | $\textbf{0.9939}$ | 0.7578 | 0.85 | 0.8503 | 0.9214 | 0.9817 |
| Pageblocks13vs4 | 0.5789 | 0.5362 | 0.5798 | 0.8831 | 0.5212 | 0.5547 | $\textbf{0.9063}$ |
| Abalone21vs8 | 0.3178 | 0.3591 | 0.1122 | 0.5162 | 0.4873 | 0.1809 | $\textbf{0.677}$ |
| Yeast6 | 0.3013 | 0.3116 | 0.4046 | 0.545 | 0.2474 | 0.6243 | $\textbf{0.7041}$ |
| Abalone918 | 0.2748 | 0.1066 | 0.1785 | 0.2729 | 0.0221 | 0.3497 | $\textbf{0.6384}$ |
| Newthyroid2 | 0.5924 | 0.6984 | 0.5332 | 0.8422 | 0.5072 | 0.7563 | $\textbf{0.8736}$ |
| Glass016vs5 | 0.4586 | 0.2613 | 0.5045 | 0.5878 | 0.423 | 0.331 | $\textbf{0.7031}$ |
| Glass5 | 0.4732 | 0.3994 | 0.465 | 0.5064 | 0.4526 | 0.479 | $\textbf{0.835}$ |
| Yeast4 | 0.0754 | 0.1489 | 0.3238 | 0.5015 | 0.0363 | $\textbf{0.5905}$ | 0.5632 |
| Glass4 | 0.3774 | 0.0997 | 0.3971 | 0.6028 | 0.2517 | 0.3611 | $\textbf{0.7768}$ |
[1] S. Barua, M. M. Islam, X. Yao and K. Murase, MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning, IEEE Transactions on Knowledge and Data Engineering, 26 (2014), 405-425. doi: 10.1109/TKDE.2012.232.
[2] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16 (2002), 321-357. doi: 10.1613/jair.953.
[3] E. Duchesnay, A. Cachia, N. Boddaert, N. Chabane, J.-F. Mangin, J.-L. Martinot, F. Brunelle and M. Zilbovicius, Feature selection and classification of imbalanced datasets: Application to PET images of children with autistic spectrum disorders, NeuroImage, 57 (2011), 1003-1014. doi: 10.1016/j.neuroimage.2011.05.011.
[4] A. Gelman and D. B. Rubin, Inference from iterative simulation using multiple sequences, Statistical Science, 7 (1992), 457-472. doi: 10.1214/ss/1177011136.
[5] R. Y. Goh and L. S. Lee, Credit scoring: A review on support vector machines and metaheuristic approaches, Adv. Oper. Res., 2019 (2019), 1974794, 30 pp. doi: 10.1155/2019/1974794.
[6] R. S. Gong and S. H. Huang, A Kolmogorov-Smirnov statistic based segmentation approach to learning from imbalanced datasets: With application in property refinance prediction, Expert Systems with Applications, 39 (2012), 6192-6200. doi: 10.1016/j.eswa.2011.12.011.
[7] M. H. Ha, C. Wang and J. Q. Chen, The support vector machine based on intuitionistic fuzzy number and kernel function, Soft Computing, 17 (2013), 635-641. doi: 10.1007/s00500-012-0937-y.
[8] H. Han, W. Y. Wang and B. H. Mao, Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, International Conference on Intelligent Computing, 3644 (2005), 878-887. doi: 10.1007/11538059_91.
[9] X. P. Hua, S. Xu, J. Gao and S. F. Ding, L1-norm loss-based projection twin support vector machine for binary classification, Soft Computing, 23 (2019), 10649-10659. doi: 10.1007/s00500-019-04002-6.
[10] W. C. Lin, C. F. Tsai, Y. H. Hu and J. S. Jhang, Clustering-based undersampling in class-imbalanced data, Information Sciences, 409/410 (2017), 17-26. doi: 10.1016/j.ins.2017.05.008.
[11] J. Luo, S. C. Fang, Y. Bai and Z. Deng, Fuzzy quadratic surface support vector machine based on Fisher discriminant analysis, J. Ind. Manag. Optim., 12 (2016), 357-373.
[12] J. Luo, S. C. Fang, Z. B. Deng and X. L. Guo, Soft quadratic surface support vector machine for binary classification, Asia-Pac. J. Oper. Res., 33 (2016), 1650046, 22 pp. doi: 10.1142/S0217595916500469.
[13] M. Kubat, R. Holte and S. Matwin, Machine learning for the detection of oil spills in satellite radar images, Machine Learning, 30 (1998), 195-215.
[14] E. Ramentol, Y. Caballero, R. Bello and F. Herrera, SMOTE-RSB: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory, Knowledge and Information Systems, 33 (2012), 245-265. doi: 10.1007/s10115-011-0465-6.
[15] E. Ramentol, N. Verbiest, R. Bello, Y. Caballero, C. Cornelis and F. Herrera, SMOTE-FRST: A new resampling method using fuzzy rough set theory, Uncertainty Modeling in Knowledge Engineering and Decision Making, 7 (2012), 800-805. doi: 10.1142/9789814417747_0128.
[16] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola and R. C. Williamson, Estimating the support of a high-dimensional distribution, Neural Computation, 13 (2001), 1443-1471. doi: 10.1162/089976601750264965.
[17] T. Jo and N. Japkowicz, Class imbalances versus small disjuncts, ACM SIGKDD Explorations Newsletter, 6 (2004), 40-49. doi: 10.1145/1007730.1007737.
[18] M. A. Tahir, J. Kittler, K. Mikolajczyk and F. Yan, A multiple expert approach to the class imbalance problem using inverse random under sampling, International Workshop on Multiple Classifier Systems, 5519 (2009), 82-91. doi: 10.1007/978-3-642-02326-2_9.
[19] Y. Tian, Z. B. Deng, J. Luo and Y. Q. Li, An intuitionistic fuzzy set based S$^3$VM model for binary classification with mislabeled information, Fuzzy Optim. Decis. Mak., 17 (2018), 475-494. doi: 10.1007/s10700-017-9282-z.
[20] Y. Tian, M. Sun, Z. B. Deng, J. Luo and Y. Q. Li, A new fuzzy set and non-kernel SVM approach for mislabeled binary classification with applications, IEEE Transactions on Fuzzy Systems, 25 (2017), 1536-1545. doi: 10.1109/TFUZZ.2017.2752138.
[21] J. M. Tomczak and M. Zięba, Probabilistic combination of classification rules and its application to medical diagnosis, Mach. Learn., 101 (2015), 105-135. doi: 10.1007/s10994-015-5508-x.
[22] N. Verbiest, E. Ramentol, C. Cornelis and F. Herrera, Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection, Applied Soft Computing, 22 (2014), 511-517. doi: 10.1016/j.asoc.2014.05.023.
[23] R. F. Xu, T. Chen, Y. Q. Xia, Q. Lu, B. Liu and X. Wang, Word embedding composition for data imbalances in sentiment and emotion classification, Cognitive Computation, 7 (2015), 226-240. doi: 10.1007/s12559-015-9319-y.
[24] T. Yu, J. Debenham, T. Jan and S. Simoff, Combine vector quantization and support vector machine for imbalanced datasets, Artificial Intelligence in Theory and Practice, 217 (2012), 81-88. doi: 10.1007/978-0-387-34747-9_9.
[25] Y. T. Xu, Q. Wang, X. Y. Pang and Y. Tian, Maximum margin of twin spheres machine with pinball loss for imbalanced data classification, Applied Intelligence, 48 (2018), 23-34. doi: 10.1007/s10489-017-0961-9.
(a) The original distribution of the data set. (b) The synthetic minority examples generated by SMOTE (red triangular point). (c) The synthetic minority examples generated by our method (red triangular point)
ROC curves of seven methods over nine data sets: (a) Abalone21vs8, (b) Abalone918, (c) Haberman, (d) Glass4, (e) Pima, (f) Glass5, (g) Glass016vs5, (h) Vehicle1, (i) Yeast6