# American Institute of Mathematical Sciences

doi: 10.3934/jimo.2020128

## Two penalized mixed–integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models

 Faculty of Mathematics, Statistics and Computer Science, Semnan University, P.O. Box 35195–363, Semnan, Iran

* Corresponding author: Mahdi Roozbeh

Received September 2019; Revised May 2020; Published August 2020

In classical regression analysis, ordinary least-squares estimation is the best strategy when the essential assumptions are met: normality and independence of the error terms, and negligible multicollinearity among the covariates. If any of these assumptions is violated, the results may be misleading. In particular, outliers violate the assumption of normally distributed residuals in least-squares regression. In this situation, robust estimators are widely used because of their insensitivity to outlying data points. Multicollinearity is another common problem in multiple regression models, with adverse effects on the least-squares estimators. It is therefore important to use estimation methods designed to tackle these problems. Robust regression is among the popular methods for analyzing data contaminated with outliers. Along these lines, we suggest two mixed-integer nonlinear optimization models whose solutions can be considered appropriate estimators when outliers and multicollinearity appear simultaneously in the data set. The models can be solved effectively by metaheuristic algorithms and are built on penalization schemes able to down-weight or ignore unusual data and multicollinearity effects. We establish that our models are computationally advantageous in terms of flop count. We also deal with a robust ridge methodology. Finally, three real data sets are analyzed to examine the performance of the proposed methods.
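To make the combination of trimming and ridge penalization concrete, here is a minimal sketch. It is not the authors' mixed-integer formulation or their metaheuristic solver: it swaps in a simple random-restart concentration-step heuristic that repeatedly refits a ridge estimator on the `h` observations with the smallest squared residuals. The function names `ridge_fit` and `trimmed_ridge`, the parameters `k` (ridge parameter) and `h` (number of retained observations), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, k):
    """Ridge estimate (X'X + k I)^{-1} X'y for a ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def trimmed_ridge(X, y, k, h, n_starts=20, n_steps=50, seed=0):
    """Heuristic ridge-penalized trimmed least squares (hypothetical helper).

    Keeps the h observations with the smallest squared residuals, refits,
    and repeats until the retained subset stops changing.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_obj, best_beta, best_keep = np.inf, None, None
    for _ in range(n_starts):
        # Random initial subset of h observations.
        keep = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(n_steps):
            beta = ridge_fit(X[keep], y[keep], k)
            resid2 = (y - X @ beta) ** 2
            new_keep = np.sort(np.argsort(resid2)[:h])
            if np.array_equal(new_keep, keep):
                break
            keep = new_keep
        # Penalized trimmed objective for the final subset of this start.
        beta = ridge_fit(X[keep], y[keep], k)
        obj = np.sum((y[keep] - X[keep] @ beta) ** 2) + k * beta @ beta
        if obj < best_obj:
            best_obj, best_beta, best_keep = obj, beta, keep
    return best_beta, best_keep


# Synthetic data (an assumption, for illustration only): two nearly
# collinear predictors and six grossly contaminated responses.
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)
y[:6] += 50.0                          # outliers in the first six rows

beta, keep = trimmed_ridge(X, y, k=0.1, h=48)
print("coefficients:", beta)
print("kept observations:", keep)
```

Choosing which observations to retain is a combinatorial decision; the sketch only searches it heuristically and makes no claim of optimality, and it penalizes every coefficient equally (no unpenalized intercept is included).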

Citation: Mahdi Roozbeh, Saman Babaie–Kafaki, Zohre Aminifard. Two penalized mixed–integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models. Journal of Industrial & Management Optimization, doi: 10.3934/jimo.2020128

The diagnostic plots of the model (18)
The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the bridge projects data set
The diagnostic plots for the model (20)
The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the electricity data
The diagnostic plots for the model (21)
The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the CPS data
Evaluation of the proposed estimators for the bridge projects data set

| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 2.3317 | 1.91363 | 2.0304 | 1.8278 | 1.9140 | -0.0125 | - | -7.8431 |
| $\log(CCost)$ | 0.1483 | 0.33718 | 0.3056 | 0.2923 | 0.2360 | 0.4152 | - | 0.4236 |
| $\log(Dwgs)$ | 0.8356 | 0.58002 | 0.6210 | 0.7829 | 0.8914 | 0.3933 | - | 2.8061 |
| $\log(Spans)$ | 0.1963 | 0.06662 | 0.0657 | 0.0241 | 0.0467 | 0.1176 | - | 0.5110 |
| ${\rm SSE}$ | 3.8692 | 1.9788 | 1.9778 | 1.0577 | 1.1504 | 4.0131 | 2.7834 | 1.7108 |
| ${\rm R}^2$ | 0.7747 | 0.8579 | 0.8600 | 0.9147 | 0.9020 | 0.7663 | 0.8379 | 0.9004 |

The most effective subgroup of predictor variables based on the ${\rm R}^2_{adj}$ and AIC criteria for the electricity data set

| Subset size | Predictor variables | ${\rm R}^2_{adj}$ | AIC |
|---|---|---|---|
| 1 | $Temp$ | 0.5523 | -1067.814 |
| 2 | $Temp,LREG$ | 0.5781 | -1077.339 |
| 3 | ${\bf Temp,LREG,LI}$ | 0.5892 | -1081.063 |
| 4 | $Temp,LREG,LI,x_{9}$ | 0.5891 | -1080.057 |
| 5 | $Temp,LREG,LI,x_{9},x_{10}$ | 0.5882 | -1078.709 |
| 6 | $Temp,LREG,LI,x_{9},x_{10},x_{11}$ | 0.5875 | -1077.427 |
| 7 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1}$ | 0.5858 | -1075.734 |
| 8 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3}$ | 0.5837 | -1073.897 |
| 9 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5}$ | 0.5812 | -1071.907 |
| 10 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4}$ | 0.5789 | -1069.987 |
| 11 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7}$ | 0.5764 | -1067.997 |
| 12 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2}$ | 0.5740 | -1064.098 |
| 13 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2},x_{6}$ | 0.5718 | -1064.281 |
| 14 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2},x_{6},x_{8}$ | 0.5709 | -1063.014 |

Evaluation of the proposed estimators for the electricity data set

| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 4.4069 | 5.1693 | 4.9881 | 5.2039 | 4.0907 | 0.0881 | - | 2.6215 |
| $LI$ | 0.1925 | 0.0989 | 0.1146 | 0.0956 | 0.2225 | 0.1545 | - | 1.2806 |
| $LREG$ | -0.0778 | -0.0939 | -0.1054 | -0.0956 | -0.0940 | -0.1322 | - | -3.7418 |
| $Temp$ | -0.0002 | -0.0002 | -0.0003 | -0.0003 | -0.0003 | -0.7508 | - | -0.8067 |
| ${\rm SSE}$ | 0.3765 | 0.2637 | 0.1982 | 0.1296 | 0.1413 | 0.3881 | 0.2629 | 0.4240 |
| ${\rm R}^2$ | 0.5962 | 0.6742 | 0.7399 | 0.7559 | 0.7468 | 0.5838 | 0.7181 | 0.5452 |

Evaluation of the proposed estimators for the CPS data

| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 1.0786 | 0.7498 | 1.1963 | 0.9257 | 0.9038 | 0.0054 | - | -5.5913 |
| $education$ | 0.1794 | 0.1482 | 0.2576 | 0.2018 | 0.1974 | 0.4997 | - | 0.6978 |
| $south$ | -0.1024 | -0.1208 | -0.1109 | -0.1174 | -0.0916 | -0.1141 | - | -0.4331 |
| $sex$ | -0.2220 | -0.2851 | -0.2776 | -0.2665 | -0.2416 | -0.2638 | - | -0.9731 |
| $experience$ | 0.0958 | 0.0613 | 0.1630 | 0.1090 | 0.1011 | 0.2573 | - | 0.2991 |
| $union$ | 0.2005 | 0.1939 | 0.1987 | 0.1427 | 0.1791 | 0.1511 | - | 1.0483 |
| $age$ | -0.0854 | -0.0473 | -0.1510 | -0.0960 | -0.0888 | 0.0420 | - | -0.2590 |
| $race$ | 0.0504 | 0.0674 | 0.0482 | 0.0749 | 0.0515 | 0.0930 | - | 0.2437 |
| $occupation$ | -0.0074 | -0.0122 | 0.0072 | -0.0126 | -0.0140 | -0.0526 | - | 0.0004 |
| $sector$ | 0.0915 | 0.0614 | 0.0411 | 0.0965 | 0.0810 | 0.0918 | - | 0.3258 |
| $married$ | 0.0766 | 0.0590 | 0.1937 | 0.0924 | 0.1216 | 0.0524 | - | 0.4156 |
| ${\rm SSE}$ | 101.17 | 76.3827 | 50.5810 | 49.8101 | 49.2827 | 102.5847 | 79.0911 | 84.2234 |
| ${\rm R}^2$ | 0.3185 | 0.4049 | 0.4146 | 0.4123 | 0.4279 | 0.3089 | 0.4672 | 0.4326 |

