Article Contents
Article Contents

# Smoothing Newton method for $\ell^0$-$\ell^2$ regularized linear inverse problem

• Sparse regression plays a very important role in statistics, machine learning, image and signal processing. In this paper, we consider a high-dimensional linear inverse problem with $\ell^0$-$\ell^2$ penalty to stably reconstruct the sparse signals. Based on the sufficient and necessary condition of the coordinate-wise minimizers, we design a smoothing Newton method with continuation strategy on the regularization parameter. We prove the global convergence of the proposed algorithm. Several numerical examples are provided, and the comparisons with the state-of-the-art algorithms verify the effectiveness and superiority of the proposed method.

Mathematics Subject Classification: Primary: 65F22; Secondary: 90C99.

 Citation:

• Figure 1.  From left to right: the hard thresholding operator $T_{2}(x)$ and corresponding approximations with uniform smoothing function for different $\epsilon$

Figure 2.  The density function

Figure 3.  The limiting form

Figure 4.  Results of random Gaussian matrix with a correlation coefficient $\tilde{\nu} = 0.2$ for strict sparse signal with $n = 1000, p = 2000, \delta = 1e-1, \lambda = 6.3e-01$

Figure 5.  Results of random Gaussian matrix with a correlation coefficient $\tilde{\nu} = 0.2$ for non-strict sparse signal with $n = 1000, p = 2000, \delta = 1e-1, \lambda = 2.1e-01$

Figure 6.  Results of Heaviside matrix for strict sparse signal with $n = 1000, p = 1000, \delta = 1e-1, \lambda = 2.3e-01$

Figure 7.  Results of Heaviside matrix for non-strict sparse signal with $n = 1000, p = 1000, \delta = 1e-1, \lambda = 1.2e-02$

Figure 8.  Results of inverse Laplace matrix for strict sparse signal with $n = 1000, p = 1000, \delta = 1e-3, \lambda = 1.4e-09$

Figure 9.  Results of inverse Laplace matrix for non-strict sparse signal with $n = 1000, p = 1000, \delta = 1e-4, \lambda = 7.1e-11$

Figure 10.  Results of Toeplitz matrix with $n = 1000, p = 1000, \delta = 1e-1, \lambda = 4.5e-01$

Figure 11.  Results of partial Toeplitz matrix with $n = 500, p = 1000, \delta = 1e-1, \lambda = 4.6e-01$

Figure 12.  The exact recovery probability of the support of SNM, LASSO, MCP and SCAD

Figure 13.  The exact recovery probability of the support of SNM, AIHT and FoBa for the same model (5)

Table 1.  The results of relative error $RelErr$ of all the solvers considered here

 Methods Gaussian matrix ($0.3$) Gaussian matrix ($0.8$) Heaviside matrix SNM 2.7e-07 (7.1e-08) 3.2e-07 (1.1e-07) 1.8e-06 (8.5e-07) AIHT 1.4e-03 (9.8e-03) 1.1e-01 (1.2e-01) 1.3e-01 (9.0e-02) FoBa 2.1e-07 (4.4e-08) 1.3e-01 (1.8e-01) 9.0e-07 (3.1e-07) LASSO 3.1e-02 (1.3e-03) 4.9e-02 (3.4e-02) 1.7e-01 (7.1e-02) MCP 2.1e-07 (4.4e-08) 1.1e-01 (1.3e-01) 1.4e-01 (3.5e-01) SCAD 2.1e-07 (4.4e-08) 8.9e-03 (5.2e-02) 9.0e-07 (3.1e-07)
•  [1] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19 (1974), 716-723.  doi: 10.1109/tac.1974.1100705. [2] A. Beck, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, 25. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA, 2017. doi: 10.1137/1.9781611974997.ch1. [3] A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Transactions on Image Processing, 18 (2009), 2419-2434.  doi: 10.1109/TIP.2009.2028250. [4] D. Bertsimas, A. King and R. Mazumder, Best subset selection via a modern optimization lens, The Annals of Statistics, 44 (2016), 813-852.  doi: 10.1214/15-AOS1388. [5] S. Billups, S. Dirkse and M. Ferris, A comparison of large scale mixed complementarity problem solvers, Computational Optimization and Applications, 7 (1997), 3-25.  doi: 10.1023/A:1008632215341. [6] T. Blumensath, Accelerated iterative hard thresholding, Signal Processing, 92 (2012), 752-756.  doi: 10.1016/j.sigpro.2011.09.017. [7] T. Blumensath and M. Davies, Iterative thresholding for sparse approximations, Journal of Fourier Analysis and Applications, 14 (2008), 629-654.  doi: 10.1007/s00041-008-9035-z. [8] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2010), 1-122.  doi: 10.1561/9781601984616. [9] K. Bredies and D. Lorenz, Regularization with non-convex separable constraints, Inverse Problems, 25 (2009), 085011. doi: 10.1088/0266-5611/25/8/085011. [10] T. Cai and A. Zhang, Sharp RIP bound for sparse signal and low-rank matrix recovery, Applied and Computational Harmonic Analysis, 35 (2013), 74-93.  doi: 10.1016/j.acha.2012.07.010. [11] E. Candes and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, 51 (2005), 4203-4215.  doi: 10.1109/TIT.2005.858979. [12] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, 14 (2007), 707-710. [13] B. Chen and P. Harker, Smooth approximations to nonlinear complementarity problems, SIAM Journal on Optimization, 7 (1997), 403-420.  doi: 10.1137/S1052623495280615. [14] C. Chen and O. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications, 5 (1996), 97-138.  doi: 10.1007/BF00249052. [15] L. Chen and Y. Gu, The convergence guarantees of a non-convex approach for sparse recovery, IEEE Transactions on Signal Processing, 62 (2014), 3754-3767.  doi: 10.1109/TSP.2014.2330349. [16] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit, SIAM Review, 43 (2001), 129-159.  doi: 10.1137/S003614450037906X. [17] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Mathematical Programming, 134 (2012), 71-99.  doi: 10.1007/s10107-012-0569-0. [18] X. Chen and L. Qi, A parameterized Newton method and a quasi-Newton method for nonsmooth equations, Computational Optimization and Applications, 3 (1994), 157-179.  doi: 10.1007/BF01300972. [19] X. Chen, L. Qi and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Mathematics of Computation of the American Mathematical Society, 67 (1998), 519-540.  doi: 10.1090/S0025-5718-98-00932-6. [20] X. Chen and T. Yamamoto, Convergence domains of certain iterative methods for solving nonlinear equations, Numericacl Functional Analysis and Optimization, 10 (1989), 37-48.  doi: 10.1080/01630568908816289. [21] I. Daubechies, M. Defrise and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics, 57 (2004), 1413-1457.  doi: 10.1002/cpa.20042. [22] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory, 52 (2006), 1289-1306. [23] D. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Transactions on Information Theory, 47 (2001), 2845-2862.  doi: 10.1109/18.959265. [24] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (2001), 1348-1360.  doi: 10.1198/016214501753382273. [25] Q. Fan, Y. Jiao and X. Lu, A primal dual active set algorithm with continuation for compressed sensing, IEEE Transactions on Signal Processing, 62 (2014), 6276-6285.  doi: 10.1109/TSP.2014.2362880. [26] M. Figueiredo, R. Nowak and S. Wright, Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing, 1 (2007), 586-597.  doi: 10.1109/JSTSP.2007.910281. [27] I. Frank and J. Friedman, A statistical view of some chemometrics regression tools, Technometrics, 35 (1993), 109-135. [28] S. Gabriel and J. Moré, Smoothing of mixed complementarity problems, Complementarity and Variational Problems: State of the Art, (1997), 105–116. [29] G. Gasso, A. Rakotomamonjy and S. Canu, Recovering sparse signals with a certain family of nonconvex penalties and DC programming, IEEE Transactions on Signal Processing, 57 (2009), 4686-4698. [30] M. Grasmair, O. Scherzer and M. Haltmeier, Necessary and sufficient conditions for linear convergence of $\ell^1$-regularization, Communications on Pure and Applied Mathematics, 64 (2011), 161-182.  doi: 10.1002/cpa.20350. [31] R. Griesse and D. Lorenz, A semismooth Newton method for Tikhonov functionals with sparsity constraints, Inverse Problems, 24 (2008), 035007. doi: 10.1088/0266-5611/24/3/035007. [32] E. Hans and T. Raasch, Global convergence of damped semismooth Newton methods for $l_1$ Tikhonov regularization, Inverse Problems, 31 (2015), 025005. doi: 10.1088/0266-5611/31/2/025005. [33] M. Hintermüller and T. Wu, A superlinearly convergent R-regularized Newton scheme for variational models with concave sparsity-promoting priors, Computational Optimization and Applications, 57 (2014), 1-25.  doi: 10.1007/s10589-013-9583-2. [34] R. Horn and  C. Johnson,  Matrix Analysis, Cambridge university press, 2013. [35] J. Huang, P. Breheny, S. Lee, S. Ma and C. Zhang, The Mnet method for variable selection, Statistica Sinica, 26 (2016), 903-923. [36] J. Huang, Y. Jiao, B. Jin, J. Liu, X. Lu and C. Yang, A unified primal dual active set algorithm for nonconvex sparse recovery, Statistical Science, 36 (2021), 215-238.  doi: 10.1214/19-sts758. [37] J. Huang, Y. Jiao, Y. Liu and X. Lu, A constructive approach to $L_0$ penalized regression, Journal of Machine Learning Research, 19 (2018), 403-439. [38] Z. Huang, L. Qi and D. Sun, Sub-quadratic convergence of a smoothing Newton algorithm for the $P_0$-and monotone LCP, Mathematical Programming, 99 (2004), 423-441.  doi: 10.1007/s10107-003-0457-8. [39] K. Ito and K. Kunisch, Optimal control with $L^p(\Omega), p\in [0, 1)$, control cost, SIAM Journal on Control and Optimization, 52 (2014), 1251-1275.  doi: 10.1137/120896529. [40] Y. Jiao, B. Jin and X. Lu, A primal dual active set with continuation algorithm for the $\ell_0$-regularized optimization problem, Applied and Computational Harmonic Analysis, 39 (2015), 400-426.  doi: 10.1016/j.acha.2014.10.001. [41] Y. Jiao, Q. Jin, X. Lu and W. Wang, Alternating direction method of multipliers for linear inverse problems, SIAM Journal on Numerical Analysis, 54 (2016), 2114-2137.  doi: 10.1137/15M1029308. [42] Y. Jiao, Q. Jin, X. Lu and W. Wang, Preconditioned alternating direction method of multipliers for inverse problems with constraints, Inverse Problems, 33 (2017), 025004. doi: 10.1088/1361-6420/33/2/025004. [43] B. Jin, D. Lorenz and S. Schiffler, Elastic-net regularization: Error estimates and active set methods, Inverse Problems, 25 (2009), 115022. doi: 10.1088/0266-5611/25/11/115022. [44] B. Kummer, Newton's method for non-differentiable functions, Advances in mathematical optimization, 45 (1988), 114-125. [45] X. Li, D. Sun and K. Toh, A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems, SIAM Journal on Optimization, 28 (2018), 433-458.  doi: 10.1137/16M1097572. [46] R. Mazumder, J. Friedman and T. Hastie, SparseNet: Coordinate descent with nonconvex penalties, Journal of the American Statistical Association, 106 (2011), 1125-1138.  doi: 10.1198/jasa.2011.tm09738. [47] M. Osborne, B. Presnell and B. Turlach, A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20 (2000), 389-403.  doi: 10.1093/imanum/20.3.389. [48] L. Qi and X. Chen, A globally convergent successive approximation method for severely nonsmooth equations, SIAM Journal on Control and Optimization, 33 (1995), 402-418.  doi: 10.1137/S036301299223619X. [49] L. Qi and D. Sun, Smoothing functions and smoothing Newton method for complementarity and variational inequality problems, Journal of Optimization Theory and Applications, 113 (2002), 121-147.  doi: 10.1023/A:1014861331301. [50] R. Ramlau and G. Teschke, A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints, Numerische Mathematik, 104 (2006), 177-203.  doi: 10.1007/s00211-006-0016-3. [51] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58 (1996), 267-288.  doi: 10.1111/j.2517-6161.1996.tb02080.x. [52] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications, 109 (2001), 475-494.  doi: 10.1023/A:1017501703105. [53] L. Wang, Q. Zhao, J. Gao, Z. Xu, M. Fehler and X. Jiang, Seismic sparse-spike deconvolution via Toeplitz-sparse matrix factorization, Geophysics, 81 (2016), 169-182.  doi: 10.1190/geo2015-0151.1. [54] W. Wang, S. Lu, H. Mao and J. Cheng, Multi-parameter Tikhonov regularization with the $\ell^0$ sparsity constraint, Inverse Problems, 29 (2013), 065018. doi: 10.1088/0266-5611/29/6/065018. [55] C. Yi and J. Huang, Semismooth newton coordinate descent algorithm for elastic-net penalized huber loss regression and quantile regression, Journal of Computational and Graphical Statistics, 26 (2017), 547-557.  doi: 10.1080/10618600.2016.1256816. [56] C. Zhang, Nearly unbiased variable selection under minimax concave penalty, The Annals of Statistics, 38 (2010), 894-942.  doi: 10.1214/09-AOS729. [57] C. H. Zhang and T. Zhang, A general theory of concave regularization for high-dimensional sparse estimation problems, Statistical Science, 27 (2012), 576-593.  doi: 10.1214/12-STS399. [58] T. Zhang, Adaptive forward-backward greedy algorithm for learning sparse representations, IEEE Transactions on Information Theory, 57 (2011), 4689-4708.  doi: 10.1109/TIT.2011.2146690. [59] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2005), 301-320.  doi: 10.1111/j.1467-9868.2005.00503.x.

Figures(13)

Tables(1)