
Risk estimators for choosing regularization parameters in ill-posed problems - properties and limitations

Abstract. This paper discusses the properties of certain risk estimators that have recently regained popularity for choosing regularization parameters in ill-posed problems, in particular for sparsity regularization. These estimators apply Stein's unbiased risk estimator (SURE) to estimate the risk either in the space of the unknown variables or in the data space; we call the latter PSURE in order to distinguish the two risk functions. It seems intuitive that SURE is more appropriate for ill-posed problems, since properties in the data space say little about the quality of the reconstruction. We provide theoretical studies of both approaches for linear Tikhonov regularization in a finite-dimensional setting and estimate the quality of the risk estimators, which also leads to asymptotic convergence results as the dimension of the problem tends to infinity. Unlike previous works, which studied single realizations of image processing problems with a very low degree of ill-posedness, we are interested in the statistical behaviour of the risk estimators for increasing ill-posedness. Interestingly, our theoretical results indicate that the quality of the SURE risk can deteriorate asymptotically for ill-posed problems, which is confirmed by an extensive numerical study. The latter shows that in many cases the SURE estimator leads to extremely small regularization parameters, which obviously cannot stabilize the reconstruction. Similar but less severe robustness issues also appear for the PSURE estimator; together with a comparison to the rather conservative discrepancy principle, this leads to the conclusion that regularization parameter choice based on unbiased risk estimation is not a reliable procedure for ill-posed problems. A similar numerical study for sparsity regularization demonstrates that the same issue appears in non-linear variational regularization approaches.

    Mathematics Subject Classification: Primary: 65F22, 62F12; Secondary: 35R30, 49N45.
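
To make the objects in the figures below concrete, here is a minimal Python sketch of the three parameter choice rules for linear Tikhonov regularization, computed via the SVD of $A$. The formulas are the standard forms of the discrepancy principle, PSURE and SURE for invertible $A$; the paper's exact normalizations (e.g. factors of $1/m$) may differ, and the toy data at the end are assumptions, not the paper's test problem.

```python
import numpy as np

def tikhonov_risks(A, y, sigma, alphas):
    """Evaluate the discrepancy principle (DP), PSURE and SURE for the
    Tikhonov estimate x_alpha = (A^T A + alpha I)^{-1} A^T y via the SVD
    of A (assumed square and invertible). Standard definitions; the
    paper's normalizations may differ:
      DP(alpha, y)    = ||A x_alpha - y||^2 - m sigma^2              (root in alpha)
      PSURE(alpha, y) = ||A x_alpha - y||^2 - m sigma^2
                        + 2 sigma^2 sum_i gamma_i^2 / (gamma_i^2 + alpha)
      SURE(alpha, y)  = ||x_alpha - A^{-1} y||^2 - sigma^2 sum_i gamma_i^{-2}
                        + 2 sigma^2 sum_i 1 / (gamma_i^2 + alpha)
    """
    U, gamma, Vt = np.linalg.svd(A)
    c = U.T @ y                          # coefficients of y in the left singular basis
    m = len(y)
    dp, psure, sure = [], [], []
    for alpha in alphas:
        f = gamma**2 / (gamma**2 + alpha)            # Tikhonov filter factors
        residual = np.sum(((1 - f) * c) ** 2)        # ||A x_alpha - y||^2
        dp.append(residual - m * sigma**2)
        psure.append(residual - m * sigma**2 + 2 * sigma**2 * np.sum(f))
        est_res = np.sum(((1 - f) * c / gamma) ** 2)  # ||x_alpha - A^{-1} y||^2
        sure.append(est_res - sigma**2 * np.sum(gamma**(-2.0))
                    + 2 * sigma**2 * np.sum(1.0 / (gamma**2 + alpha)))
    return np.array(dp), np.array(psure), np.array(sure)

# Toy usage with assumed data (not the paper's test problem):
rng = np.random.default_rng(0)
m, sigma = 64, 0.1
A = np.diag(0.9 ** np.arange(m))         # assumed mildly ill-conditioned operator
y = A @ rng.standard_normal(m) + sigma * rng.standard_normal(m)
alphas = np.logspace(-10, 2, 500)
dp, psure, sure = tikhonov_risks(A, y, sigma, alphas)
print("alpha_PSURE =", alphas[np.argmin(psure)],
      " alpha_SURE =", alphas[np.argmin(sure)],
      " alpha_DP =", alphas[np.argmin(np.abs(dp))])
```

Note how the SURE formula involves the inverse singular values $\gamma_i^{-1}$ through the term $\|x_\alpha - A^{-1}y\|^2$ and the constant $\sigma^2\sum_i \gamma_i^{-2}$; this dependence on the smallest singular values is consistent with the $\text{cond}(A)$ normalization appearing in Theorem 3.8 (see Figure 6) and hints at why the SURE risk becomes fragile for severely ill-conditioned $A$.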

    Figure 1.  (a) The convolution kernel $k_l(t)$ for different values of $l$. (b) True solution $x^*$, clean data $A_l x^*$ and noisy data $A_l x^* + \varepsilon$ for $m = n = 64$, $l = 0.06$, $\sigma = 0.1$

    Figure 2.  Decay of the singular values $\gamma_i$ of $A_l$ for different choices of $m = n$ and $l$. As expected, increasing the width $l$ of the convolution kernel leads to a faster decay. For a fixed $l$, increasing $m$ corresponds to using a finer discretization and $\gamma_i$ converges to the corresponding singular value of $A_{\infty, l}$, as can be seen for the largest $\gamma_i$, e.g., for $l = 0.02$
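
The forward operator in Figures 1 and 2 is a convolution with kernel $k_l(t)$. The exact kernel is not given in this summary, so the sketch below assumes a periodic Gaussian kernel of width $l$ on $[0, 1)$; it only reproduces the qualitative behaviour of Figure 2 and Table 1 (faster singular value decay and worse conditioning for larger $l$ and finer discretizations).

```python
import numpy as np

def convolution_matrix(m, l):
    """Assumed setup: periodic convolution on [0, 1) with a Gaussian kernel
    of width l, discretized on m points; the paper's exact kernel k_l(t)
    may differ. Returns the circulant matrix A_l."""
    t = np.arange(m) / m
    d = np.minimum(t, 1.0 - t)                 # periodic distance to 0
    kernel = np.exp(-0.5 * (d / l) ** 2)
    kernel /= kernel.sum()                     # normalize the kernel mass
    return np.stack([np.roll(kernel, i) for i in range(m)])

for m in (64, 128, 256):
    for l in (0.02, 0.06, 0.1):
        gamma = np.linalg.svd(convolution_matrix(m, l), compute_uv=False)
        print(f"m = {m:3d}, l = {l:4.2f}:  cond(A_l) = {gamma[0] / gamma[-1]:.2e}")
```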

    Figure 3.  Empirical probabilities of (a) $\hat \alpha$ and (b) the corresponding $\ell_2$-error for different parameter choice rules using $m = n = 64$, $l = 0.06$, $\sigma = 0.1$ and $N_\varepsilon = 10^6$ samples of $\varepsilon$
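
The empirical probabilities in Figures 3 and 4 are obtained by repeating the experiment over many noise realizations. Below is a compact Monte Carlo sketch of this procedure for the PSURE rule; it replaces $A_l$ by an assumed diagonal spectral model and reduces $N_\varepsilon$ from the paper's $10^6$ to keep the run time small.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, n_eps = 64, 0.1, 1000          # n_eps reduced from the paper's 10^6
gamma = 0.9 ** np.arange(m)              # assumed singular value decay
c_true = gamma * rng.standard_normal(m)  # spectral coefficients of A x* (assumed x*)
alphas = np.logspace(-8, 1, 200)
f = gamma[None, :]**2 / (gamma[None, :]**2 + alphas[:, None])  # filter factors

alpha_hat = np.empty(n_eps)
for k in range(n_eps):
    c = c_true + sigma * rng.standard_normal(m)   # spectral coefficients of y
    residual = np.sum(((1 - f) * c) ** 2, axis=1)  # ||A x_alpha - y||^2 per alpha
    psure = residual - m * sigma**2 + 2 * sigma**2 * f.sum(axis=1)
    alpha_hat[k] = alphas[np.argmin(psure)]

# Empirical probabilities of the chosen parameter, as in Figure 3(a)
hist, edges = np.histogram(np.log10(alpha_hat), bins=40)
print(hist)
```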

    Figure 4.  Joint empirical probabilities of $\log_{10}\|x^* - x_{\hat \alpha}\|_2$ for pairs of parameter choice rules, using $m = n = 64$, $l = 0.06$, $\sigma = 0.1$ and $N_\varepsilon = 10^6$ samples of $\varepsilon$ (the histograms in Figure 3(b) are the marginal distributions thereof). As in Figure 3(b), the logarithms of the probabilities are displayed (here in the form of a color-coding) to facilitate the identification of smaller modes and tails. The red line at $x = y$ divides the areas where one method performs better than the other: in (a), all samples falling to the right of the red line correspond to noise realizations where the discrepancy principle leads to a smaller error than PSURE. This happens for 13% of the samples in the case of PSURE and for 44% in the case of SURE

    Figure 5.  True risk functions (black dotted line), their estimates for six different realizations $y^k$, $k = 1, \ldots, 6$ (solid lines), and their corresponding minima/roots (dots on the lines) in the setting described in Figure 1 using $\ell_2$-regularization: (a) $\text{DP}(\alpha, Ax^*)$ and $\text{DP}(\alpha, y^k)$. (b) $\text{MSPE}(\alpha)$ and $\text{PSURE}(\alpha, y^k)$. (c) $\text{MSEE}(\alpha)$ and $\text{SURE}(\alpha, y^k)$

    Figure 6.  Illustration of Theorems 3.3 and 3.8 for $\ell_2$-regularization: the left-hand side of (22)/(23) was estimated by the sample mean and plotted vs. $m$. For (23), the normalization by $\text{cond}(A)$ was omitted in (b) and included in (c). The black dotted lines were added to compare the order of convergence

    Figure 7.  Empirical probabilities of $\alpha$ for $\ell_2$-regularization and different parameter choice rules for $l = 0.06$ and varying $m$

    Figure 8.  Empirical probabilities of $\log_{10}\left(\tfrac{1}{m} \| x^* - x_{\alpha} \|_2^2\right)$ for $\ell_2$-regularization and different parameter choice rules for $l = 0.06$ and varying $m$

    Figure 9.  Empirical probabilities of $\alpha$ for $\ell_2$-regularization and different parameter choice rules for $m = 64$ and varying $l$

    Figure 10.  Empirical probabilities of $\log_{10}\left(\tfrac{1}{m} \| x^* - x_{\alpha} \|_2^2\right)$ for $\ell_2$-regularization and different parameter choice rules for $m = 64$ and varying $l$

    Figure 11.  Illustration of the difference between evaluating the SURE risk on a coarse, linear grid for $\alpha$ as opposed to a fine, logarithmic one: In (a), a linear grid is constructed around $\hat{\alpha}_{\text{DP}}$ as $\alpha = \Delta_\alpha, 2 \Delta_\alpha, \ldots, 50\Delta_\alpha$ with $\Delta_\alpha = 2 \hat{\alpha}_{\text{DP}}/50$. While the plot suggests a clear minimum, (b) reveals that it is only a sub-optimal local minimum and that the linear grid did not cover the essential parts of $\text{SURE}(\alpha, y)$. (c) and (d) show the same plots for a different noise realization. Here, a linear grid will not even find a clear minimum. Both risk estimators are the same as those plotted in Figure 5(c) with the same colors
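
The grid construction from Figure 11 is easy to reproduce. In the sketch below, alpha_dp and the risk function are hypothetical stand-ins; the point is only that a coarse linear grid around $\hat{\alpha}_{\text{DP}}$ can miss a global minimizer that a fine logarithmic grid finds.

```python
import numpy as np

# alpha_dp is a hypothetical stand-in for the discrepancy principle estimate.
alpha_dp = 1e-3

# Coarse linear grid around alpha_DP, as in Figure 11(a):
# alpha = Delta, 2*Delta, ..., 50*Delta with Delta = 2*alpha_DP/50.
delta = 2 * alpha_dp / 50
lin_grid = delta * np.arange(1, 51)

# Fine logarithmic grid covering many orders of magnitude, as in Figure 11(b).
log_grid = np.logspace(-10, 2, 1000)

# Toy risk whose global minimum (at alpha = 1e-5) lies far below the range
# of the linear grid; in the experiments, SURE(alpha, y) takes its place.
risk = lambda a: (np.log10(a) + 5.0) ** 2

print("linear grid argmin:", lin_grid[np.argmin(risk(lin_grid))])
print("log grid argmin:   ", log_grid[np.argmin(risk(log_grid))])
```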

    Figure 12.  True risk functions (black dotted line), their estimates for six different realizations $y^k$, $k = 1, \ldots, 6$ (solid lines), and their corresponding minima/roots (dots on the lines) in the setting described in Figure 1 using $\ell_1$-regularization: (a) $\text{DP}(\alpha, Ax^*)$ and $\text{DP}(\alpha, y^k)$. (b) $\text{MSPE}(\alpha)$ (empirical mean over $N_\varepsilon = 10^4$ samples) and $\text{PSURE}(\alpha, y^k)$. (c) $\text{MSEE}(\alpha)$ (empirical mean over $N_\varepsilon = 10^4$ samples) and $\text{SURE}(\alpha, y^k)$

    Figure 13.  Empirical probabilities of (a) $\alpha$ and (b) the corresponding $\ell_1$-error for different parameter choice rules using $\ell_1$-regularization, $m = n = 64$, $l = 0.06$, $\sigma = 0.1$ and $N_\varepsilon = 10^4$ samples of $\varepsilon$

    Figure 14.  Illustration that Theorems 3.3 and 3.8 might also hold for $\ell_1$-regularization: the left-hand side of (22)/(23) is estimated by the sample mean and plotted vs. $m$. The black dotted lines were added to compare the order of convergence

    Figure 15.  Illustration of the difficulties of evaluating the SURE risk in the case of $\ell_1$-regularization: In (a), a coarse linear grid is constructed around $\hat{\alpha}_{\text{DP}}$ as $\alpha = \Delta_\alpha, 2 \Delta_\alpha, \ldots, 20\Delta_\alpha$ with $\Delta_\alpha = \hat{\alpha}_{\text{DP}}/10$. Similar to Figure 11(a), the plot suggests a clear minimum. However, using a fine, logarithmic grid, (b) reveals that it is only a sub-optimal local minimum before a very erratic part of $\text{SURE}(\alpha, y)$ starts. (c) shows how a coarse $\alpha$-grid can lead to an arbitrary projection of $\text{SURE}(\alpha, y)$ that is likely to miss important features. Both risk estimators are the same as those plotted in Figure 12(c) with the same colors. In (d), the difference between computing $\text{SURE}(\alpha, y)$ with the consistent and highly accurate version of ADMM (Impl A) and with a standard ADMM version using only 20 iterations (Impl B) is illustrated
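
The $\ell_1$-regularized reconstructions behind Figures 12 to 15 are computed with ADMM. The paper distinguishes a consistent, highly accurate variant (Impl A) from a standard one stopped after 20 iterations (Impl B); the details of Impl A are not reproduced here, so the following is only a generic ADMM sketch for $\min_x \tfrac12\|Ax - y\|_2^2 + \alpha \|x\|_1$, with assumed toy data at the end.

```python
import numpy as np

def admm_l1(A, y, alpha, rho=1.0, n_iter=500, tol=1e-10):
    """Generic scaled ADMM for min_x 0.5 ||A x - y||^2 + alpha ||x||_1.
    A sketch of the standard scheme only; the consistent, highly accurate
    variant (Impl A) used in the paper is not reproduced here."""
    n = A.shape[1]
    # Factor (A^T A + rho I) once; every x-update is then two triangular solves.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Aty = A.T @ y
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))
        z_old = z
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - alpha / rho, 0.0)  # soft threshold
        u = u + x - z
        if np.linalg.norm(z - z_old) <= tol * (1.0 + np.linalg.norm(z)):
            break
    return z   # z carries the sparsity; x and z agree at convergence

# Toy usage with assumed data (not the paper's deconvolution problem):
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64)) / 8.0
x_true = np.zeros(64)
x_true[::16] = 1.0
y = A @ x_true + 0.1 * rng.standard_normal(64)
x_hat = admm_l1(A, y, alpha=0.05)
print("nonzeros in x_hat:", np.count_nonzero(x_hat))
```

Stopping such an iteration too early (as in Impl B) yields an inaccurate $x_\alpha$, and plugging it into $\text{SURE}(\alpha, y)$ produces the distorted curves shown in Figure 15(d).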

    Table 1.  Condition number of $A_l$ for different values of $m = n$ and $l$

               $l=0.02$    $l=0.04$    $l=0.06$    $l=0.08$    $l=0.1$
    $m=16$     1.27e+0     1.75e+0     2.79e+0     6.77e+0     2.31e+2
    $m=32$     1.75e+0     6.77e+0     6.94e+1     6.88e+2     2.30e+2
    $m=64$     6.77e+0     6.88e+2     6.42e+2     1.51e+3     4.22e+3
    $m=128$    6.88e+2     1.51e+3     1.51e+4     4.29e+3     4.29e+4
    $m=256$    1.70e+3     4.70e+4     1.87e+6     4.07e+6     1.79e+6
    $m=512$    4.70e+4     1.11e+7     1.22e+7     2.12e+7     3.70e+7

    Table 2.  Statistics of the $\ell_2$-error $\|x^* - x_{\hat \alpha}\|_2$ for different parameter choice rules using $m = n = 64$, $l = 0.06$, $\sigma = 0.1$ and $N_\varepsilon = 10^6$ samples of $\varepsilon$

                min     max       mean     median   std
    optimal     4.78    9.63      8.04     8.05     0.43
    DP          6.57    10.81     8.82     8.87     0.34
    PSURE       6.10    277.24    8.38     8.23     1.53
    SURE        6.08    339.80    27.71    8.95     37.26