September 2019, 1(3): 249-269. doi: 10.3934/fods.2019011

General risk measures for robust machine learning

Émilie Chouzenoux (a), Henri Gérard (b),* and Jean-Christophe Pesquet (a)

a. CentraleSupélec, Inria Saclay, Université Paris-Saclay, Center for Visual Computing, Gif-sur-Yvette, 91190, France

b. Université Paris-Est, CERMICS (ENPC), Labex Bézout, 6-8 avenue Blaise Pascal, Champs-sur-Marne, 77420, France

* Corresponding author: Henri Gérard

Published August 2019

Fund Project: The work of the second author was supported by ENPC and Labex Bézout. The work of the third author was supported by the Institut Universitaire de France.

A wide array of machine learning problems can be formulated as the minimization of the expectation of a convex loss function over some parameter space. Since the probability distribution of the data of interest is usually unknown, it is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights into this problem by using the risk measure framework developed in quantitative finance. We show that the original min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on $ \varphi $-divergences and the Wasserstein metric. We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through simulation examples, we demonstrate that this algorithm scales well on real data sets.

Citation: Émilie Chouzenoux, Henri Gérard, Jean-Christophe Pesquet. General risk measures for robust machine learning. Foundations of Data Science, 2019, 1 (3) : 249-269. doi: 10.3934/fods.2019011
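As a concrete illustration of the robust formulation described in the abstract, the following minimal sketch (ours, not the authors' algorithm) trains a distributionally robust logistic regression with a Kullback-Leibler ambiguity ball of radius $ \epsilon $ around the empirical distribution. It relies on the classical dual representation $ \sup_{\mathrm{KL}(p\|q)\leq\epsilon}\mathbb{E}_p[\ell] = \inf_{\lambda>0}\{\lambda\epsilon + \lambda\log\mathbb{E}_q[e^{\ell/\lambda}]\} $ (Hu and Hong, 2013), which turns the inner maximization into a one-dimensional convex minimization; the synthetic data and solver choices below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.special import logsumexp

def logistic_losses(theta, X, y):
    """Per-sample logistic losses l_i(theta) = log(1 + exp(-y_i <x_i, theta>))."""
    return np.logaddexp(0.0, -y * (X @ theta))

def kl_robust_loss(losses, eps):
    """Worst-case expected loss over the KL ball, via the 1-D dual in lambda."""
    n = losses.size
    def dual(log_lam):
        lam = np.exp(log_lam)  # parametrize lambda > 0
        # lambda*eps + lambda*log((1/n) * sum_i exp(l_i/lambda)), computed stably
        return lam * eps + lam * (logsumexp(losses / lam) - np.log(n))
    return minimize_scalar(dual, bounds=(-10.0, 10.0), method="bounded").fun

def robust_objective(theta, X, y, eps):
    return kl_robust_loss(logistic_losses(theta, X, y), eps)

# Illustrative run on synthetic data (a stand-in for the real datasets).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = np.where(X @ w_true + 0.3 * rng.standard_normal(200) > 0, 1.0, -1.0)
res = minimize(robust_objective, np.zeros(5), args=(X, y, 0.01), method="Nelder-Mead")
print("robust loss:", res.fun)
```

As $ \epsilon \to 0 $ the worst-case loss tends to the empirical mean of the losses, recovering standard logistic regression (the "$ \epsilon = 0 $ (LR)" rows of Tables 3 and 4 below).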
Figure 1.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the iteration number for various values of $ \epsilon $
Figure 2.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the CPU time for various values of $ \epsilon $ over the first 100 iterations
Figure 3.  $\mathtt{ionosphere} $ dataset: AUC metric as a function of $ \epsilon $
Figure 4.  $\mathtt{ionosphere} $ dataset (altered): ROC curve for different values of $ \epsilon $
Figure 5.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 10% of data for the training set. Robust model is used with $ \epsilon = 0.001 $
Figure 6.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 60% of data for the training set. Robust model is used with $ \epsilon = 0.001 $
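The resampling protocol behind Figures 5 and 6 can be sketched as follows (our reading of the captions; `fit_robust_model` is a hypothetical placeholder for the robust training routine):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def auc_histogram(X, y, fit_robust_model, eps=0.001, train_frac=0.1, n_rep=1000):
    """Test AUC over repeated random splits, as in the Figure 5/6 histograms."""
    aucs = np.empty(n_rep)
    for r in range(n_rep):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_frac, random_state=r, stratify=y)
        theta = fit_robust_model(X_tr, y_tr, eps)    # hypothetical fitting routine
        aucs[r] = roc_auc_score(y_te, X_te @ theta)  # scores of a linear classifier
    return aucs
```

Setting `train_frac` to 0.1 or 0.6 reproduces the two training-set sizes quoted in the captions.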
Table 1.  Common perspective functions and their conjugates used to define $\varphi$-divergences

| Divergence | $\varphi$ | $\varphi(t),\ t \geq 0$ | $D_{\varphi}(p,q)$ | $\varphi^{*}(s)$ | $\tilde{\varphi}(t)$ |
| --- | --- | --- | --- | --- | --- |
| Kullback-Leibler | $\varphi_{kl}$ | $t\log t - t + 1$ | $\sum_{i=1}^{N} p_i \log\big(\frac{p_i}{q_i}\big)$ | $e^{s}-1$ | $\varphi_{b}(t)$ |
| Burg entropy | $\varphi_{b}$ | $-\log t + t - 1$ | $\sum_{i=1}^{N} q_i \log\big(\frac{q_i}{p_i}\big)$ | $-\log(1-s),\ s<1$ | $\varphi_{kl}(t)$ |
| J-divergence | $\varphi_{j}$ | $(t-1)\log t$ | $\sum_{i=1}^{N} (p_i - q_i)\log\big(\frac{p_i}{q_i}\big)$ | no closed form | $\varphi_{j}(t)$ |
| $\chi^{2}$-distance | $\varphi_{c}$ | $\frac{(t-1)^{2}}{t}$ | $\sum_{i=1}^{N} \frac{(p_i - q_i)^{2}}{p_i}$ | $2-2\sqrt{1-s},\ s<1$ | $\varphi_{mc}(t)$ |
| Modified $\chi^{2}$-distance | $\varphi_{mc}$ | $(t-1)^{2}$ | $\sum_{i=1}^{N} \frac{(p_i - q_i)^{2}}{q_i}$ | $\begin{cases} -1, & s<-2 \\ s+s^{2}/4, & s\geq -2 \end{cases}$ | $\varphi_{c}(t)$ |
| Hellinger distance | $\varphi_{h}$ | $(\sqrt{t}-1)^{2}$ | $\sum_{i=1}^{N} \big(\sqrt{p_i}-\sqrt{q_i}\big)^{2}$ | $\frac{s}{1-s},\ s<1$ | $\varphi_{h}(t)$ |
| $\chi$-divergence of order $\theta>1$ | $\varphi_{ca}^{\theta}$ | $\lvert t-1\rvert^{\theta}$ | $\sum_{i=1}^{N} q_i \big\lvert 1-\frac{p_i}{q_i}\big\rvert^{\theta}$ | $s+(\theta-1)\big(\frac{\lvert s\rvert}{\theta}\big)^{\frac{\theta}{\theta-1}}$ | $t^{1-\theta}\varphi_{ca}^{\theta}(t)$ |
| Variation distance | $\varphi_{v}$ | $\lvert t-1\rvert$ | $\sum_{i=1}^{N} \lvert p_i - q_i\rvert$ | $\begin{cases} -1, & s\leq -1 \\ s, & -1\leq s\leq 1 \end{cases}$ | $\varphi_{v}(t)$ |
| Cressie and Read | $\varphi_{cr}^{\theta}$ | $\frac{1-\theta+\theta t-t^{\theta}}{\theta(1-\theta)},\ \theta\notin\{0,1\}$ | $\frac{1}{\theta(1-\theta)}\big(1-\sum_{i=1}^{N} p_i^{\theta} q_i^{1-\theta}\big)$ | $\frac{1}{\theta}\big(1-s(1-\theta)\big)^{\frac{\theta}{\theta-1}}-\frac{1}{\theta},\ s<\frac{1}{1-\theta}$ | $\varphi_{cr}^{1-\theta}(t)$ |
| Average Value at Risk of level $\beta$ | $\varphi_{\mathrm{avar}}^{\beta}$ | $\iota_{[0,\frac{1}{1-\beta}]}(t),\ \beta\in[0,1]$ | $\sum_{i=1}^{N} \iota_{[0,\frac{1}{1-\beta}]}\big(\frac{p_i}{q_i}\big)$ | $\sigma_{[0,\frac{1}{1-\beta}]}(s)=\begin{cases} \frac{s}{1-\beta}, & s\geq 0 \\ 0, & s<0 \end{cases}$ | $\iota_{[1-\beta,+\infty[}(t)$ |
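As a quick numerical sanity check of Table 1 (ours, purely illustrative), the snippet below evaluates the Kullback-Leibler divergence through its $\varphi$ function and verifies the conjugate $\varphi_{kl}^{*}(s) = e^{s}-1$ by maximizing $st - \varphi_{kl}(t)$ on a grid:

```python
import numpy as np

def phi_kl(t):
    # phi_kl(t) = t*log(t) - t + 1, extended by continuity (value 1 at t = 0)
    return np.where(t > 0, t * np.log(np.maximum(t, 1e-300)) - t + 1.0, 1.0)

def D_kl(p, q):
    """D_phi(p, q) = sum_i q_i * phi_kl(p_i/q_i) = sum_i p_i * log(p_i/q_i)."""
    return np.sum(q * phi_kl(p / q))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
print("D_kl(p, q) =", D_kl(p, q))

# Conjugate check: sup_t { s*t - phi_kl(t) } should equal e^s - 1.
t = np.linspace(1e-6, 60.0, 500_000)
for s in (-1.0, 0.0, 1.0, 2.0):
    grid_sup = np.max(s * t - phi_kl(t))
    print(f"s = {s:+.1f}: grid sup = {grid_sup:.6f}, e^s - 1 = {np.exp(s) - 1.0:.6f}")
```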
Table 2.  Parameters of the datasets

| Name of dataset | $\mathtt{ionosphere} $ | $\mathtt{colon-cancer}$ |
| --- | --- | --- |
| Number of observations ($ N $) | 351 | 64 |
| Number of features ($ d $) | 34 | 2000 |
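Both benchmarks are standard binary classification sets; a typical way to load them (the file names below assume the LIBSVM distribution of these datasets and are illustrative) is:

```python
from sklearn.datasets import load_svmlight_file

# Sparse feature matrices and labels in {-1, +1}
X_ion, y_ion = load_svmlight_file("ionosphere_scale")  # 351 observations, 34 features
X_col, y_col = load_svmlight_file("colon-cancer")      # 64 observations, 2000 features
print(X_ion.shape, X_col.shape)
```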
Table 3.  $\mathtt{colon-cancer}$ dataset: Values of the AUC for different values of $ \epsilon $

| Value of $ \epsilon $ | AUC with KL | AUC with Wasserstein |
| --- | --- | --- |
| $ \epsilon = 0 $ (LR) | 0.832 | 0.832 |
| $ \epsilon = 0.001 $ | 0.757 | 0.787 |
| $ \epsilon = 0.002 $ | 0.750 | 0.770 |
| $ \epsilon = 0.003 $ | 0.779 | 0.706 |
| $ \epsilon = 0.004 $ | 0.698 | 0.691 |
| $ \epsilon = 0.005 $ | 0.868 | 0.831 |
| $ \epsilon = 0.006 $ | 0.890 | 0.860 |
| $ \epsilon = 0.007 $ | 0.728 | 0.838 |
| $ \epsilon = 0.008 $ | 0.809 | 0.768 |
| $ \epsilon = 0.009 $ | 0.875 | 0.890 |
| $ \epsilon = 0.01 $ | 0.801 | 0.853 |
| $ \epsilon = 0.05 $ | 0.786 | 0.794 |
| $ \epsilon = 0.1 $ | 0.801 | 0.816 |
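For the Wasserstein columns, a useful point of comparison (Shafieezadeh-Abadeh et al., 2015) is that, when the ambiguity set is a Wasserstein ball over the feature space with the labels held fixed, distributionally robust logistic regression reduces exactly to norm-regularized logistic regression, $ \min_\theta \frac{1}{N}\sum_{i=1}^{N} \log(1+e^{-y_i x_i^\top \theta}) + \epsilon\|\theta\|_* $. The sketch below is our illustration of this equivalence, with the Euclidean norm chosen for concreteness:

```python
import numpy as np
from scipy.optimize import minimize

def wasserstein_robust_logreg(X, y, eps):
    """Robust logistic regression as eps*||theta||_2 - regularized ERM
    (labels fixed; the dual norm of the l2 ground metric is again l2)."""
    n, d = X.shape
    def objective(theta):
        empirical = np.mean(np.logaddexp(0.0, -y * (X @ theta)))
        return empirical + eps * np.linalg.norm(theta)
    # L-BFGS-B with numerical gradients; the norm term is nonsmooth only at 0
    return minimize(objective, np.zeros(d), method="L-BFGS-B").x
```

Sweeping `eps` over the grid of Tables 3 and 4 then amounts to tracing a regularization path for the logistic loss.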
Table 4.  $\mathtt{ionosphere} $ dataset (altered): Values of the area under the ROC curve (AUC) for different values of $ \epsilon $

| Value of $ \epsilon $ | AUC with KL | AUC with Wasserstein |
| --- | --- | --- |
| $ \epsilon = 0 $ (LR) | 0.514 | 0.514 |
| $ \epsilon = 0.001 $ | 0.816 | 0.840 |
| $ \epsilon = 0.002 $ | 0.804 | 0.835 |
| $ \epsilon = 0.003 $ | 0.840 | 0.814 |
| $ \epsilon = 0.004 $ | 0.824 | 0.830 |
| $ \epsilon = 0.005 $ | 0.815 | 0.829 |
| $ \epsilon = 0.006 $ | 0.834 | 0.829 |
| $ \epsilon = 0.007 $ | 0.821 | 0.815 |
| $ \epsilon = 0.008 $ | 0.835 | 0.815 |
| $ \epsilon = 0.009 $ | 0.823 | 0.822 |
| $ \epsilon = 0.01 $ | 0.828 | 0.835 |
| $ \epsilon = 0.05 $ | 0.815 | 0.826 |
| $ \epsilon = 0.1 $ | 0.824 | 0.823 |