
-
Previous Article
A stochastic newsvendor game with dynamic retail prices
- JIMO Home
- This Issue
-
Next Article
An adaptive trust region algorithm for large-residual nonsmooth least squares problems
Least absolute deviations learning of multiple tasks
1. | School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China |
2. | School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China |
3. | Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China |
4. | School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, China |
In this paper, we propose a new multitask feature selection model based on least absolute deviations. However, due to the inherent nonsmoothness of $l_1 $ norm, optimizing this model is challenging. To tackle this problem efficiently, we introduce an alternating iterative optimization algorithm. Moreover, under some mild conditions, its global convergence result could be established. Experimental results and comparison with the state-of-the-art algorithm SLEP show the efficiency and effectiveness of the proposed approach in solving multitask learning problems.
References:
[1] |
A. Argyriou, T. Evgeniou and M. Pontil,
Convex multi-task feature learning, Mach. Learn., 73 (2008), 243-272.
|
[2] |
Y. Bai and M. Tang,
Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett., 21 (2014), 909-913.
|
[3] |
B. Bakker and T. Heskes,
Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res., 4 (2003), 83-99.
|
[4] |
S. Ben-David and R. Schuller,
Exploiting task relatedness for multiple task learning, in Proc. Int. Conf. Learn. Theory, 2777 (2003), 567-580.
doi: 10.1007/978-3-540-45167-9_41. |
[5] |
K. P. Bennett, J. Hu, X. Ji, G. Kunapuli and J.-S. Pang,
Model selection via bilevel optimization in Proc, IEEE Int. Joint Conf. Neural Netw., (2006), 1922-1929.
|
[6] |
X. Chen, W. Pan, J. T. Kwok and J. G. Carbonell,
Accelerated gradient method for multi-task sparse learning problem in Proc, IEEE Int. Conf. Data Min., (2009), 746-751.
doi: 10.1109/ICDM.2009.128. |
[7] |
T. Evgeniou, C. A. Micchelli and M. Pontil,
Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615-637.
|
[8] |
D. Gabay and B. Mercier,
A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comp. Math. Appl., 2 (1976), 17-40.
doi: 10.1016/0898-1221(76)90003-1. |
[9] |
P. Gong, J. Ye and C. Zhang,
Multi-stage multi-task feature learning in Adv, Neural Inf. Process. Syst., (2012), 1988-1996.
|
[10] |
P. Gong, J. Ye and C. Zhang,
Robust multi-task feature learning, J. Mach. Learn. Res., 14 (2013), 2979-3010.
|
[11] |
B. He, L. Liao, D. Han and H. Yang,
A new inexact alternating directions method for monotone variational inequalities, Math. Program., 92 (2002), 103-118.
doi: 10.1007/s101070100280. |
[12] |
Y. Hu, Z. Wei and G. Yuan,
Inexact accelerated proximal gradient algorithms for matrix $l_{2, 1}$-norm minimization problem in multi-task feature learning, Stat., Optim. Inf. Comput., 2 (2014), 352-367.
doi: 10.19139/106. |
[13] |
M. Kowalski, M. Szafranski and L. Ralaivola,
Multiple indefinite kernel learning with mixed norm regularization, in Proc. Int. Conf. Mach. Learn., (2009), 545-552.
doi: 10.1145/1553374.1553445. |
[14] |
J. Liu, S. Ji and J. Ye,
Multi-task feature learning via efficient $l_{2,1} $-norm minimization in Proc, Uncertainity Artif. Intell., (2009), 339-348.
|
[15] |
J. Liu, S. Ji and J. Ye, SLEP: Sparse learning with efficient projections, Available from: http://yelab.net/software/SLEP. |
[16] |
Y. Lu, Y.-E. Ge and L-W. Zhang,
An alternating direction method for solving a class of inverse semidefinite quadratic programming problems, J. Ind. Manag. Optim., 12 (2016), 317-336.
doi: 10.3934/jimo.2016.12.317. |
[17] |
F. Nie, H. Huang, X. Cai and C. Ding,
Efficient and robust feature selection via joint $l_{2,1} $-norms minimization in Adv, Neural Inf. Process. Syst., (2010), 1813-1821.
|
[18] |
G. Obozinski, B. Taskar and M. I. Jordan,
Joint covariate selection and joint subspace selection for multiple classification problems, Statist. Comput., 20 (2010), 231-252.
doi: 10.1007/s11222-008-9111-x. |
[19] |
A. Quattoni, X. Carreras, M. Collins and T. Darrell,
An efficient projection for $l_{1,∞} $ regularization in Proc, Int. Conf. Mach. Learn., (2009), 857-864.
|
[20] |
Z. Shen, Z. Geng and J. Yang,
Image reconstruction from incomplete convolution data via total variation regularization, Stat., Optim. Inf. Comput., 3 (2015), 1-14.
doi: 10.19139/121. |
[21] |
S. Xiang, F. Nie, G. Meng, C. Pan and C. Zhang,
Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), 1738-1754.
|
[22] |
Y. Xiao, S.-Y. Wu and B. He,
A proximal alternating direction method for $l_{2,1} $-norm least squares problem in multi-task feature learning, J. Ind. Manag. Optim., 8 (2012), 1057-1069.
doi: 10.3934/jimo.2012.8.1057. |
[23] |
T. Xiong, J. Bi, B. Rao and V. Cherkassky,
Probabilistic joint feature selection for multi-task learning in Proc, SIAM Int. Conf. Data Min., (2007), 332-342.
doi: 10.1137/1.9781611972771.30. |
[24] |
W. Xue and W. Zhang,
Learning a coupled linearized method in online setting, IEEE Trans. Neural Netw. Learn. Syst., 28 (2017), 438-450.
doi: 10.1109/TNNLS.2016.2514413. |
[25] |
W. Xue and W. Zhang,
Online weighted multi-task feature selection in Proc, Int. Conf. Neural Inf. Process., (2016), 195-203.
doi: 10.1007/978-3-319-46672-9_23. |
[26] |
H. Yang, M. R. Lyu and I. King,
Efficient online learning for multitask feature selection, ACM Trans. Knowl. Discov. Data, 7 (2013), 1-27.
|
[27] |
L. Yang, T. K. Pong and X. Chen,
Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM J. Imaging Sci., 10 (2017), 74-110, arXiv:1506.
doi: 10.1137/15M1027528. |
[28] |
G. Yu, W. Xue and Y. Zhou,
A nonmonotone adaptive projected gradient method for primal-dual total variation image restoration, Signal Process., 103 (2014), 242-249.
doi: 10.1016/j.sigpro.2014.02.025. |
show all references
References:
[1] |
A. Argyriou, T. Evgeniou and M. Pontil,
Convex multi-task feature learning, Mach. Learn., 73 (2008), 243-272.
|
[2] |
Y. Bai and M. Tang,
Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett., 21 (2014), 909-913.
|
[3] |
B. Bakker and T. Heskes,
Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res., 4 (2003), 83-99.
|
[4] |
S. Ben-David and R. Schuller,
Exploiting task relatedness for multiple task learning, in Proc. Int. Conf. Learn. Theory, 2777 (2003), 567-580.
doi: 10.1007/978-3-540-45167-9_41. |
[5] |
K. P. Bennett, J. Hu, X. Ji, G. Kunapuli and J.-S. Pang,
Model selection via bilevel optimization in Proc, IEEE Int. Joint Conf. Neural Netw., (2006), 1922-1929.
|
[6] |
X. Chen, W. Pan, J. T. Kwok and J. G. Carbonell,
Accelerated gradient method for multi-task sparse learning problem in Proc, IEEE Int. Conf. Data Min., (2009), 746-751.
doi: 10.1109/ICDM.2009.128. |
[7] |
T. Evgeniou, C. A. Micchelli and M. Pontil,
Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615-637.
|
[8] |
D. Gabay and B. Mercier,
A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comp. Math. Appl., 2 (1976), 17-40.
doi: 10.1016/0898-1221(76)90003-1. |
[9] |
P. Gong, J. Ye and C. Zhang,
Multi-stage multi-task feature learning in Adv, Neural Inf. Process. Syst., (2012), 1988-1996.
|
[10] |
P. Gong, J. Ye and C. Zhang,
Robust multi-task feature learning, J. Mach. Learn. Res., 14 (2013), 2979-3010.
|
[11] |
B. He, L. Liao, D. Han and H. Yang,
A new inexact alternating directions method for monotone variational inequalities, Math. Program., 92 (2002), 103-118.
doi: 10.1007/s101070100280. |
[12] |
Y. Hu, Z. Wei and G. Yuan,
Inexact accelerated proximal gradient algorithms for matrix $l_{2, 1}$-norm minimization problem in multi-task feature learning, Stat., Optim. Inf. Comput., 2 (2014), 352-367.
doi: 10.19139/106. |
[13] |
M. Kowalski, M. Szafranski and L. Ralaivola,
Multiple indefinite kernel learning with mixed norm regularization, in Proc. Int. Conf. Mach. Learn., (2009), 545-552.
doi: 10.1145/1553374.1553445. |
[14] |
J. Liu, S. Ji and J. Ye,
Multi-task feature learning via efficient $l_{2,1} $-norm minimization in Proc, Uncertainity Artif. Intell., (2009), 339-348.
|
[15] |
J. Liu, S. Ji and J. Ye, SLEP: Sparse learning with efficient projections, Available from: http://yelab.net/software/SLEP. |
[16] |
Y. Lu, Y.-E. Ge and L-W. Zhang,
An alternating direction method for solving a class of inverse semidefinite quadratic programming problems, J. Ind. Manag. Optim., 12 (2016), 317-336.
doi: 10.3934/jimo.2016.12.317. |
[17] |
F. Nie, H. Huang, X. Cai and C. Ding,
Efficient and robust feature selection via joint $l_{2,1} $-norms minimization in Adv, Neural Inf. Process. Syst., (2010), 1813-1821.
|
[18] |
G. Obozinski, B. Taskar and M. I. Jordan,
Joint covariate selection and joint subspace selection for multiple classification problems, Statist. Comput., 20 (2010), 231-252.
doi: 10.1007/s11222-008-9111-x. |
[19] |
A. Quattoni, X. Carreras, M. Collins and T. Darrell,
An efficient projection for $l_{1,∞} $ regularization in Proc, Int. Conf. Mach. Learn., (2009), 857-864.
|
[20] |
Z. Shen, Z. Geng and J. Yang,
Image reconstruction from incomplete convolution data via total variation regularization, Stat., Optim. Inf. Comput., 3 (2015), 1-14.
doi: 10.19139/121. |
[21] |
S. Xiang, F. Nie, G. Meng, C. Pan and C. Zhang,
Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), 1738-1754.
|
[22] |
Y. Xiao, S.-Y. Wu and B. He,
A proximal alternating direction method for $l_{2,1} $-norm least squares problem in multi-task feature learning, J. Ind. Manag. Optim., 8 (2012), 1057-1069.
doi: 10.3934/jimo.2012.8.1057. |
[23] |
T. Xiong, J. Bi, B. Rao and V. Cherkassky,
Probabilistic joint feature selection for multi-task learning in Proc, SIAM Int. Conf. Data Min., (2007), 332-342.
doi: 10.1137/1.9781611972771.30. |
[24] |
W. Xue and W. Zhang,
Learning a coupled linearized method in online setting, IEEE Trans. Neural Netw. Learn. Syst., 28 (2017), 438-450.
doi: 10.1109/TNNLS.2016.2514413. |
[25] |
W. Xue and W. Zhang,
Online weighted multi-task feature selection in Proc, Int. Conf. Neural Inf. Process., (2016), 195-203.
doi: 10.1007/978-3-319-46672-9_23. |
[26] |
H. Yang, M. R. Lyu and I. King,
Efficient online learning for multitask feature selection, ACM Trans. Knowl. Discov. Data, 7 (2013), 1-27.
|
[27] |
L. Yang, T. K. Pong and X. Chen,
Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM J. Imaging Sci., 10 (2017), 74-110, arXiv:1506.
doi: 10.1137/15M1027528. |
[28] |
G. Yu, W. Xue and Y. Zhou,
A nonmonotone adaptive projected gradient method for primal-dual total variation image restoration, Signal Process., 103 (2014), 242-249.
doi: 10.1016/j.sigpro.2014.02.025. |


Input: | |
1 | while "not converged", do |
2 | Compute |
3 | Compute |
4 | Update the multipliers |
5 | end while |
Algorithm 1:LADL21-An efficient iterative algorithm to solve the optimization problem in (5). |
Input: | |
1 | while "not converged", do |
2 | Compute |
3 | Compute |
4 | Update the multipliers |
5 | end while |
Algorithm 1:LADL21-An efficient iterative algorithm to solve the optimization problem in (5). |
(m, n, k) | SLEP | LADL21 |
(10000, 20,100) | 0.0139 | 0.0025 |
(20000, 20,200) | 0.0136 | 0.0017 |
(30000, 20,300) | 0.0140 | 0.0015 |
(40000, 20,400) | 0.0138 | 0.0016 |
(10000, 30,100) | 0.0141 | 0.0016 |
(20000, 30,200) | 0.0199 | 0.0016 |
(30000, 30,300) | 0.0143 | 0.0017 |
(40000, 30,400) | 0.0202 | 0.0018 |
(10000, 40,100) | 0.0187 | 0.0018 |
(20000, 40,200) | 0.0146 | 0.0018 |
(30000, 40,300) | 0.0154 | 0.0020 |
(40000, 40,400) | 0.0188 | 0.0021 |
(10000, 50,100) | 0.0170 | 0.0020 |
(20000, 50,200) | 0.0179 | 0.0022 |
(30000, 50,300) | 0.0177 | 0.0022 |
(40000, 50,400) | 0.0176 | 0.0024 |
(m, n, k) | SLEP | LADL21 |
(10000, 20,100) | 0.0139 | 0.0025 |
(20000, 20,200) | 0.0136 | 0.0017 |
(30000, 20,300) | 0.0140 | 0.0015 |
(40000, 20,400) | 0.0138 | 0.0016 |
(10000, 30,100) | 0.0141 | 0.0016 |
(20000, 30,200) | 0.0199 | 0.0016 |
(30000, 30,300) | 0.0143 | 0.0017 |
(40000, 30,400) | 0.0202 | 0.0018 |
(10000, 40,100) | 0.0187 | 0.0018 |
(20000, 40,200) | 0.0146 | 0.0018 |
(30000, 40,300) | 0.0154 | 0.0020 |
(40000, 40,400) | 0.0188 | 0.0021 |
(10000, 50,100) | 0.0170 | 0.0020 |
(20000, 50,200) | 0.0179 | 0.0022 |
(30000, 50,300) | 0.0177 | 0.0022 |
(40000, 50,400) | 0.0176 | 0.0024 |
[1] |
Yunhai Xiao, Soon-Yi Wu, Bing-Sheng He. A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning. Journal of Industrial and Management Optimization, 2012, 8 (4) : 1057-1069. doi: 10.3934/jimo.2012.8.1057 |
[2] |
Ahmad Mousavi, Zheming Gao, Lanshan Han, Alvin Lim. Quadratic surface support vector machine with L1 norm regularization. Journal of Industrial and Management Optimization, 2022, 18 (3) : 1835-1861. doi: 10.3934/jimo.2021046 |
[3] |
Russell E. Warren, Stanley J. Osher. Hyperspectral unmixing by the alternating direction method of multipliers. Inverse Problems and Imaging, 2015, 9 (3) : 917-933. doi: 10.3934/ipi.2015.9.917 |
[4] |
Duo Wang, Zheng-Fen Jin, Youlin Shang. A penalty decomposition method for nuclear norm minimization with l1 norm fidelity term. Evolution Equations and Control Theory, 2019, 8 (4) : 695-708. doi: 10.3934/eect.2019034 |
[5] |
Sohana Jahan. Supervised distance preserving projection using alternating direction method of multipliers. Journal of Industrial and Management Optimization, 2020, 16 (4) : 1783-1799. doi: 10.3934/jimo.2019029 |
[6] |
Jie-Wen He, Chi-Chon Lei, Chen-Yang Shi, Seak-Weng Vong. An inexact alternating direction method of multipliers for a kind of nonlinear complementarity problems. Numerical Algebra, Control and Optimization, 2021, 11 (3) : 353-362. doi: 10.3934/naco.2020030 |
[7] |
Foxiang Liu, Lingling Xu, Yuehong Sun, Deren Han. A proximal alternating direction method for multi-block coupled convex optimization. Journal of Industrial and Management Optimization, 2019, 15 (2) : 723-737. doi: 10.3934/jimo.2018067 |
[8] |
Feishe Chen, Lixin Shen, Yuesheng Xu, Xueying Zeng. The Moreau envelope approach for the L1/TV image denoising model. Inverse Problems and Imaging, 2014, 8 (1) : 53-77. doi: 10.3934/ipi.2014.8.53 |
[9] |
Linghua Chen, Espen R. Jakobsen. L1 semigroup generation for Fokker-Planck operators associated to general Lévy driven SDEs. Discrete and Continuous Dynamical Systems, 2018, 38 (11) : 5735-5763. doi: 10.3934/dcds.2018250 |
[10] |
Bingsheng He, Xiaoming Yuan. Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numerical Algebra, Control and Optimization, 2013, 3 (2) : 247-260. doi: 10.3934/naco.2013.3.247 |
[11] |
Yuan Shen, Lei Ji. Partial convolution for total variation deblurring and denoising by new linearized alternating direction method of multipliers with extension step. Journal of Industrial and Management Optimization, 2019, 15 (1) : 159-175. doi: 10.3934/jimo.2018037 |
[12] |
Feng Ma, Jiansheng Shu, Yaxiong Li, Jian Wu. The dual step size of the alternating direction method can be larger than 1.618 when one function is strongly convex. Journal of Industrial and Management Optimization, 2021, 17 (3) : 1173-1185. doi: 10.3934/jimo.2020016 |
[13] |
Yue Lu, Ying-En Ge, Li-Wei Zhang. An alternating direction method for solving a class of inverse semi-definite quadratic programming problems. Journal of Industrial and Management Optimization, 2016, 12 (1) : 317-336. doi: 10.3934/jimo.2016.12.317 |
[14] |
Yan Gu, Nobuo Yamashita. Alternating direction method of multipliers with variable metric indefinite proximal terms for convex optimization. Numerical Algebra, Control and Optimization, 2020, 10 (4) : 487-510. doi: 10.3934/naco.2020047 |
[15] |
Zhongming Wu, Xingju Cai, Deren Han. Linearized block-wise alternating direction method of multipliers for multiple-block convex programming. Journal of Industrial and Management Optimization, 2018, 14 (3) : 833-855. doi: 10.3934/jimo.2017078 |
[16] |
Yingying Li, Stanley Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging, 2009, 3 (3) : 487-503. doi: 10.3934/ipi.2009.3.487 |
[17] |
Junying Hu, Xiaofei Qian, Jun Pei, Changchun Tan, Panos M. Pardalos, Xinbao Liu. A novel quality prediction method based on feature selection considering high dimensional product quality data. Journal of Industrial and Management Optimization, 2022, 18 (4) : 2977-3000. doi: 10.3934/jimo.2021099 |
[18] |
Jianguo Dai, Wenxue Huang, Yuanyi Pan. A category-based probabilistic approach to feature selection. Big Data & Information Analytics, 2018 doi: 10.3934/bdia.2017020 |
[19] |
Gernot Holler, Karl Kunisch. Learning nonlocal regularization operators. Mathematical Control and Related Fields, 2022, 12 (1) : 81-114. doi: 10.3934/mcrf.2021003 |
[20] |
David E. Bernholdt, Mark R. Cianciosa, Clement Etienam, David L. Green, Kody J. H. Law, Jin M. Park.
Corrigendum to "Cluster, classify, regress: A general method for learning discontinuous functions [ |
2021 Impact Factor: 1.411
Tools
Metrics
Other articles
by authors
[Back to Top]