April 2018, 14(2): 719-729. doi: 10.3934/jimo.2017071

Least absolute deviations learning of multiple tasks

Wei Xue, Wensheng Zhang and Gaohang Yu

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
4. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, China

Received: December 2015. Early access: September 2017. Published: April 2018.

Fund Project: The initial version of this work was done while the first author was a Ph.D. candidate at Nanjing University of Science and Technology.

In this paper, we propose a new multitask feature selection model based on least absolute deviations. Due to the inherent nonsmoothness of the $l_1$ norm, however, optimizing this model is challenging. To tackle the problem efficiently, we introduce an alternating iterative optimization algorithm and, under mild conditions, establish its global convergence. Experimental results and a comparison with the state-of-the-art algorithm SLEP demonstrate the efficiency and effectiveness of the proposed approach on multitask learning problems.
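For concreteness, the model in question is presumably of the following form, reconstructed from the abstract and the $l_{2,1}$-norm feature-learning references [14,22] rather than quoted from the paper: given $k$ tasks with design matrix $X$ and targets $Y$, LADL21 solves

$$\min_{W}\;\|XW-Y\|_{1} + \mu\,\|W\|_{2,1}, \qquad \|W\|_{2,1}=\sum_{i=1}^{n}\Big(\sum_{j=1}^{k}W_{ij}^{2}\Big)^{1/2},$$

where the $j$-th column of $W$ is the weight vector of task $j$, $\|\cdot\|_{1}$ is the entrywise $l_1$ norm giving the least-absolute-deviations loss, and the $l_{2,1}$ regularizer couples the tasks by promoting row sparsity, i.e., joint feature selection.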

Citation: Wei Xue, Wensheng Zhang, Gaohang Yu. Least absolute deviations learning of multiple tasks. Journal of Industrial and Management Optimization, 2018, 14(2): 719-729. doi: 10.3934/jimo.2017071
References:

[1] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-task feature learning, Mach. Learn., 73 (2008), 243-272.
[2] Y. Bai and M. Tang, Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett., 21 (2014), 909-913.
[3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res., 4 (2003), 83-99.
[4] S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, in Proc. Int. Conf. Learn. Theory, 2777 (2003), 567-580. doi: 10.1007/978-3-540-45167-9_41.
[5] K. P. Bennett, J. Hu, X. Ji, G. Kunapuli and J.-S. Pang, Model selection via bilevel optimization, in Proc. IEEE Int. Joint Conf. Neural Netw., (2006), 1922-1929.
[6] X. Chen, W. Pan, J. T. Kwok and J. G. Carbonell, Accelerated gradient method for multi-task sparse learning problem, in Proc. IEEE Int. Conf. Data Min., (2009), 746-751. doi: 10.1109/ICDM.2009.128.
[7] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615-637.
[8] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comp. Math. Appl., 2 (1976), 17-40. doi: 10.1016/0898-1221(76)90003-1.
[9] P. Gong, J. Ye and C. Zhang, Multi-stage multi-task feature learning, in Adv. Neural Inf. Process. Syst., (2012), 1988-1996.
[10] P. Gong, J. Ye and C. Zhang, Robust multi-task feature learning, J. Mach. Learn. Res., 14 (2013), 2979-3010.
[11] B. He, L. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Math. Program., 92 (2002), 103-118. doi: 10.1007/s101070100280.
[12] Y. Hu, Z. Wei and G. Yuan, Inexact accelerated proximal gradient algorithms for matrix $l_{2,1}$-norm minimization problem in multi-task feature learning, Stat. Optim. Inf. Comput., 2 (2014), 352-367. doi: 10.19139/106.
[13] M. Kowalski, M. Szafranski and L. Ralaivola, Multiple indefinite kernel learning with mixed norm regularization, in Proc. Int. Conf. Mach. Learn., (2009), 545-552. doi: 10.1145/1553374.1553445.
[14] J. Liu, S. Ji and J. Ye, Multi-task feature learning via efficient $l_{2,1}$-norm minimization, in Proc. Uncertainty Artif. Intell., (2009), 339-348.
[15] J. Liu, S. Ji and J. Ye, SLEP: Sparse learning with efficient projections, Available from: http://yelab.net/software/SLEP.
[16] Y. Lu, Y.-E. Ge and L.-W. Zhang, An alternating direction method for solving a class of inverse semidefinite quadratic programming problems, J. Ind. Manag. Optim., 12 (2016), 317-336. doi: 10.3934/jimo.2016.12.317.
[17] F. Nie, H. Huang, X. Cai and C. Ding, Efficient and robust feature selection via joint $l_{2,1}$-norms minimization, in Adv. Neural Inf. Process. Syst., (2010), 1813-1821.
[18] G. Obozinski, B. Taskar and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statist. Comput., 20 (2010), 231-252. doi: 10.1007/s11222-008-9111-x.
[19] A. Quattoni, X. Carreras, M. Collins and T. Darrell, An efficient projection for $l_{1,\infty}$ regularization, in Proc. Int. Conf. Mach. Learn., (2009), 857-864.
[20] Z. Shen, Z. Geng and J. Yang, Image reconstruction from incomplete convolution data via total variation regularization, Stat. Optim. Inf. Comput., 3 (2015), 1-14. doi: 10.19139/121.
[21] S. Xiang, F. Nie, G. Meng, C. Pan and C. Zhang, Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), 1738-1754.
[22] Y. Xiao, S.-Y. Wu and B. He, A proximal alternating direction method for $l_{2,1}$-norm least squares problem in multi-task feature learning, J. Ind. Manag. Optim., 8 (2012), 1057-1069. doi: 10.3934/jimo.2012.8.1057.
[23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, Probabilistic joint feature selection for multi-task learning, in Proc. SIAM Int. Conf. Data Min., (2007), 332-342. doi: 10.1137/1.9781611972771.30.
[24] W. Xue and W. Zhang, Learning a coupled linearized method in online setting, IEEE Trans. Neural Netw. Learn. Syst., 28 (2017), 438-450. doi: 10.1109/TNNLS.2016.2514413.
[25] W. Xue and W. Zhang, Online weighted multi-task feature selection, in Proc. Int. Conf. Neural Inf. Process., (2016), 195-203. doi: 10.1007/978-3-319-46672-9_23.
[26] H. Yang, M. R. Lyu and I. King, Efficient online learning for multitask feature selection, ACM Trans. Knowl. Discov. Data, 7 (2013), 1-27.
[27] L. Yang, T. K. Pong and X. Chen, Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM J. Imaging Sci., 10 (2017), 74-110. doi: 10.1137/15M1027528.
[28] G. Yu, W. Xue and Y. Zhou, A nonmonotone adaptive projected gradient method for primal-dual total variation image restoration, Signal Process., 103 (2014), 242-249. doi: 10.1016/j.sigpro.2014.02.025.



Figure 1. Convergence results of SLEP and LADL21: the first row for $Cov = \mathrm{diag}\{1, 0.25, 0.1, 0.05, 0.01\}$, the second row for $Cov = \mathrm{diag}\{0.81, 0.64, 0.49, 0.36, 0.25, 0.16, 0.09, 0.04\}$.
Figure 2. Comparison results of SLEP and LADL21 on the School data.
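The diagonal matrices in the Figure 1 caption indicate that the synthetic ground truth is drawn from zero-mean Gaussians with the stated covariances. The exact generating protocol is not reproduced on this page, so the following NumPy snippet is only a plausible sketch; the sample count m, task count k, and noise level are our choices, not the paper's.

import numpy as np

rng = np.random.default_rng(0)

# First covariance setting from the Figure 1 caption.
cov = np.diag([1.0, 0.25, 0.1, 0.05, 0.01])
n = cov.shape[0]                 # number of features
m, k = 1000, 20                  # samples and tasks (assumed values)

W_true = rng.multivariate_normal(np.zeros(n), cov, size=k).T   # n x k task weights
X = rng.standard_normal((m, n))                                # shared design matrix
Y = X @ W_true + 0.01 * rng.standard_normal((m, k))            # noisy targets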
Input: $\mathbf{r}_0$, $\boldsymbol{\lambda}_0$, $\mu$, $\beta$, and $\tau$.
1: while not converged do
2:     Compute $W_{k+1}$ according to (15) for given $\mathbf{r}_k$ and $\boldsymbol{\lambda}_k$.
3:     Compute $\mathbf{r}_{k+1}$ according to (17) for given $W_{k+1}$ and $\boldsymbol{\lambda}_k$.
4:     Update the multipliers $\boldsymbol{\lambda}_{k+1}$ according to (18).
5: end while
Algorithm 1: LADL21, an efficient iterative algorithm for solving the optimization problem in (5).
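Equations (5), (15), (17), and (18) are not reproduced on this page. As a rough illustration only, the following NumPy sketch implements a standard linearized ADMM for the presumed model $\min_W \|XW-Y\|_1 + \mu\|W\|_{2,1}$ with the splitting $\mathbf{r} = XW - Y$. The parameter names mu, beta, tau mirror the inputs of Algorithm 1, but the concrete update formulas, the shared design matrix, and the helper functions (ladl21_sketch, row_shrink, soft) are our assumptions, not the paper's exact steps.

import numpy as np

def row_shrink(W, t):
    """Row-wise soft-thresholding: prox of t * ||.||_{2,1}."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * W

def soft(Z, t):
    """Entry-wise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def ladl21_sketch(X, Y, mu=1.0, beta=1.0, tau=None, iters=500):
    m, n = X.shape           # m samples, n features
    k = Y.shape[1]           # k tasks (shared design matrix assumed)
    if tau is None:          # step size for the linearized W-update
        tau = 1.0 / (beta * np.linalg.norm(X, 2) ** 2)
    W = np.zeros((n, k))
    r = np.zeros((m, k))     # auxiliary variable, plays the role of X W - Y
    lam = np.zeros((m, k))   # Lagrange multipliers
    for _ in range(iters):
        # Step 2 (cf. (15)): one linearized proximal-gradient step in W.
        grad = beta * X.T @ (X @ W - Y - r + lam / beta)
        W = row_shrink(W - tau * grad, tau * mu)
        # Step 3 (cf. (17)): closed-form r-update via entry-wise shrinkage.
        r = soft(X @ W - Y + lam / beta, 1.0 / beta)
        # Step 4 (cf. (18)): multiplier update.
        lam = lam + beta * (X @ W - Y - r)
    return W

In this splitting, the r-update and the multiplier update have the closed forms above, while the W-update is a single proximal-gradient step on the augmented Lagrangian; that linearization is what makes an explicit step size $\tau$ part of the input.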
Table 1. Comparison of the RelErr

(m, n, k)           SLEP      LADL21
(10000, 20, 100)    0.0139    0.0025
(20000, 20, 200)    0.0136    0.0017
(30000, 20, 300)    0.0140    0.0015
(40000, 20, 400)    0.0138    0.0016
(10000, 30, 100)    0.0141    0.0016
(20000, 30, 200)    0.0199    0.0016
(30000, 30, 300)    0.0143    0.0017
(40000, 30, 400)    0.0202    0.0018
(10000, 40, 100)    0.0187    0.0018
(20000, 40, 200)    0.0146    0.0018
(30000, 40, 300)    0.0154    0.0020
(40000, 40, 400)    0.0188    0.0021
(10000, 50, 100)    0.0170    0.0020
(20000, 50, 200)    0.0179    0.0022
(30000, 50, 300)    0.0177    0.0022
(40000, 50, 400)    0.0176    0.0024
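RelErr is not defined on this page; a common choice for such synthetic recovery experiments, and presumably the one reported in Table 1, is the relative Frobenius-norm error between the estimated and ground-truth weight matrices, as in the following sketch (the definition is our assumption):

import numpy as np

def rel_err(W_est, W_true):
    # Relative recovery error in the Frobenius norm (assumed definition).
    return np.linalg.norm(W_est - W_true) / np.linalg.norm(W_true)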
