October  2012, 8(4): 1057-1069. doi: 10.3934/jimo.2012.8.1057

A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning

1. 

Institute of Applied Mathematics, Henan University, Kaifeng 475004

2. 

National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan

3. 

Department of Mathematics, Nanjing University, Nanjing 210093, China

Received  March 2011 Revised  January 2012 Published  September 2012

The joint feature selection problem arises in many fields including computer vision, text classification and biomedical informatics. Generally, recent results show that it can be realized by solving a $\ell_{2,1}$-norm involved minimization problem. However, solving the optimization problem is a challenging task due to the non-smoothness of the regularization term. In this paper, we reformulate the problem to an equivalent constrained minimization problem by introducing an auxiliary variable. We split the corresponding augmented Lagrange function and minimize the subproblem alternatively with one variable by fixing the other one. Moreover, we linearize the subproblem and add a proximal-point term when the closed-form solutions are not easily to derived. The convergence analysis and the relatedness with other algorithms are also given. Although the $\ell_{2,1}$-norm is mainly considered, we show that the $\ell_{\infty,1}$-norm penalized learning problem can also be readily solved in our framework. The reported experiments on simulated and real data sets show that the proposed method is effective and promising. The performance comparisons illustrate that the proposed algorithm is competitive with even performs little better than the state-of-the-art solver SLEP.
Citation: Yunhai Xiao, Soon-Yi Wu, Bing-Sheng He. A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning. Journal of Industrial and Management Optimization, 2012, 8 (4) : 1057-1069. doi: 10.3934/jimo.2012.8.1057
References:
[1]

R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata, Journal of Machine Learning Research, 6 (2005), 1817-1853.

[2]

A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning, Machine Learning, 73 (2008), 243-272.

[3]

B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning, Journal of Machine Learning Research, 4 (2003), 83-99.

[4]

S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20 (1999), 33-61. doi: 10.1137/S1064827596304010.

[5]

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10 (2009), 2899-2934.

[6]

T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, Journal of Machine Learning Research, 6 (2005), 615-637.

[7]

D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers & Mathematics with Applications, 2 (1976), 17-40.

[8]

R. Glowinski, "Numerical Methods for Nonlinear Variational Problems," Springer, New York, 1984.

[9]

R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires, Revue Francaise d'automatique, informatique, recherche opéretionnelle. Analyse numérique, 2 (1975), 41-76.

[10]

B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92 (2002), 103-118. doi: 10.1007/s101070100280.

[11]

B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities, Journal of Computational Mathematics, 21 (2003), 495-504.

[12]

J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression," in "ACM SIGKDD International Conference On KnowledgeDiscovery and Data Mining", 2009.

[13]

J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization," in "Comference on Uncertainty in Artificial Intelligence", 2009.

[14]

M. Kowalski, Sparse regression using mixednorms, Applied and Computational Harmonic Analysis, 27 (2009), 303-324. doi: 10.1016/j.acha.2009.05.006.

[15]

M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization," Proceedings of the 26th Annual International Conference on Machine Learning, 2009.

[16]

A. Nemirovski, "Efficient Methods in Convex Programming," Lecture Notes, 1994.

[17]

Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course," Kluwer Academic Publishers, 2003.

[18]

Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function," CORE report, 2007; available at http://www.ecore.be/DPs/dp_1191313936.pdf.

[19]

F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization," Neural Information Processing Systems Foundation, 2010.

[20]

G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection," Technical Report, UC Berkeley, 2006.

[21]

Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics, 23 (2007), 2507-2517. doi: 10.1093/bioinformatics/btm344.

[22]

Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations, Adv. Comput. Math., DOI 10.1007/s10444-011-9261-9.

[23]

T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning," in "SIAM International Conference on Data Mining", 2006.

[24]

M. H. Xu, Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 134 (2007), 107-117. doi: 10.1007/s10957-007-9192-2.

[25]

J. Yang, Dynamic power price problem: An inverse variational inequality approach, Journal of Industrial and Management Optimization, 4 (2008), 673-684.

[26]

J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Math. Comput. doi: 10.1090/S0025-5718-2012-02598-1.

[27]

J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), 250-278. doi: 10.1137/090777761.

[28]

J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning, Machine Learning, 73 (2008), 221-242.

show all references

References:
[1]

R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata, Journal of Machine Learning Research, 6 (2005), 1817-1853.

[2]

A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning, Machine Learning, 73 (2008), 243-272.

[3]

B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning, Journal of Machine Learning Research, 4 (2003), 83-99.

[4]

S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20 (1999), 33-61. doi: 10.1137/S1064827596304010.

[5]

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10 (2009), 2899-2934.

[6]

T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, Journal of Machine Learning Research, 6 (2005), 615-637.

[7]

D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers & Mathematics with Applications, 2 (1976), 17-40.

[8]

R. Glowinski, "Numerical Methods for Nonlinear Variational Problems," Springer, New York, 1984.

[9]

R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires, Revue Francaise d'automatique, informatique, recherche opéretionnelle. Analyse numérique, 2 (1975), 41-76.

[10]

B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92 (2002), 103-118. doi: 10.1007/s101070100280.

[11]

B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities, Journal of Computational Mathematics, 21 (2003), 495-504.

[12]

J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression," in "ACM SIGKDD International Conference On KnowledgeDiscovery and Data Mining", 2009.

[13]

J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization," in "Comference on Uncertainty in Artificial Intelligence", 2009.

[14]

M. Kowalski, Sparse regression using mixednorms, Applied and Computational Harmonic Analysis, 27 (2009), 303-324. doi: 10.1016/j.acha.2009.05.006.

[15]

M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization," Proceedings of the 26th Annual International Conference on Machine Learning, 2009.

[16]

A. Nemirovski, "Efficient Methods in Convex Programming," Lecture Notes, 1994.

[17]

Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course," Kluwer Academic Publishers, 2003.

[18]

Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function," CORE report, 2007; available at http://www.ecore.be/DPs/dp_1191313936.pdf.

[19]

F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization," Neural Information Processing Systems Foundation, 2010.

[20]

G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection," Technical Report, UC Berkeley, 2006.

[21]

Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics, 23 (2007), 2507-2517. doi: 10.1093/bioinformatics/btm344.

[22]

Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations, Adv. Comput. Math., DOI 10.1007/s10444-011-9261-9.

[23]

T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning," in "SIAM International Conference on Data Mining", 2006.

[24]

M. H. Xu, Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 134 (2007), 107-117. doi: 10.1007/s10957-007-9192-2.

[25]

J. Yang, Dynamic power price problem: An inverse variational inequality approach, Journal of Industrial and Management Optimization, 4 (2008), 673-684.

[26]

J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization, Math. Comput. doi: 10.1090/S0025-5718-2012-02598-1.

[27]

J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), 250-278. doi: 10.1137/090777761.

[28]

J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning, Machine Learning, 73 (2008), 221-242.

[1]

Foxiang Liu, Lingling Xu, Yuehong Sun, Deren Han. A proximal alternating direction method for multi-block coupled convex optimization. Journal of Industrial and Management Optimization, 2019, 15 (2) : 723-737. doi: 10.3934/jimo.2018067

[2]

Yan Gu, Nobuo Yamashita. Alternating direction method of multipliers with variable metric indefinite proximal terms for convex optimization. Numerical Algebra, Control and Optimization, 2020, 10 (4) : 487-510. doi: 10.3934/naco.2020047

[3]

Xueyong Wang, Yiju Wang, Gang Wang. An accelerated augmented Lagrangian method for multi-criteria optimization problem. Journal of Industrial and Management Optimization, 2020, 16 (1) : 1-9. doi: 10.3934/jimo.2018136

[4]

Feng Ma, Jiansheng Shu, Yaxiong Li, Jian Wu. The dual step size of the alternating direction method can be larger than 1.618 when one function is strongly convex. Journal of Industrial and Management Optimization, 2021, 17 (3) : 1173-1185. doi: 10.3934/jimo.2020016

[5]

Fan Zhang, Guifa Teng, Mengmeng Gao, Shuai Zhang, Jingjing Zhang. Multi-machine and multi-task emergency allocation algorithm based on precedence rules. Discrete and Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1501-1513. doi: 10.3934/dcdss.2019103

[6]

Russell E. Warren, Stanley J. Osher. Hyperspectral unmixing by the alternating direction method of multipliers. Inverse Problems and Imaging, 2015, 9 (3) : 917-933. doi: 10.3934/ipi.2015.9.917

[7]

Qian Liu, Xinmin Yang, Heung Wing Joseph Lee. On saddle points of a class of augmented lagrangian functions. Journal of Industrial and Management Optimization, 2007, 3 (4) : 693-700. doi: 10.3934/jimo.2007.3.693

[8]

Sohana Jahan. Supervised distance preserving projection using alternating direction method of multipliers. Journal of Industrial and Management Optimization, 2020, 16 (4) : 1783-1799. doi: 10.3934/jimo.2019029

[9]

Jie-Wen He, Chi-Chon Lei, Chen-Yang Shi, Seak-Weng Vong. An inexact alternating direction method of multipliers for a kind of nonlinear complementarity problems. Numerical Algebra, Control and Optimization, 2021, 11 (3) : 353-362. doi: 10.3934/naco.2020030

[10]

Yuan Shen, Wenxing Zhang, Bingsheng He. Relaxed augmented Lagrangian-based proximal point algorithms for convex optimization with linear constraints. Journal of Industrial and Management Optimization, 2014, 10 (3) : 743-759. doi: 10.3934/jimo.2014.10.743

[11]

Hssaine Boualam, Ahmed Roubi. Augmented Lagrangian dual for nonconvex minimax fractional programs and proximal bundle algorithms for its resolution. Journal of Industrial and Management Optimization, 2022  doi: 10.3934/jimo.2022100

[12]

Li Jin, Hongying Huang. Differential equation method based on approximate augmented Lagrangian for nonlinear programming. Journal of Industrial and Management Optimization, 2020, 16 (5) : 2267-2281. doi: 10.3934/jimo.2019053

[13]

Chunlin Wu, Juyong Zhang, Xue-Cheng Tai. Augmented Lagrangian method for total variation restoration with non-quadratic fidelity. Inverse Problems and Imaging, 2011, 5 (1) : 237-261. doi: 10.3934/ipi.2011.5.237

[14]

Wei Zhu, Xue-Cheng Tai, Tony Chan. Augmented Lagrangian method for a mean curvature based image denoising model. Inverse Problems and Imaging, 2013, 7 (4) : 1409-1432. doi: 10.3934/ipi.2013.7.1409

[15]

Fatemeh Bazikar, Saeed Ketabchi, Hossein Moosaei. Smooth augmented Lagrangian method for twin bounded support vector machine. Numerical Algebra, Control and Optimization, 2021  doi: 10.3934/naco.2021027

[16]

Bingsheng He, Xiaoming Yuan. Linearized alternating direction method of multipliers with Gaussian back substitution for separable convex programming. Numerical Algebra, Control and Optimization, 2013, 3 (2) : 247-260. doi: 10.3934/naco.2013.3.247

[17]

Yuan Shen, Lei Ji. Partial convolution for total variation deblurring and denoising by new linearized alternating direction method of multipliers with extension step. Journal of Industrial and Management Optimization, 2019, 15 (1) : 159-175. doi: 10.3934/jimo.2018037

[18]

Yue Lu, Ying-En Ge, Li-Wei Zhang. An alternating direction method for solving a class of inverse semi-definite quadratic programming problems. Journal of Industrial and Management Optimization, 2016, 12 (1) : 317-336. doi: 10.3934/jimo.2016.12.317

[19]

Zhongming Wu, Xingju Cai, Deren Han. Linearized block-wise alternating direction method of multipliers for multiple-block convex programming. Journal of Industrial and Management Optimization, 2018, 14 (3) : 833-855. doi: 10.3934/jimo.2017078

[20]

Dan Li, Li-Ping Pang, Fang-Fang Guo, Zun-Quan Xia. An alternating linearization method with inexact data for bilevel nonsmooth convex optimization. Journal of Industrial and Management Optimization, 2014, 10 (3) : 859-869. doi: 10.3934/jimo.2014.10.859

2021 Impact Factor: 1.411

Metrics

  • PDF downloads (120)
  • HTML views (0)
  • Cited by (7)

Other articles
by authors

[Back to Top]