# American Institute of Mathematical Sciences

October  2012, 8(4): 1057-1069. doi: 10.3934/jimo.2012.8.1057

## A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning

 1 Institute of Applied Mathematics, Henan University, Kaifeng 475004 2 National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan 3 Department of Mathematics, Nanjing University, Nanjing 210093, China

Received  March 2011 Revised  January 2012 Published  September 2012

The joint feature selection problem arises in many fields including computer vision, text classification and biomedical informatics. Generally, recent results show that it can be realized by solving a $\ell_{2,1}$-norm involved minimization problem. However, solving the optimization problem is a challenging task due to the non-smoothness of the regularization term. In this paper, we reformulate the problem to an equivalent constrained minimization problem by introducing an auxiliary variable. We split the corresponding augmented Lagrange function and minimize the subproblem alternatively with one variable by fixing the other one. Moreover, we linearize the subproblem and add a proximal-point term when the closed-form solutions are not easily to derived. The convergence analysis and the relatedness with other algorithms are also given. Although the $\ell_{2,1}$-norm is mainly considered, we show that the $\ell_{\infty,1}$-norm penalized learning problem can also be readily solved in our framework. The reported experiments on simulated and real data sets show that the proposed method is effective and promising. The performance comparisons illustrate that the proposed algorithm is competitive with even performs little better than the state-of-the-art solver SLEP.
Citation: Yunhai Xiao, Soon-Yi Wu, Bing-Sheng He. A proximal alternating direction method for $\ell_{2,1}$-norm least squares problem in multi-task feature learning. Journal of Industrial & Management Optimization, 2012, 8 (4) : 1057-1069. doi: 10.3934/jimo.2012.8.1057
##### References:
 [1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata,, Journal of Machine Learning Research, 6 (2005), 1817.   Google Scholar [2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning,, Machine Learning, 73 (2008), 243.   Google Scholar [3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning,, Journal of Machine Learning Research, 4 (2003), 83.   Google Scholar [4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit,, SIAM Journal on Scientific Computing, 20 (1999), 33.  doi: 10.1137/S1064827596304010.  Google Scholar [5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting,, Journal of Machine Learning Research, 10 (2009), 2899.   Google Scholar [6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods,, Journal of Machine Learning Research, 6 (2005), 615.   Google Scholar [7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations,, Computers & Mathematics with Applications, 2 (1976), 17.   Google Scholar [8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems,", Springer, (1984).   Google Scholar [9] R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires,, Revue Francaise d'automatique, 2 (1975), 41.   Google Scholar [10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities,, Mathematical Programming, 92 (2002), 103.  doi: 10.1007/s101070100280.  Google Scholar [11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities,, Journal of Computational Mathematics, 21 (2003), 495.   Google Scholar [12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression,", in, (2009).   Google Scholar [13] J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization,", in, (2009).   Google Scholar [14] M. Kowalski, Sparse regression using mixednorms,, Applied and Computational Harmonic Analysis, 27 (2009), 303.  doi: 10.1016/j.acha.2009.05.006.  Google Scholar [15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization,", Proceedings of the 26th Annual International Conference on Machine Learning, (2009).   Google Scholar [16] A. Nemirovski, "Efficient Methods in Convex Programming,", Lecture Notes, (1994).   Google Scholar [17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course,", Kluwer Academic Publishers, (2003).   Google Scholar [18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function,", CORE report, (2007).   Google Scholar [19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization,", Neural Information Processing Systems Foundation, (2010).   Google Scholar [20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection,", Technical Report, (2006).   Google Scholar [21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics,, Bioinformatics, 23 (2007), 2507.  doi: 10.1093/bioinformatics/btm344.  Google Scholar [22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations,, Adv. Comput. Math., (): 10444.   Google Scholar [23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning,", in, (2006).   Google Scholar [24] M. H. Xu, Proximal alternating directions method for structured variational inequalities,, Journal of Optimization Theory and Applications, 134 (2007), 107.  doi: 10.1007/s10957-007-9192-2.  Google Scholar [25] J. Yang, Dynamic power price problem: An inverse variational inequality approach,, Journal of Industrial and Management Optimization, 4 (2008), 673.   Google Scholar [26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization,, Math. Comput., ().  doi: 10.1090/S0025-5718-2012-02598-1.  Google Scholar [27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing,, SIAM Journal on Scientific Computing, 33 (2011), 250.  doi: 10.1137/090777761.  Google Scholar [28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning,, Machine Learning, 73 (2008), 221.   Google Scholar

show all references

##### References:
 [1] R. K. Ando and T. Zhang, A framework for learning predictive structures from multiple tasks and unlabeleddata,, Journal of Machine Learning Research, 6 (2005), 1817.   Google Scholar [2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-convex feature learning,, Machine Learning, 73 (2008), 243.   Google Scholar [3] B. Bakker and T. Heskes, Task clustering and gating for Bayesian multi-task learning,, Journal of Machine Learning Research, 4 (2003), 83.   Google Scholar [4] S. Chen, D. Donoho and M. Saunders, Atomic decomposition by basis pursuit,, SIAM Journal on Scientific Computing, 20 (1999), 33.  doi: 10.1137/S1064827596304010.  Google Scholar [5] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting,, Journal of Machine Learning Research, 10 (2009), 2899.   Google Scholar [6] T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods,, Journal of Machine Learning Research, 6 (2005), 615.   Google Scholar [7] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations,, Computers & Mathematics with Applications, 2 (1976), 17.   Google Scholar [8] R. Glowinski, "Numerical Methods for Nonlinear Variational Problems,", Springer, (1984).   Google Scholar [9] R. Glowinski and A. Marrocco, Sur l'approximation, par élémentsfinis d'ordre un, et la résolution, parpénalisation-dualité d'une classe de problèmes deDirichlet nonlinéaires,, Revue Francaise d'automatique, 2 (1975), 41.   Google Scholar [10] B. He, L. Z. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities,, Mathematical Programming, 92 (2002), 103.  doi: 10.1007/s101070100280.  Google Scholar [11] B. He, S. L. Wang and H. Yang, A modified variable-penalty alternating directions method for monotone variational inequalities,, Journal of Computational Mathematics, 21 (2003), 495.   Google Scholar [12] J. Liu, J. Chen and J. Ye, "Large-Scale Sparse Logistic Regression,", in, (2009).   Google Scholar [13] J. Liu, S. Ji and J. Ye, "Multi-Task Feather Learning Via Efficient $l_{2,1}$-norm Minimization,", in, (2009).   Google Scholar [14] M. Kowalski, Sparse regression using mixednorms,, Applied and Computational Harmonic Analysis, 27 (2009), 303.  doi: 10.1016/j.acha.2009.05.006.  Google Scholar [15] M. Kowalski, M. Szafranski and L. Ralaivola, "Multiple Indefinite Kernel Learning with Mixed Normregularization,", Proceedings of the 26th Annual International Conference on Machine Learning, (2009).   Google Scholar [16] A. Nemirovski, "Efficient Methods in Convex Programming,", Lecture Notes, (1994).   Google Scholar [17] Y. Nesterov, "Introductory Lectures on Convex Optimization: A Basic Course,", Kluwer Academic Publishers, (2003).   Google Scholar [18] Y. Nesterov, "Gradient Methods for Minimizing Composite Objective Function,", CORE report, (2007).   Google Scholar [19] F. Nie, H. Huang, X. Cai and C. Ding, "Efficient and Robust Feature Selection via Joint $l_{2,1}$-Normsminimization,", Neural Information Processing Systems Foundation, (2010).   Google Scholar [20] G. Obozinski, B. Taskar and M. I. Jordan, "Multi-Task Feature Selection,", Technical Report, (2006).   Google Scholar [21] Y. Saeys, I. Inza and P. Larranaga, A review of feature selection techniques in bioinformatics,, Bioinformatics, 23 (2007), 2507.  doi: 10.1093/bioinformatics/btm344.  Google Scholar [22] Y. Xiao, S.-Y. Wu and D.-H. Li, Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations,, Adv. Comput. Math., (): 10444.   Google Scholar [23] T. Xiong, J. Bi, B. Rao and V. Cherkassky, "Probabilistic Joint Feature Selection for Multi-Task Learning,", in, (2006).   Google Scholar [24] M. H. Xu, Proximal alternating directions method for structured variational inequalities,, Journal of Optimization Theory and Applications, 134 (2007), 107.  doi: 10.1007/s10957-007-9192-2.  Google Scholar [25] J. Yang, Dynamic power price problem: An inverse variational inequality approach,, Journal of Industrial and Management Optimization, 4 (2008), 673.   Google Scholar [26] J. Yang and X. Yuan, Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization,, Math. Comput., ().  doi: 10.1090/S0025-5718-2012-02598-1.  Google Scholar [27] J. Yang and Y. Zhang, Alternating direction algorithms for $l_1$-problemsin compressive sensing,, SIAM Journal on Scientific Computing, 33 (2011), 250.  doi: 10.1137/090777761.  Google Scholar [28] J. Zhang, Z. Ghahramani and Y. Yang, Flexible latent variable models for multi-task learning,, Machine Learning, 73 (2008), 221.   Google Scholar
 [1] Marion Darbas, Jérémy Heleine, Stephanie Lohrengel. Numerical resolution by the quasi-reversibility method of a data completion problem for Maxwell's equations. Inverse Problems & Imaging, 2020, 14 (6) : 1107-1133. doi: 10.3934/ipi.2020056 [2] Frank Sottile. The special Schubert calculus is real. Electronic Research Announcements, 1999, 5: 35-39. [3] Yuri Chekanov, Felix Schlenk. Notes on monotone Lagrangian twist tori. Electronic Research Announcements, 2010, 17: 104-121. doi: 10.3934/era.2010.17.104 [4] Johannes Kellendonk, Lorenzo Sadun. Conjugacies of model sets. Discrete & Continuous Dynamical Systems - A, 2017, 37 (7) : 3805-3830. doi: 10.3934/dcds.2017161 [5] Sandrine Anthoine, Jean-François Aujol, Yannick Boursier, Clothilde Mélot. Some proximal methods for Poisson intensity CBCT and PET. Inverse Problems & Imaging, 2012, 6 (4) : 565-598. doi: 10.3934/ipi.2012.6.565 [6] Gioconda Moscariello, Antonia Passarelli di Napoli, Carlo Sbordone. Planar ACL-homeomorphisms : Critical points of their components. Communications on Pure & Applied Analysis, 2010, 9 (5) : 1391-1397. doi: 10.3934/cpaa.2010.9.1391 [7] Dmitry Treschev. A locally integrable multi-dimensional billiard system. Discrete & Continuous Dynamical Systems - A, 2017, 37 (10) : 5271-5284. doi: 10.3934/dcds.2017228 [8] Sara Munday. On the derivative of the $\alpha$-Farey-Minkowski function. Discrete & Continuous Dynamical Systems - A, 2014, 34 (2) : 709-732. doi: 10.3934/dcds.2014.34.709 [9] Habib Ammari, Josselin Garnier, Vincent Jugnon. Detection, reconstruction, and characterization algorithms from noisy data in multistatic wave imaging. Discrete & Continuous Dynamical Systems - S, 2015, 8 (3) : 389-417. doi: 10.3934/dcdss.2015.8.389 [10] Alexander A. Davydov, Massimo Giulietti, Stefano Marcugini, Fernanda Pambianco. Linear nonbinary covering codes and saturating sets in projective spaces. Advances in Mathematics of Communications, 2011, 5 (1) : 119-147. doi: 10.3934/amc.2011.5.119 [11] Ralf Hielscher, Michael Quellmalz. Reconstructing a function on the sphere from its means along vertical slices. Inverse Problems & Imaging, 2016, 10 (3) : 711-739. doi: 10.3934/ipi.2016018 [12] Hong Seng Sim, Wah June Leong, Chuei Yee Chen, Siti Nur Iqmal Ibrahim. Multi-step spectral gradient methods with modified weak secant relation for large scale unconstrained optimization. Numerical Algebra, Control & Optimization, 2018, 8 (3) : 377-387. doi: 10.3934/naco.2018024 [13] A. Kochergin. Well-approximable angles and mixing for flows on T^2 with nonsingular fixed points. Electronic Research Announcements, 2004, 10: 113-121. [14] Armin Lechleiter, Tobias Rienmüller. Factorization method for the inverse Stokes problem. Inverse Problems & Imaging, 2013, 7 (4) : 1271-1293. doi: 10.3934/ipi.2013.7.1271 [15] Charles Fulton, David Pearson, Steven Pruess. Characterization of the spectral density function for a one-sided tridiagonal Jacobi matrix operator. Conference Publications, 2013, 2013 (special) : 247-257. doi: 10.3934/proc.2013.2013.247 [16] Emma D'Aniello, Saber Elaydi. The structure of $\omega$-limit sets of asymptotically non-autonomous discrete dynamical systems. Discrete & Continuous Dynamical Systems - B, 2020, 25 (3) : 903-915. doi: 10.3934/dcdsb.2019195 [17] Qiang Guo, Dong Liang. An adaptive wavelet method and its analysis for parabolic equations. Numerical Algebra, Control & Optimization, 2013, 3 (2) : 327-345. doi: 10.3934/naco.2013.3.327 [18] Tao Wu, Yu Lei, Jiao Shi, Maoguo Gong. An evolutionary multiobjective method for low-rank and sparse matrix decomposition. Big Data & Information Analytics, 2017, 2 (1) : 23-37. doi: 10.3934/bdia.2017006 [19] Deren Han, Zehui Jia, Yongzhong Song, David Z. W. Wang. An efficient projection method for nonlinear inverse problems with sparsity constraints. Inverse Problems & Imaging, 2016, 10 (3) : 689-709. doi: 10.3934/ipi.2016017 [20] Boris Kramer, John R. Singler. A POD projection method for large-scale algebraic Riccati equations. Numerical Algebra, Control & Optimization, 2016, 6 (4) : 413-435. doi: 10.3934/naco.2016018

2019 Impact Factor: 1.366