
# Stochastic AUC optimization with general loss

\* Corresponding author

This work was completed when Wei Shen was a visiting student at SUNY Albany. Yiming Ying is supported by the National Science Foundation (NSF, Grant IIS1816227).
Recently, there has been considerable work on developing efficient stochastic optimization algorithms for AUC maximization. However, most of these algorithms focus on the least squares loss, which may not be the best option in practice. The main difficulty in dealing with a general convex loss is the pairwise nonlinearity w.r.t. the sampling distribution generating the data. In this paper, we use Bernstein polynomials to uniformly approximate general losses, which allows us to decouple the pairwise nonlinearity. In particular, we show that this reduction for AUC maximization with a general loss is equivalent to a weakly convex (nonconvex) min-max formulation. We then develop a novel SGD algorithm for AUC maximization whose per-iteration cost is linear in the data dimension, making it amenable to streaming data analysis. Despite the non-convexity, we prove global convergence of the algorithm by exploiting the convexity-preserving property of Bernstein polynomials and the intrinsic structure of the min-max formulation. Experiments are performed to validate the effectiveness of the proposed approach.
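To make the Bernstein approximation step more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a loss that has already been rescaled to the unit interval; the function `hinge_like` (a hinge-type loss with its kink at 0.5) and the helper name `bernstein` are hypothetical choices made only for illustration. The degree-$m$ Bernstein polynomial of $f$ is $B_m(f;x)=\sum_{k=0}^{m} f(k/m)\binom{m}{k}x^k(1-x)^{m-k}$.

```python
from math import comb

import numpy as np


def bernstein(f, m, x):
    """Evaluate the degree-m Bernstein polynomial of f at points x in [0, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(m + 1)
    binom = np.array([comb(m, int(i)) for i in k], dtype=float)
    f_nodes = np.array([f(i / m) for i in k], dtype=float)
    # basis[i, k] = C(m, k) * x_i^k * (1 - x_i)^(m - k)
    basis = binom * x[:, None] ** k * (1.0 - x[:, None]) ** (m - k)
    return basis @ f_nodes


def hinge_like(x):
    """Hypothetical convex, nonsmooth loss on [0, 1] with a kink at x = 0.5."""
    return max(0.0, 1.0 - 2.0 * x)


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 201)
    exact = np.array([hinge_like(x) for x in grid])
    for degree in (5, 20, 50):
        approx = bernstein(hinge_like, degree, grid)
        # The uniform error shrinks as the degree grows.
        print(degree, float(np.max(np.abs(approx - exact))))
```

For a convex $f$, the approximant is itself convex and lies on or above $f$, which is the convexity-preserving property the abstract refers to. The approximant depends on $f$ only through its values at the $m+1$ nodes $k/m$.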

Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.

Figure 1.  Comparison of convergence speed between SAUC-H and $\text{OAM}_{gra}$

Figure 2.  Evaluation of AUC scores versus the degree of the Bernstein polynomial

Algorithm 1: Stochastic AUC Optimization (SAUC)

1: Input: $R>0$, $\gamma\geq\gamma_0$ and $\beta>0$.

2: Initialize $\bar{{\mathbf{v}}}_0 = 0$ and $\bar{{\mathit{\boldsymbol{\alpha}}}}_0 = 0$.

3: for $t=1$ to $T-1$ do

4: Set ${\mathbf{v}}_0^t = \bar{{\mathbf{v}}}_{t-1}, {\mathit{\boldsymbol{\alpha}}}_0^t = \bar{{\mathit{\boldsymbol{\alpha}}}}_{t-1}$ and $\eta_t = \frac{\beta}{\sqrt{t}}.$

5: for $j=1$ to $t$ do

6: Randomly sample $z_j^t = (x_j^t,y_j^t)$ and compute

\begin{align*}
{\mathbf{v}}_{j}^t &= {{\bf Proj}}_{{\Omega}_1} \bigl({\mathbf{v}}_{j-1}^t - \eta_t \nabla_{{\mathbf{v}}} \varPhi_{\gamma}^t({\mathbf{v}}_{j-1}^t,{\mathit{\boldsymbol{\alpha}}}_{j-1}^t;z_j^t)\bigr), \\
{\mathit{\boldsymbol{\alpha}}}_{j}^t &= {{\bf Proj}}_{{\Omega}_2} \bigl({\mathit{\boldsymbol{\alpha}}}_{j-1}^t + \eta_t \nabla_{{\mathit{\boldsymbol{\alpha}}}} \varPhi_{\gamma}^t({\mathbf{v}}_{j-1}^t,{\mathit{\boldsymbol{\alpha}}}_{j-1}^t;z_j^t)\bigr)
\end{align*}

7: end for

8: Compute $\bar{{\mathbf{v}}}_{t} = \frac{1}{t}\sum_{j=0}^{t-1} {\mathbf{v}}_j^t$ and $\bar{{\mathit{\boldsymbol{\alpha}}}}_{t} = \frac{1}{t}\sum_{j=0}^{t-1} {\mathit{\boldsymbol{\alpha}}}_j^t.$

9: end for

10: Output: $\widetilde{{\mathbf{v}}}_T:=\frac{1}{T}\sum_{t=0}^{T-1}\bar{{\mathbf{v}}}_{t}$ and $\widetilde{{\mathit{\boldsymbol{\alpha}}}}_T:=\frac{1}{T}\sum_{t=0}^{T-1}\bar{{\mathit{\boldsymbol{\alpha}}}}_{t}.$
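The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The objective $\varPhi_{\gamma}^t$ and the sets $\Omega_1, \Omega_2$ are not defined in this excerpt, so the sketch takes the stochastic gradients as user-supplied callables, omits $\gamma$, and assumes both projection sets are Euclidean balls of radius $R$. The toy saddle objective at the bottom is a hypothetical stand-in chosen because its saddle point is known in closed form.

```python
import numpy as np


def project_ball(u, radius):
    """Euclidean projection onto {u : ||u|| <= radius} (assumed shape of Omega_1, Omega_2)."""
    norm = np.linalg.norm(u)
    return u if norm <= radius else u * (radius / norm)


def sauc_style_loop(grad_v, grad_alpha, sample, dim_v, dim_alpha, T, beta, R):
    """Epoch-wise projected stochastic gradient descent-ascent with averaging."""
    v_bar, a_bar = np.zeros(dim_v), np.zeros(dim_alpha)   # step 2
    v_bars, a_bars = [v_bar], [a_bar]                     # holds bar_0, ..., bar_{T-1}
    for t in range(1, T):                                 # step 3
        eta = beta / np.sqrt(t)                           # step 4
        v, a = v_bar.copy(), a_bar.copy()
        v_sum, a_sum = np.zeros(dim_v), np.zeros(dim_alpha)
        for _ in range(t):                                # step 5
            v_sum += v                                    # accumulate iterates j = 0..t-1
            a_sum += a
            z = sample()                                  # step 6
            g_v, g_a = grad_v(v, a, z), grad_alpha(v, a, z)
            v = project_ball(v - eta * g_v, R)            # descent in v
            a = project_ball(a + eta * g_a, R)            # ascent in alpha
        v_bar, a_bar = v_sum / t, a_sum / t               # step 8
        v_bars.append(v_bar)
        a_bars.append(a_bar)
    return np.mean(v_bars, axis=0), np.mean(a_bars, axis=0)  # step 10


# Hypothetical toy problem (NOT the paper's objective):
#   Phi(v, a; z) = 0.5*||v - z||^2 + <v, a> - 0.5*||a||^2,  z ~ N(mu, 0.1^2 I),
# whose expected saddle point is v = a = mu / 2.
rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5])
sample = lambda: mu + 0.1 * rng.standard_normal(2)
grad_v = lambda v, a, z: (v - z) + a
grad_alpha = lambda v, a, z: v - a

v_out, a_out = sauc_style_loop(grad_v, grad_alpha, sample, 2, 2, T=60, beta=0.5, R=5.0)
```

Each inner step touches one sample and performs a constant number of vector operations, so the per-iteration cost of the loop itself is linear in the dimension, in line with the claim in the abstract. The outputs average over all epochs including the zero initialization, so on this toy problem they approach `mu / 2` only gradually as `T` grows.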

Table 1.  Statistics of datasets

Table 2.  Comparison of AUC score (mean$\pm$std) on test data; OPAUC on news20 and sector does not converge within a reasonable time limit. The best AUC value on each dataset is in bold and the second best is underlined
