\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

The approximation algorithm based on seeding method for functional $ k $-means problem

  • * Corresponding author: Dongmei Zhang

    * Corresponding author: Dongmei Zhang

A preliminary version of this article appeared in Proceedings of COCOON 2019, pp. 387-396.

Abstract Full Text(HTML) Figure(0) / Table(2) Related Papers Cited by
  • Different from the classical $ k $-means problem, the functional $ k $-means problem involves a kind of dynamic data, which is generated by continuous processes. In this paper, we mainly design an $ O(\ln\; k) $-approximation algorithm based on the seeding method for functional $ k $-means problem. Moreover, the numerical experiment presented shows that this algorithm is more efficient than the functional $ k $-means clustering algorithm.

    Mathematics Subject Classification: Primary: 90C27; Secondary: 68W25.

    Citation:

    \begin{equation} \\ \end{equation}
  • 加载中
  • Table 1.  Notation Index

    NotationsMeaning of symbols
    $x_i(t)$ or $x_i^j(t)$functional curve
    $X(t)$ or $X^i(t)$ or $C^i(t)$$d$-dimensional functional sample with functional curves as its components
    $\mathfrak{F}^d(t)$set of $d$-dimensional functional samples
    $\Gamma(t)$ or $\Delta(t)$ or $C(t)$subset of $\mathfrak{F}^d(t)$
    $\mu(\Gamma(t))$center of mass of functional samples of $\Gamma(t)$
    $d(X^i(t), X^j(t))$distance between the functional samples $X^i(t)$ and $X^j(t)$
    $d(X^i(t), \Gamma(t))$distance from the functional sample $X^i(t)$ to the subset of functional samples $\Gamma(t)$
    $X(t)_{\Gamma(t)}$the closest functional sample in $\Gamma(t)$ to the functional sample $X(t)$
    $\Phi(\Gamma(t), C(t))$potential function of $\Gamma(t)$ over $C(t)$
    $\Gamma_{C(t)}^i(t)$the $i$-th cluster of $\Gamma(t)$ with respect to $C(t)$
     | Show Table
    DownLoad: CSV

    Table 2.  Comparison of Algorithm 1 and the functional $ k $-means algorithm in [15]

    Data SetMethodARIDBIInitial CostReturned CostTime (s)
    SimudataSeedAlg0.79190.86523338168958
    FuncAlg0.79750.86565465168961
    SdataSeedAlg0.66511.13851667712192
    FuncAlg0.65611.13842485720217
     | Show Table
    DownLoad: CSV
  • [1] C. AbrahamP. A. CornillonE. Matzner-Løber and N. Molinari, Unsupervised curve clustering using B-splines, Scandinavian Journal of Statistics, 30 (2003), 581-595.  doi: 10.1111/1467-9469.00350.
    [2] S. Ahmadian, A. Norouzi-Fard, O. Svensson and J. Ward, Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms, SIAM Journal on Computing, (2019), FOCS17-97–FOCS17-156. doi: 10.1137/18M1171321.
    [3] D. AloiseA. DeshpandeP. Hansen and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75 (2009), 245-248.  doi: 10.1007/s10994-009-5103-0.
    [4] D. Arthur and S. Vassilvitskii, $K$-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2007), 1027–1035.
    [5] M. Boullé, Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognition, 45 (2012), 4389-4401. 
    [6] C. Bouveyron and C. Brunet-Saumard, Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, 71 (2014), 52-78.  doi: 10.1016/j.csda.2012.12.008.
    [7] R. Gamasaee and M. Zarandi, A new dirichlet process for mining dynamic patterns in functional data, Information Sciences, 405 (2017), 55-80.  doi: 10.1016/j.ins.2017.04.008.
    [8] S. Har-Peled and B. Sadri, How fast is the $k$-means method?, Algorithmica, 41 (2005), 185-202.  doi: 10.1007/s00453-004-1127-9.
    [9] J. Jacques and C. Preda, Functional data clustering: A survey, Advances in Data Analysis and Classification, 8 (2014), 231-255.  doi: 10.1007/s11634-013-0158-y.
    [10] S. Ji, D. Xu, L. Guo, M. Li and D. Zhang, The seeding algorithm for spherical $k$-means clustering with penalties, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00569-1.
    [11] M. KayanoK. Dozono and S. Konishi, Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27 (2010), 211-230.  doi: 10.1007/s00357-010-9054-8.
    [12] M. Li, The bi-criteria seeding algorithms for two variants of $k$-means problem, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00537-9.
    [13] M. LiD. XuJ. YueD. Zhang and P. Zhang, The seeding algorithm for $k$-means problem with penalties, Journal of Combinatorial Optimization, 39 (2020), 15-32.  doi: 10.1007/s10878-019-00450-w.
    [14] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, 28 (1982), 129-137.  doi: 10.1109/TIT.1982.1056489.
    [15] Y. MengJ. LiangF. Cao and Y. He, A new distance with derivative information for functional $k$-means clustering algorithm, Information Sciences, 463/464 (2018), 166-185.  doi: 10.1016/j.ins.2018.06.035.
    [16] R. Ostrovsky, Y. Rabani, L. Schulman and C. Swamy, The effectiveness of Lloyd-type methods for the $k$-means problem, Journal of the ACM, 59 (2012), 28: 1–28: 22. doi: 10.1145/2395116.2395117.
    [17] G. Ozturk and M. Ciftci, Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial & Management Optimization, 11 (2015), 921-932.  doi: 10.3934/jimo.2015.11.921.
    [18] J. Park and J. Ahn, Clustering multivariate functional data with phase variation, Biometrics, 73 (2017), 324-333.  doi: 10.1111/biom.12546.
    [19] J. Peng and H. G. Müller, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, The Annals of Applied Statistics, 2 (2008), 1056-1077.  doi: 10.1214/08-AOAS172.
    [20] C. PredaG. Saporta and C. Lévéder, PLS classification of functional data, Computational Statistics, 22 (2007), 223-235.  doi: 10.1007/s00180-007-0041-4.
    [21] T. Tarpey and K. K. Kinateder, Clustering functional data, Journal of Classification, 20 (2003), 93-114.  doi: 10.1007/s00357-003-0007-3.
    [22] D. Wei, A constant-factor bi-criteria approximation guarantee for $k$-means++, Proceedings of the Thirtieth International Conference on Neural Information Processing Systems, (2016), 604–612.
    [23] X. WuV. KumarJ. QuinlanJ. Ross GhoshQ. YangH. MotodaG. J. McLachlanA. NgB. LiuP.S. YuZ. H. ZhouM. SteinbachD. J. Hand and D. Steinberg, Top 10 algorithms in data mining, Knowledge and Information Systems, 14 (2008), 1-37.  doi: 10.1007/s10115-007-0114-2.
  • 加载中

Tables(2)

SHARE

Article Metrics

HTML views(868) PDF downloads(411) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return