# American Institute of Mathematical Sciences

• Previous Article
Pricing and lot-sizing decisions for perishable products when demand changes by freshness
• JIMO Home
• This Issue
• Next Article
Multiobjective mathematical models and solution approaches for heterogeneous fixed fleet vehicle routing problems
doi: 10.3934/jimo.2020160

## The approximation algorithm based on seeding method for functional $k$-means problem†

 1 School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, China 2 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China 3 Department of Operations Research and Information Engineering, Beijing University of Technology, Beijing 100124, China 4 School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China

* Corresponding author: Dongmei Zhang

A preliminary version of this article appeared in Proceedings of COCOON 2019, pp. 387-396.

Received  March 2020 Revised  July 2020 Published  November 2020

Different from the classical $k$-means problem, the functional $k$-means problem involves a kind of dynamic data, which is generated by continuous processes. In this paper, we mainly design an $O(\ln\; k)$-approximation algorithm based on the seeding method for functional $k$-means problem. Moreover, the numerical experiment presented shows that this algorithm is more efficient than the functional $k$-means clustering algorithm.

Citation: Min Li, Yishui Wang, Dachuan Xu, Dongmei Zhang. The approximation algorithm based on seeding method for functional $k$-means problem. Journal of Industrial & Management Optimization, doi: 10.3934/jimo.2020160
##### References:
 [1] C. Abraham, P. A. Cornillon, E. Matzner-Løber and N. Molinari, Unsupervised curve clustering using B-splines, Scandinavian Journal of Statistics, 30 (2003), 581-595.  doi: 10.1111/1467-9469.00350.  Google Scholar [2] S. Ahmadian, A. Norouzi-Fard, O. Svensson and J. Ward, Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms, SIAM Journal on Computing, (2019), FOCS17-97–FOCS17-156. doi: 10.1137/18M1171321.  Google Scholar [3] D. Aloise, A. Deshpande, P. Hansen and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75 (2009), 245-248.  doi: 10.1007/s10994-009-5103-0.  Google Scholar [4] D. Arthur and S. Vassilvitskii, $K$-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2007), 1027–1035.  Google Scholar [5] M. Boullé, Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognition, 45 (2012), 4389-4401.   Google Scholar [6] C. Bouveyron and C. Brunet-Saumard, Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, 71 (2014), 52-78.  doi: 10.1016/j.csda.2012.12.008.  Google Scholar [7] R. Gamasaee and M. Zarandi, A new dirichlet process for mining dynamic patterns in functional data, Information Sciences, 405 (2017), 55-80.  doi: 10.1016/j.ins.2017.04.008.  Google Scholar [8] S. Har-Peled and B. Sadri, How fast is the $k$-means method?, Algorithmica, 41 (2005), 185-202.  doi: 10.1007/s00453-004-1127-9.  Google Scholar [9] J. Jacques and C. Preda, Functional data clustering: A survey, Advances in Data Analysis and Classification, 8 (2014), 231-255.  doi: 10.1007/s11634-013-0158-y.  Google Scholar [10] S. Ji, D. Xu, L. Guo, M. Li and D. Zhang, The seeding algorithm for spherical $k$-means clustering with penalties, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00569-1.  Google Scholar [11] M. Kayano, K. Dozono and S. Konishi, Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27 (2010), 211-230.  doi: 10.1007/s00357-010-9054-8.  Google Scholar [12] M. Li, The bi-criteria seeding algorithms for two variants of $k$-means problem, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00537-9.  Google Scholar [13] M. Li, D. Xu, J. Yue, D. Zhang and P. Zhang, The seeding algorithm for $k$-means problem with penalties, Journal of Combinatorial Optimization, 39 (2020), 15-32.  doi: 10.1007/s10878-019-00450-w.  Google Scholar [14] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, 28 (1982), 129-137.  doi: 10.1109/TIT.1982.1056489.  Google Scholar [15] Y. Meng, J. Liang, F. Cao and Y. He, A new distance with derivative information for functional $k$-means clustering algorithm, Information Sciences, 463/464 (2018), 166-185.  doi: 10.1016/j.ins.2018.06.035.  Google Scholar [16] R. Ostrovsky, Y. Rabani, L. Schulman and C. Swamy, The effectiveness of Lloyd-type methods for the $k$-means problem, Journal of the ACM, 59 (2012), 28: 1–28: 22. doi: 10.1145/2395116.2395117.  Google Scholar [17] G. Ozturk and M. Ciftci, Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial & Management Optimization, 11 (2015), 921-932.  doi: 10.3934/jimo.2015.11.921.  Google Scholar [18] J. Park and J. Ahn, Clustering multivariate functional data with phase variation, Biometrics, 73 (2017), 324-333.  doi: 10.1111/biom.12546.  Google Scholar [19] J. Peng and H. G. Müller, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, The Annals of Applied Statistics, 2 (2008), 1056-1077.  doi: 10.1214/08-AOAS172.  Google Scholar [20] C. Preda, G. Saporta and C. Lévéder, PLS classification of functional data, Computational Statistics, 22 (2007), 223-235.  doi: 10.1007/s00180-007-0041-4.  Google Scholar [21] T. Tarpey and K. K. Kinateder, Clustering functional data, Journal of Classification, 20 (2003), 93-114.  doi: 10.1007/s00357-003-0007-3.  Google Scholar [22] D. Wei, A constant-factor bi-criteria approximation guarantee for $k$-means++, Proceedings of the Thirtieth International Conference on Neural Information Processing Systems, (2016), 604–612. Google Scholar [23] X. Wu, V. Kumar, J. Quinlan, J. Ross Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. H. Zhou, M. Steinbach, D. J. Hand and D. Steinberg, Top 10 algorithms in data mining, Knowledge and Information Systems, 14 (2008), 1-37.  doi: 10.1007/s10115-007-0114-2.  Google Scholar

show all references

##### References:
 [1] C. Abraham, P. A. Cornillon, E. Matzner-Løber and N. Molinari, Unsupervised curve clustering using B-splines, Scandinavian Journal of Statistics, 30 (2003), 581-595.  doi: 10.1111/1467-9469.00350.  Google Scholar [2] S. Ahmadian, A. Norouzi-Fard, O. Svensson and J. Ward, Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms, SIAM Journal on Computing, (2019), FOCS17-97–FOCS17-156. doi: 10.1137/18M1171321.  Google Scholar [3] D. Aloise, A. Deshpande, P. Hansen and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75 (2009), 245-248.  doi: 10.1007/s10994-009-5103-0.  Google Scholar [4] D. Arthur and S. Vassilvitskii, $K$-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, (2007), 1027–1035.  Google Scholar [5] M. Boullé, Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognition, 45 (2012), 4389-4401.   Google Scholar [6] C. Bouveyron and C. Brunet-Saumard, Model-based clustering of high-dimensional data: A review, Computational Statistics & Data Analysis, 71 (2014), 52-78.  doi: 10.1016/j.csda.2012.12.008.  Google Scholar [7] R. Gamasaee and M. Zarandi, A new dirichlet process for mining dynamic patterns in functional data, Information Sciences, 405 (2017), 55-80.  doi: 10.1016/j.ins.2017.04.008.  Google Scholar [8] S. Har-Peled and B. Sadri, How fast is the $k$-means method?, Algorithmica, 41 (2005), 185-202.  doi: 10.1007/s00453-004-1127-9.  Google Scholar [9] J. Jacques and C. Preda, Functional data clustering: A survey, Advances in Data Analysis and Classification, 8 (2014), 231-255.  doi: 10.1007/s11634-013-0158-y.  Google Scholar [10] S. Ji, D. Xu, L. Guo, M. Li and D. Zhang, The seeding algorithm for spherical $k$-means clustering with penalties, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00569-1.  Google Scholar [11] M. Kayano, K. Dozono and S. Konishi, Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, Journal of Classification, 27 (2010), 211-230.  doi: 10.1007/s00357-010-9054-8.  Google Scholar [12] M. Li, The bi-criteria seeding algorithms for two variants of $k$-means problem, Journal of Combinatorial Optimization, 2020. doi: 10.1007/s10878-020-00537-9.  Google Scholar [13] M. Li, D. Xu, J. Yue, D. Zhang and P. Zhang, The seeding algorithm for $k$-means problem with penalties, Journal of Combinatorial Optimization, 39 (2020), 15-32.  doi: 10.1007/s10878-019-00450-w.  Google Scholar [14] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, 28 (1982), 129-137.  doi: 10.1109/TIT.1982.1056489.  Google Scholar [15] Y. Meng, J. Liang, F. Cao and Y. He, A new distance with derivative information for functional $k$-means clustering algorithm, Information Sciences, 463/464 (2018), 166-185.  doi: 10.1016/j.ins.2018.06.035.  Google Scholar [16] R. Ostrovsky, Y. Rabani, L. Schulman and C. Swamy, The effectiveness of Lloyd-type methods for the $k$-means problem, Journal of the ACM, 59 (2012), 28: 1–28: 22. doi: 10.1145/2395116.2395117.  Google Scholar [17] G. Ozturk and M. Ciftci, Clustering based polyhedral conic functions algorithm in classification, Journal of Industrial & Management Optimization, 11 (2015), 921-932.  doi: 10.3934/jimo.2015.11.921.  Google Scholar [18] J. Park and J. Ahn, Clustering multivariate functional data with phase variation, Biometrics, 73 (2017), 324-333.  doi: 10.1111/biom.12546.  Google Scholar [19] J. Peng and H. G. Müller, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, The Annals of Applied Statistics, 2 (2008), 1056-1077.  doi: 10.1214/08-AOAS172.  Google Scholar [20] C. Preda, G. Saporta and C. Lévéder, PLS classification of functional data, Computational Statistics, 22 (2007), 223-235.  doi: 10.1007/s00180-007-0041-4.  Google Scholar [21] T. Tarpey and K. K. Kinateder, Clustering functional data, Journal of Classification, 20 (2003), 93-114.  doi: 10.1007/s00357-003-0007-3.  Google Scholar [22] D. Wei, A constant-factor bi-criteria approximation guarantee for $k$-means++, Proceedings of the Thirtieth International Conference on Neural Information Processing Systems, (2016), 604–612. Google Scholar [23] X. Wu, V. Kumar, J. Quinlan, J. Ross Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P.S. Yu, Z. H. Zhou, M. Steinbach, D. J. Hand and D. Steinberg, Top 10 algorithms in data mining, Knowledge and Information Systems, 14 (2008), 1-37.  doi: 10.1007/s10115-007-0114-2.  Google Scholar
Notation Index
 Notations Meaning of symbols $x_i(t)$ or $x_i^j(t)$ functional curve $X(t)$ or $X^i(t)$ or $C^i(t)$ $d$-dimensional functional sample with functional curves as its components $\mathfrak{F}^d(t)$ set of $d$-dimensional functional samples $\Gamma(t)$ or $\Delta(t)$ or $C(t)$ subset of $\mathfrak{F}^d(t)$ $\mu(\Gamma(t))$ center of mass of functional samples of $\Gamma(t)$ $d(X^i(t), X^j(t))$ distance between the functional samples $X^i(t)$ and $X^j(t)$ $d(X^i(t), \Gamma(t))$ distance from the functional sample $X^i(t)$ to the subset of functional samples $\Gamma(t)$ $X(t)_{\Gamma(t)}$ the closest functional sample in $\Gamma(t)$ to the functional sample $X(t)$ $\Phi(\Gamma(t), C(t))$ potential function of $\Gamma(t)$ over $C(t)$ $\Gamma_{C(t)}^i(t)$ the $i$-th cluster of $\Gamma(t)$ with respect to $C(t)$
 Notations Meaning of symbols $x_i(t)$ or $x_i^j(t)$ functional curve $X(t)$ or $X^i(t)$ or $C^i(t)$ $d$-dimensional functional sample with functional curves as its components $\mathfrak{F}^d(t)$ set of $d$-dimensional functional samples $\Gamma(t)$ or $\Delta(t)$ or $C(t)$ subset of $\mathfrak{F}^d(t)$ $\mu(\Gamma(t))$ center of mass of functional samples of $\Gamma(t)$ $d(X^i(t), X^j(t))$ distance between the functional samples $X^i(t)$ and $X^j(t)$ $d(X^i(t), \Gamma(t))$ distance from the functional sample $X^i(t)$ to the subset of functional samples $\Gamma(t)$ $X(t)_{\Gamma(t)}$ the closest functional sample in $\Gamma(t)$ to the functional sample $X(t)$ $\Phi(\Gamma(t), C(t))$ potential function of $\Gamma(t)$ over $C(t)$ $\Gamma_{C(t)}^i(t)$ the $i$-th cluster of $\Gamma(t)$ with respect to $C(t)$
Comparison of Algorithm 1 and the functional $k$-means algorithm in [15]
 Data Set Method ARI DBI Initial Cost Returned Cost Time (s) Simudata SeedAlg 0.7919 0.8652 3338 1689 58 FuncAlg 0.7975 0.8656 5465 1689 61 Sdata SeedAlg 0.6651 1.1385 1667 712 192 FuncAlg 0.6561 1.1384 2485 720 217
 Data Set Method ARI DBI Initial Cost Returned Cost Time (s) Simudata SeedAlg 0.7919 0.8652 3338 1689 58 FuncAlg 0.7975 0.8656 5465 1689 61 Sdata SeedAlg 0.6651 1.1385 1667 712 192 FuncAlg 0.6561 1.1384 2485 720 217
 [1] Manuel Friedrich, Martin Kružík, Jan Valdman. Numerical approximation of von Kármán viscoelastic plates. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 299-319. doi: 10.3934/dcdss.2020322 [2] Wolfgang Riedl, Robert Baier, Matthias Gerdts. Optimization-based subdivision algorithm for reachable sets. Journal of Computational Dynamics, 2021, 8 (1) : 99-130. doi: 10.3934/jcd.2021005 [3] Mohammed Abdulrazaq Kahya, Suhaib Abduljabbar Altamir, Zakariya Yahya Algamal. Improving whale optimization algorithm for feature selection with a time-varying transfer function. Numerical Algebra, Control & Optimization, 2021, 11 (1) : 87-98. doi: 10.3934/naco.2020017 [4] Cheng Peng, Zhaohui Tang, Weihua Gui, Qing Chen, Jing He. A bidirectional weighted boundary distance algorithm for time series similarity computation based on optimized sliding window size. Journal of Industrial & Management Optimization, 2021, 17 (1) : 205-220. doi: 10.3934/jimo.2019107 [5] Editorial Office. Retraction: Honggang Yu, An efficient face recognition algorithm using the improved convolutional neural network. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 901-901. doi: 10.3934/dcdss.2019060 [6] Editorial Office. Retraction: Xiaohong Zhu, Zili Yang and Tabharit Zoubir, Research on the matching algorithm for heterologous image after deformation in the same scene. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1281-1281. doi: 10.3934/dcdss.2019088 [7] Zi Xu, Siwen Wang, Jinjin Huang. An efficient low complexity algorithm for box-constrained weighted maximin dispersion problem. Journal of Industrial & Management Optimization, 2021, 17 (2) : 971-979. doi: 10.3934/jimo.2020007 [8] Guo Zhou, Yongquan Zhou, Ruxin Zhao. Hybrid social spider optimization algorithm with differential mutation operator for the job-shop scheduling problem. Journal of Industrial & Management Optimization, 2021, 17 (2) : 533-548. doi: 10.3934/jimo.2019122 [9] Editorial Office. Retraction: Xiaohong Zhu, Lihe Zhou, Zili Yang and Joyati Debnath, A new text information extraction algorithm of video image under multimedia environment. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1265-1265. doi: 10.3934/dcdss.2019087 [10] Lan Luo, Zhe Zhang, Yong Yin. Simulated annealing and genetic algorithm based method for a bi-level seru loading problem with worker assignment in seru production systems. Journal of Industrial & Management Optimization, 2021, 17 (2) : 779-803. doi: 10.3934/jimo.2019134 [11] Marc Homs-Dones. A generalization of the Babbage functional equation. Discrete & Continuous Dynamical Systems - A, 2021, 41 (2) : 899-919. doi: 10.3934/dcds.2020303 [12] Bo Tan, Qinglong Zhou. Approximation properties of Lüroth expansions. Discrete & Continuous Dynamical Systems - A, 2020  doi: 10.3934/dcds.2020389 [13] Sumit Arora, Manil T. Mohan, Jaydev Dabas. Approximate controllability of a Sobolev type impulsive functional evolution system in Banach spaces. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020049 [14] Shuang Liu, Yuan Lou. A functional approach towards eigenvalue problems associated with incompressible flow. Discrete & Continuous Dynamical Systems - A, 2020, 40 (6) : 3715-3736. doi: 10.3934/dcds.2020028 [15] Yifan Chen, Thomas Y. Hou. Function approximation via the subsampled Poincaré inequality. Discrete & Continuous Dynamical Systems - A, 2021, 41 (1) : 169-199. doi: 10.3934/dcds.2020296 [16] Mostafa Mbekhta. Representation and approximation of the polar factor of an operator on a Hilbert space. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020463 [17] Bilal Al Taki, Khawla Msheik, Jacques Sainte-Marie. On the rigid-lid approximation of shallow water Bingham. Discrete & Continuous Dynamical Systems - B, 2021, 26 (2) : 875-905. doi: 10.3934/dcdsb.2020146 [18] P. K. Jha, R. Lipton. Finite element approximation of nonlocal dynamic fracture models. Discrete & Continuous Dynamical Systems - B, 2021, 26 (3) : 1675-1710. doi: 10.3934/dcdsb.2020178 [19] Simone Fagioli, Emanuela Radici. Opinion formation systems via deterministic particles approximation. Kinetic & Related Models, 2021, 14 (1) : 45-76. doi: 10.3934/krm.2020048 [20] Bo Chen, Youde Wang. Global weak solutions for Landau-Lifshitz flows and heat flows associated to micromagnetic energy functional. Communications on Pure & Applied Analysis, 2021, 20 (1) : 319-338. doi: 10.3934/cpaa.2020268

2019 Impact Factor: 1.366