
-
Previous Article
HERMES: Persistent spectral graph software
- FoDS Home
- This Issue
-
Next Article
Markov chain simulation for multilevel Monte Carlo
A topological approach to spectral clustering
Centro de Investigación en Matemáticas, Calle Jalisco S/N, Colonia Valenciana, Guanajuato C.P 36023, GTO, Mexico |
We propose two related unsupervised clustering algorithms which, for input, take data assumed to be sampled from a uniform distribution supported on a metric space $ X $, and output a clustering of the data based on the selection of a topological model for the connected components of $ X $. Both algorithms work by selecting a graph on the samples from a natural one-parameter family of graphs, using a geometric criterion in the first case and an information theoretic criterion in the second. The estimated connected components of $ X $ are identified with the kernel of the associated graph Laplacian, which allows the algorithm to work without requiring the number of expected clusters or other auxiliary data as input.
References:
[1] |
M. Belkin and P. Niyogi,
Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15 (2003), 1373-1396.
doi: 10.1162/089976603321780317. |
[2] |
G. Carlsson,
Topology and data, Bull. Amer. Math. Soc. (N.S.), 46 (2009), 255-308.
doi: 10.1090/S0273-0979-09-01249-X. |
[3] |
F. Chazal, L. J. Guibas, S. Y. Oudot and P. Skraba, Persistence-based clustering in Riemannian manifolds, J. ACM, 60 (2013), 38pp.
doi: 10.1145/2535927. |
[4] |
F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, 92, American Mathematical Society, Providence, RI, 1997.
doi: 10.1090/cbms/092. |
[5] |
R. R. Coifman and S. Lafon,
Diffusion maps, Appl. Comput. Harmon. Anal., 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[6] |
D. L. Donoho and C. Grimes,
Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA, 100 (2003), 5591-5596.
doi: 10.1073/pnas.1031596100. |
[7] |
R. B. Ghrist,
Barcodes: The persistent topology of data, Bull. Amer. Math. Soc. (N.S.), 45 (2008), 61-75.
doi: 10.1090/S0273-0979-07-01191-3. |
[8] |
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer, New York, 2009
doi: 10.1007/978-0-387-84858-7. |
[9] |
J. Jost, Riemannian Geometry and Geometric Analysis, Universitext, Springer, Heidelberg, 2011.
doi: 10.1007/978-3-642-21298-7. |
[10] |
S. T. Roweis and L. K. Saul,
Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), 2323-2326.
doi: 10.1126/science.290.5500.2323. |
[11] |
R. Sánchez-Garcia, M. Fennelly, S. Norris, N. Wright, G. Niblo, J. Brodzki and J. Bialek,
Hierarchical spectral clustering of power grids, IEEE Transactions on Power Systems, 29 (2014), 2229-2237.
doi: 10.1109/TPWRS.2014.2306756. |
[12] |
W.-J. Shen, H.-S. Wong, Q.-W. Xiao, X. Guo and S. Smale,
Introduction to the peptide binding problem of computational immunology: New results, Found. Comput. Math., 14 (2014), 951-984.
doi: 10.1007/s10208-013-9173-9. |
[13] |
J. B. Tenenbaum, V. de Silva and J. C. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[14] |
A. Zomorodian and G. Carlsson,
Computing persistent homology, Discrete Comput. Geom., 33 (2005), 249-274.
doi: 10.1007/s00454-004-1146-y. |
show all references
References:
[1] |
M. Belkin and P. Niyogi,
Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15 (2003), 1373-1396.
doi: 10.1162/089976603321780317. |
[2] |
G. Carlsson,
Topology and data, Bull. Amer. Math. Soc. (N.S.), 46 (2009), 255-308.
doi: 10.1090/S0273-0979-09-01249-X. |
[3] |
F. Chazal, L. J. Guibas, S. Y. Oudot and P. Skraba, Persistence-based clustering in Riemannian manifolds, J. ACM, 60 (2013), 38pp.
doi: 10.1145/2535927. |
[4] |
F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, 92, American Mathematical Society, Providence, RI, 1997.
doi: 10.1090/cbms/092. |
[5] |
R. R. Coifman and S. Lafon,
Diffusion maps, Appl. Comput. Harmon. Anal., 21 (2006), 5-30.
doi: 10.1016/j.acha.2006.04.006. |
[6] |
D. L. Donoho and C. Grimes,
Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA, 100 (2003), 5591-5596.
doi: 10.1073/pnas.1031596100. |
[7] |
R. B. Ghrist,
Barcodes: The persistent topology of data, Bull. Amer. Math. Soc. (N.S.), 45 (2008), 61-75.
doi: 10.1090/S0273-0979-07-01191-3. |
[8] |
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Data Mining, Inference, and Prediction, Springer Series in Statistics, Springer, New York, 2009
doi: 10.1007/978-0-387-84858-7. |
[9] |
J. Jost, Riemannian Geometry and Geometric Analysis, Universitext, Springer, Heidelberg, 2011.
doi: 10.1007/978-3-642-21298-7. |
[10] |
S. T. Roweis and L. K. Saul,
Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), 2323-2326.
doi: 10.1126/science.290.5500.2323. |
[11] |
R. Sánchez-Garcia, M. Fennelly, S. Norris, N. Wright, G. Niblo, J. Brodzki and J. Bialek,
Hierarchical spectral clustering of power grids, IEEE Transactions on Power Systems, 29 (2014), 2229-2237.
doi: 10.1109/TPWRS.2014.2306756. |
[12] |
W.-J. Shen, H.-S. Wong, Q.-W. Xiao, X. Guo and S. Smale,
Introduction to the peptide binding problem of computational immunology: New results, Found. Comput. Math., 14 (2014), 951-984.
doi: 10.1007/s10208-013-9173-9. |
[13] |
J. B. Tenenbaum, V. de Silva and J. C. Langford,
A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
doi: 10.1126/science.290.5500.2319. |
[14] |
A. Zomorodian and G. Carlsson,
Computing persistent homology, Discrete Comput. Geom., 33 (2005), 249-274.
doi: 10.1007/s00454-004-1146-y. |














Average Relative Entropy Method, 500 sample points | ||||||||||
Average Relative Entropy Method, 1000 sample | ||||||||||
SD|β0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥ 10 |
0.01 | 0 | 0 | 62.67 | 26 | 7.33 | 0.67 | 2 | 1.33 | 0 | 0 |
0.02 | 0 | 0 | 59.33 | 22 | 8 | 9.33 | 0.67 | 0.67 | 0 | 0 |
0.03 | 0 | 0 | 23.33 | 28 | 22 | 10.67 | 4.67 | 1.33 | 0 | 10 |
0.04 | 0 | 0 | 4 | 9.33 | 12 | 2 | 5.33 | 4 | 5.33 | 58.01 |
0.05 | 0 | 0 | 1.33 | 5.33 | 2.67 | 4.67 | 2 | 5.33 | 2.67 | 76 |
Average Relative Entropy Method, 500 sample points | ||||||||||
Average Relative Entropy Method, 1000 sample | ||||||||||
SD|β0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥ 10 |
0.01 | 0 | 0 | 62.67 | 26 | 7.33 | 0.67 | 2 | 1.33 | 0 | 0 |
0.02 | 0 | 0 | 59.33 | 22 | 8 | 9.33 | 0.67 | 0.67 | 0 | 0 |
0.03 | 0 | 0 | 23.33 | 28 | 22 | 10.67 | 4.67 | 1.33 | 0 | 10 |
0.04 | 0 | 0 | 4 | 9.33 | 12 | 2 | 5.33 | 4 | 5.33 | 58.01 |
0.05 | 0 | 0 | 1.33 | 5.33 | 2.67 | 4.67 | 2 | 5.33 | 2.67 | 76 |
Average Local Volume Ratio Method, 500 Sample Points | ||||||||||
Average Local Volume Ratio Method, 1000 Sample Points | ||||||||||
SD|β0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥ 10 |
0.01 | 0 | 0 | 98.67 | 1.33 | 0 | 0 | 0 | 0 | 0 | 0 |
0.02 | 0 | 0 | 89.33 | 8 | 1.33 | 1.33 | 0 | 0 | 0 | 0 |
0.03 | 0 | 0 | 45.33 | 28 | 16 | 6.67 | 2.67 | 1.33 | 0 | 0 |
0.04 | 1.33 | 0 | 13.33 | 21.33 | 26 | 11.33 | 12 | 6 | 3.33 | 0 |
0.05 | 75.33 | 11.33 | 2.67 | 4.67 | 1.33 | 2 | 0.67 | 1.33 | 0.67 | 5.38 |
Average Local Volume Ratio Method, 500 Sample Points | ||||||||||
Average Local Volume Ratio Method, 1000 Sample Points | ||||||||||
SD|β0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥ 10 |
0.01 | 0 | 0 | 98.67 | 1.33 | 0 | 0 | 0 | 0 | 0 | 0 |
0.02 | 0 | 0 | 89.33 | 8 | 1.33 | 1.33 | 0 | 0 | 0 | 0 |
0.03 | 0 | 0 | 45.33 | 28 | 16 | 6.67 | 2.67 | 1.33 | 0 | 0 |
0.04 | 1.33 | 0 | 13.33 | 21.33 | 26 | 11.33 | 12 | 6 | 3.33 | 0 |
0.05 | 75.33 | 11.33 | 2.67 | 4.67 | 1.33 | 2 | 0.67 | 1.33 | 0.67 | 5.38 |
[1] |
Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial and Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 |
[2] |
Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014 |
[3] |
Michael Herty, Lorenzo Pareschi, Giuseppe Visconti. Mean field models for large data–clustering problems. Networks and Heterogeneous Media, 2020, 15 (3) : 463-487. doi: 10.3934/nhm.2020027 |
[4] |
Jia Chen, Ioannis D. Schizas. Multimodal correlations-based data clustering. Foundations of Data Science, 2022 doi: 10.3934/fods.2022011 |
[5] |
Daniel Roggen, Martin Wirz, Gerhard Tröster, Dirk Helbing. Recognition of crowd behavior from mobile sensors with pattern analysis and graph clustering methods. Networks and Heterogeneous Media, 2011, 6 (3) : 521-544. doi: 10.3934/nhm.2011.6.521 |
[6] |
Adela DePavia, Stefan Steinerberger. Spectral clustering revisited: Information hidden in the Fiedler vector. Foundations of Data Science, 2021, 3 (2) : 225-249. doi: 10.3934/fods.2021015 |
[7] |
George Siopsis. Quantum topological data analysis with continuous variables. Foundations of Data Science, 2019, 1 (4) : 419-431. doi: 10.3934/fods.2019017 |
[8] |
Tyrus Berry, Timothy Sauer. Consistent manifold representation for topological data analysis. Foundations of Data Science, 2019, 1 (1) : 1-38. doi: 10.3934/fods.2019001 |
[9] |
Erik Carlsson, John Gunnar Carlsson, Shannon Sweitzer. Applying topological data analysis to local search problems. Foundations of Data Science, 2022 doi: 10.3934/fods.2022006 |
[10] |
Matthew O. Williams, Clarence W. Rowley, Ioannis G. Kevrekidis. A kernel-based method for data-driven koopman spectral analysis. Journal of Computational Dynamics, 2015, 2 (2) : 247-265. doi: 10.3934/jcd.2015005 |
[11] |
Massimiliano Guzzo, Giancarlo Benettin. A spectral formulation of the Nekhoroshev theorem and its relevance for numerical and experimental data analysis. Discrete and Continuous Dynamical Systems - B, 2001, 1 (1) : 1-28. doi: 10.3934/dcdsb.2001.1.1 |
[12] |
Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete and Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1233-1249. doi: 10.3934/dcdss.2019085 |
[13] |
Yongbin Ou, Cun-Quan Zhang. A new multimembership clustering method. Journal of Industrial and Management Optimization, 2007, 3 (4) : 619-624. doi: 10.3934/jimo.2007.3.619 |
[14] |
Jelena Grbić, Jie Wu, Kelin Xia, Guo-Wei Wei. Aspects of topological approaches for data science. Foundations of Data Science, 2022, 4 (2) : 165-216. doi: 10.3934/fods.2022002 |
[15] |
Liu Hui, Lin Zhi, Waqas Ahmad. Network(graph) data research in the coordinate system. Mathematical Foundations of Computing, 2018, 1 (1) : 1-10. doi: 10.3934/mfc.2018001 |
[16] |
Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier. Capturing dynamics of time-varying data via topology. Foundations of Data Science, 2022, 4 (1) : 1-36. doi: 10.3934/fods.2021033 |
[17] |
Ana Jofre, Lan-Xi Dong, Ha Phuong Vu, Steve Szigeti, Sara Diamond. Rendering website traffic data into interactive taste graph visualizations. Big Data & Information Analytics, 2017, 2 (2) : 107-118. doi: 10.3934/bdia.2017003 |
[18] |
Sarah Constantin, Robert S. Strichartz, Miles Wheeler. Analysis of the Laplacian and spectral operators on the Vicsek set. Communications on Pure and Applied Analysis, 2011, 10 (1) : 1-44. doi: 10.3934/cpaa.2011.10.1 |
[19] |
Raluca Felea, Romina Gaburro, Allan Greenleaf, Clifford Nolan. Microlocal analysis of borehole seismic data. Inverse Problems and Imaging, , () : -. doi: 10.3934/ipi.2022026 |
[20] |
Elissar Nasreddine. Two-dimensional individual clustering model. Discrete and Continuous Dynamical Systems - S, 2014, 7 (2) : 307-316. doi: 10.3934/dcdss.2014.7.307 |
Impact Factor:
Tools
Metrics
Other articles
by authors
[Back to Top]