A topological approach to spectral clustering

  • Corresponding author: Antonio Rieser

This research was supported in part by the TOPOSYS project FP7-ICT-318493-STREP, Cátedras CONACYT 1076, and Grant #N62909-19-1-2134 from the US Office of Naval Research Global and the Southern Office of Aerospace Research and Development of the US Air Force Office of Scientific Research.

Abstract
  • We propose two related unsupervised clustering algorithms which, for input, take data assumed to be sampled from a uniform distribution supported on a metric space $ X $, and output a clustering of the data based on the selection of a topological model for the connected components of $ X $. Both algorithms work by selecting a graph on the samples from a natural one-parameter family of graphs, using a geometric criterion in the first case and an information theoretic criterion in the second. The estimated connected components of $ X $ are identified with the kernel of the associated graph Laplacian, which allows the algorithm to work without requiring the number of expected clusters or other auxiliary data as input.
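As a rough illustration of the abstract's central identification (not the paper's implementation), the connected components of the underlying space can be estimated as the dimension of the kernel of a graph Laplacian. The sketch below uses an ε-neighborhood graph as a stand-in for the paper's one-parameter family of graphs; the function name and parameters are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def laplacian_kernel_dim(points, eps, tol=1e-8):
    """Estimate the number of connected components of the eps-neighborhood
    graph on `points` as the dimension of the kernel of its graph Laplacian."""
    D = squareform(pdist(points))                  # pairwise distances
    A = (D <= eps) & (D > 0)                       # adjacency of the eps-graph
    L = np.diag(A.sum(axis=1)) - A.astype(float)   # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)                # L is symmetric
    return int(np.sum(np.abs(eigvals) < tol))      # multiplicity of eigenvalue 0

# Two well-separated blobs: at a suitable scale the kernel is 2-dimensional,
# so the estimated number of clusters is 2.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal((0, 0), 0.05, (50, 2)),
                 rng.normal((3, 0), 0.05, (50, 2))])
print(laplacian_kernel_dim(pts, eps=0.5))  # -> 2
```

The point of the construction is that no target number of clusters is supplied: the kernel dimension of the Laplacian at the selected scale *is* the cluster count.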

    Mathematics Subject Classification: 62H30, 55N31.

  • Figure 6.1.  An accurate classification of $ 500 $ points using the Average Local Volume Ratio method, illustrated by color. The classification produced by the Average Relative Entropy method was identical in this experiment

    Figure 6.2.  Average Relative Entropy vs. Scale, for the experiment in Figure 6.1. Note that local maxima appear where the smaller circles join to a larger cluster

    Figure 6.3.  Average Local Volume Ratio vs. Scale for the experiment in Figure 6.1. Note that local minima appear where the smaller circles join with the larger circle

    Figure A.1.  Example classification of 500 points using the Average Local Volume Ratio method, illustrated by color, for an experiment where the algorithm reported 4 clusters. One can see that the vast majority of points are correctly classified, although a small group of points in the circle on the left are assigned to their own cluster

    Figure A.2.  The classification of 500 points using the Average Local Volume Ratio method, illustrated by color, for an experiment where the algorithm reported 5 clusters. One can see that the majority of points are correctly classified, even though the horizontal circle is split into three groups

    Figure A.3.  Example classification of 1000 points using the Average Local Volume Ratio method, illustrated by color, for an experiment where the algorithm reported 4 clusters. One can see that the vast majority of points are correctly classified, although one outlier is assigned to its own cluster

    Figure A.4.  Example classification of 1000 points using the Average Local Volume Ratio method, illustrated by color, for an experiment where the algorithm reported 5 clusters. One can see that the vast majority of points are correctly classified, and there are two outliers visible in different shades of light blue, one partially obscured by the circle on the left

    Figure A.5.  Example classification of 1000 points using the Average Local Volume Ratio method, illustrated by color, for an experiment where the algorithm reported 6 clusters. As in Figures A.3 and A.4, the vast majority of points were classified correctly, and the additional clusters are from (partially obscured) outliers

    Figure A.6.  The classification of 500 points using the Average Relative Entropy method, illustrated by color, for an experiment where the algorithm reported 4 clusters. One can see that the vast majority of points are correctly classified, although there are three outliers at least partially visible: the darker green point on the left circle, a light violet point in the center circle, and a dark violet point which is mostly obscured by the center circle

    Figure A.7.  The classification of 500 points using the Average Relative Entropy method, illustrated by color, for an experiment where the algorithm reported 5 clusters. The majority of points are correctly classified, although there are two extra groups assigned to the slightly separated groups of points in the circle on the left

    Figure A.8.  The classification of 500 points using the Average Relative Entropy method, illustrated by color, for an experiment where the algorithm reported 6 clusters. As in Figures A.6 and A.7, the majority of points were classified correctly, although the circle on the right is split into top and bottom parts, there is an outlier (in blue) on the horizontal circle, as well as a small group of separated points (in indigo) on the horizontal circle

    Figure A.9.  Example classification of 1000 points using the Average Relative Entropy method, illustrated by color, for an experiment where the algorithm reported 4 clusters. One can see that the vast majority of points are correctly classified, although there is an outlier on the right circle given its own cluster

    Figure A.10.  Example classification of 1000 points using the Average Relative Entropy Method, illustrated by color, for an experiment where the algorithm reported 5 clusters. One can see that the vast majority of points are correctly classified, although the center circle is split into 2 subgroups, and there is an outlier which is given its own cluster

    Figure A.11.  Example classification of 1000 points using the Average Relative Entropy method, illustrated by color, for an experiment where the algorithm reported 6 clusters. One can see that the majority of points were classified correctly, although the left circle is split into two subgroups, and there are two outliers given their own cluster
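The scale-dependence shown in Figures 6.2 and 6.3 can be mimicked by a simple scan over the one-parameter family of ε-neighborhood graphs. The sketch below is a crude stand-in for the ARE/ALVR curves (it records the number of graph components at each scale rather than the paper's geometric or information-theoretic statistics); all names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def components_vs_scale(points, scales):
    """Count connected components of the eps-neighborhood graph at each scale.
    The interesting scales are where the count changes, i.e. where smaller
    clusters merge into larger ones, as in Figures 6.2 and 6.3."""
    D = squareform(pdist(points))
    counts = []
    for eps in scales:
        A = csr_matrix((D <= eps) & (D > 0))
        n_comp, _ = connected_components(A, directed=False)
        counts.append(n_comp)
    return counts

# Three tight blobs: at small and moderate scales each blob is one component;
# at a very large scale everything merges into a single component.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal((0, 0), 0.05, (40, 2)),
                 rng.normal((2, 0), 0.05, (40, 2)),
                 rng.normal((0, 2), 0.05, (40, 2))])
print(components_vs_scale(pts, [0.3, 1.0, 3.0]))  # -> [3, 3, 1]
```

In the paper's algorithms, the scale is chosen from such a family by a local extremum of the ARE or ALVR criterion rather than fixed by hand.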

    Table 1.  Results for the Average Relative Entropy method, using $ 500 $ and $ 1000 $ sample points. This table shows the percentages, rounded to the nearest hundredth, of the $ 150 $ trials using the standard deviation in the left-hand column in which the ARE algorithm returned a given number $ \beta_0 $ of clusters

    Average Relative Entropy Method, 500 sample points

    SD \ β0      1      2      3      4      5      6      7      8      9   ≥ 10
    0.01         0      0     64  18.67  10.67   5.33   0.67   0.67      0      0
    0.02         0      0  51.33  27.33  12.67      4      2      2   0.67      0
    0.03         0      0  54.67  26.67  10.67      4   3.33   0.67      0      0
    0.04      0.67      2     44  23.33  13.33   5.33   3.33   2.67      2   3.33
    0.05         6     26     18     20   8.67   4.67   4.67   4.67      2   5.33

    Average Relative Entropy Method, 1000 sample points

    SD \ β0      1      2      3      4      5      6      7      8      9   ≥ 10
    0.01         0      0  62.67     26   7.33   0.67      2   1.33      0      0
    0.02         0      0  59.33     22      8   9.33   0.67   0.67      0      0
    0.03         0      0  23.33     28     22  10.67   4.67   1.33      0     10
    0.04         0      0      4   9.33     12      2   5.33      4   5.33  58.01
    0.05         0      0   1.33   5.33   2.67   4.67      2   5.33   2.67     76

    Table 2.  Results for the Average Local Volume Ratio Method, using $ 500 $ and $ 1000 $ sample points. This table shows the percentages, rounded to the nearest hundredth, of the $ 150 $ trials using the standard deviation in the left-hand column in which the ALVR algorithm returned a given number $ \beta_0 $ of clusters

    Average Local Volume Ratio Method, 500 Sample Points

    SD \ β0      1      2      3      4      5      6      7      8      9   ≥ 10
    0.01         0      0     96   3.33   0.67      0      0      0      0      0
    0.02        18      0     76      6      0      0      0      0      0      0
    0.03     65.33      0     32   2.67      0      0      0      0      0      0
    0.04     95.33   1.33   2.67   0.67      0      0      0      0      0      0
    0.05     91.33   7.33   1.33      0      0      0      0      0      0      0

    Average Local Volume Ratio Method, 1000 Sample Points

    SD \ β0      1      2      3      4      5      6      7      8      9   ≥ 10
    0.01         0      0  98.67   1.33      0      0      0      0      0      0
    0.02         0      0  89.33      8   1.33   1.33      0      0      0      0
    0.03         0      0  45.33     28     16   6.67   2.67   1.33      0      0
    0.04      1.33      0  13.33  21.33     26  11.33     12      6   3.33      0
    0.05     75.33  11.33   2.67   4.67   1.33      2   0.67   1.33   0.67   5.38