# American Institute of Mathematical Sciences

September  2019, 1(3): 271-291. doi: 10.3934/fods.2019012

## Learning by active nonlinear diffusion

 1 Department of Mathematics, Department of Applied Mathematics and Statistics, Mathematical Institute of Data Sciences, Institute of Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD 21218, USA 2 Department of Mathematics, Tufts University, Medford, MA 02155, USA

* Corresponding author: James M. Murphy

Published  August 2019

Fund Project: This research is supported by NSF-DMS-125012, NSF-DMS-1724979, NSF-DMS-1708602, NSF-ATD-1737984, AFOSR FA9550-17-1-0280, NSF-IIS-1546392, NSF-DMS 1912737, and NSF-DMS 1924513.

This article proposes an active learning method for high-dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures on the dataset, which allow for high-accuracy labelings with only a small number of carefully chosen training labels. The geometric structure of the data suggests regions that have homogeneous labels, as well as regions with high label complexity that should be queried for labels. The proposed method enjoys theoretical performance guarantees on a general geometric data model, in which clusters corresponding to semantically meaningful classes are permitted to have nonlinear geometries, high ambient dimensionality, and suffer from significant noise and outlier corruption. The proposed algorithm is implemented in a manner that is quasilinear in the number of unlabeled data points, and exhibits competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.

Citation: Mauro Maggioni, James M. Murphy. Learning by active nonlinear diffusion. Foundations of Data Science, 2019, 1 (3) : 271-291. doi: 10.3934/fods.2019012
##### References:
 [1] N. Acito, G. Corsini and M. Diani, An unsupervised algorithm for hyperspectral image segmentation based on the gaussian mixture model, in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 6 (2003), 3745–3747. doi: 10.1109/IGARSS.2003.1295256.  Google Scholar [2] A. Anis, A. Gadde and A. Ortega, Towards a sampling theorem for signals on arbitrary graphs, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014, 3864–3868. doi: 10.1109/ICASSP.2014.6854325.  Google Scholar [3] A. Anis, A. Gadde and A. Ortega, Efficient sampling set selection for bandlimited graph signals using graph spectral proxies, IEEE Transactions on Signal Processing, 64 (2016), 3775-3789.  doi: 10.1109/TSP.2016.2546233.  Google Scholar [4] A. Anis, A. E. Gamal, S. Avestimehr and A. Ortega, Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, 5461–5465. doi: 10.1109/ICASSP.2015.7179015.  Google Scholar [5] E. Arias-Castro, Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692–1706. doi: 10.1109/TIT.2011.2104630.  Google Scholar [6] E. Arias-Castro, G. Lerman and T. Zhang, Spectral clustering based on local PCA, Journal of Machine Learning Research, 18 (2017), 1-57.   Google Scholar [7] F. Aurenhammer, Voronoi diagrams—a survey of a fundamental geometric data structure, ACM Computing Surveys (CSUR), 23 (1991), 345-405.  doi: 10.1145/116873.116880.  Google Scholar [8] M.-F. Balcan, A. Beygelzimer and J. Langford, Agnostic active learning, Journal of Computer and System Sciences, 75 (2009), 78-89.  doi: 10.1016/j.jcss.2008.07.003.  Google Scholar [9] M.-F. Balcan, A. Broder and T. Zhang, Margin based active learning, in International Conference on Computational Learning Theory, Springer, 4359 (2007), 35–50. doi: 10.1007/978-3-540-72927-3_5.  Google Scholar [10] A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104. doi: 10.1145/1143844.1143857.  Google Scholar [11] N. Cahill, W. Czaja and D. Messinger, Schroedinger eigenmaps with nondiagonal potentials for spatial-spectral clustering of hyperspectral imagery, in Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XX, vol. 9088, International Society for Optics and Photonics, 2014, 908804. Google Scholar [12] G. Camps-Valls, T. Marsheva and D. Zhou, Semi-supervised graph-based hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 45 (2007), 3044-3054.  doi: 10.1109/TGRS.2007.895416.  Google Scholar [13] C. Cariou and K. Chehdi, Unsupervised nearest neighbors clustering with application to hyperspectral images, IEEE Journal of Selected Topics in Signal Processing, 9 (2015), 1105-1116.  doi: 10.1109/JSTSP.2015.2413371.  Google Scholar [14] R. Castro and R. Nowak, Minimax bounds for active learning, IEEE Transactions on Information Theory, 54 (2008), 2339-2353.  doi: 10.1109/TIT.2008.920189.  Google Scholar [15] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, vol. 1, Springer Science & Business Media, 2003. Google Scholar [16] O. Chapelle, B. Scholkopf and A. Zien, Semi-supervised Learning, MIT Press, 2006.   Google Scholar [17] S. Chen, R. Varma, A. Sandryhaila and J. Kovačević, Discrete signal processing on graphs: Sampling theory, IEEE Transactions on Signal Processing, 63 (2015), 6510-6523.  doi: 10.1109/TSP.2015.2469645.  Google Scholar [18] Y. Chen, Z. Lin, X. Zhao, G. Wang and Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7 (2014), 2094-2107.  doi: 10.1109/JSTARS.2014.2329330.  Google Scholar [19] Y. Chen, S. Ma, X. Chen and P. Ghamisi, Hyperspectral data clustering based on density analysis ensemble, Remote Sensing Letters, 8 (2017), 194-203.  doi: 10.1080/2150704X.2016.1249295.  Google Scholar [20] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20 (1960), 37-46.  doi: 10.1177/001316446002000104.  Google Scholar [21] D. Cohn, L. Atlas and R. Ladner, Improving generalization with active learning, Machine Learning, 15 (1994), 201-221.  doi: 10.1007/BF00993277.  Google Scholar [22] R. Coifman and S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.  doi: 10.1016/j.acha.2006.04.006.  Google Scholar [23] R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proceedings of the National Academy of Sciences of the United States of America, 102 (2005), 7426-7431.   Google Scholar [24] S. Dasgupta, Two faces of active learning, Theoretical Computer Science, 412 (2011), 1767-1781.  doi: 10.1016/j.tcs.2010.12.054.  Google Scholar [25] S. Dasgupta and D. Hsu, Hierarchical sampling for active learning, in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008,208–215. doi: 10.1145/1390156.1390183.  Google Scholar [26] S. Dasgupta, D. Hsu and C. Monteleoni, A general agnostic active learning algorithm, in Advances in neural information processing systems, 2008,353–360. Google Scholar [27] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau and S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature, 542 (2017), 115-118.  doi: 10.1038/nature21056.  Google Scholar [28] J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer series in Statistics Springer, Berlin, 2001. doi: 10.1007/978-0-387-21606-5.  Google Scholar [29] N. Garcia Trillos, M. Gerlach, M. Hein and D. Slepcev, Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator, arXiv: 1801.10108. Google Scholar [30] N. Garcia Trillos, F. Hoffmann and B. Hosseini, Geometric structure of graph Laplacian embeddings, arXiv: 1901.10651. Google Scholar [31] M. Gavish and B. Nadler, Normalized cuts are approximately inverse exit times, SIAM Journal on Matrix Analysis and Applications, 34 (2013), 757-772.  doi: 10.1137/110826928.  Google Scholar [32] N. Gillis, D. Kuang and H. Park, Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization, IEEE Transactions on Geoscience and Remote Sensing, 53 (2015), 2066-2078.  doi: 10.1109/TGRS.2014.2352857.  Google Scholar [33] J. Ham, Y. Chen, M. Crawford and J. Ghosh, Investigation of the random forest framework for classification of hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, 43 (2005), 492-501.  doi: 10.1109/TGRS.2004.842481.  Google Scholar [34] S. Hanneke, Rates of convergence in active learning, The Annals of Statistics, 39 (2011), 333-361.  doi: 10.1214/10-AOS843.  Google Scholar [35] A. Krizhevsky, I. Sutskever and G. Hinton, Imagenet classification with deep convolutional neural networks, Communications of the ACM, 60 (2017), 84-90.  doi: 10.1145/3065386.  Google Scholar [36] S. Lafon and A. Lee, Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (2006), 1393-1403.   Google Scholar [37] J. Li, J. Bioucas-Dias and A. Plaza, Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning, IEEE Transactions on Geoscience and Remote Sensing, 48 (2010), 4085-4098.  doi: 10.1109/TGRS.2010.2060550.  Google Scholar [38] J. Li, J. Bioucas-Dias and A. Plaza, Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression, IEEE Geoscience and Remote Sensing Letters, 10 (2013), 318-322.   Google Scholar [39] A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv: 1712.06206. Google Scholar [40] M. Maggioni and J. Murphy, Learning by unsupervised nonlinear diffusion, arXiv: 1810.06702. Google Scholar [41] F. Melgani and L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on geoscience and remote sensing, 42 (2004), 1778-1790.   Google Scholar [42] D. Mixon, S. Villar and R. Ward, Clustering subgaussian mixtures by semidefinite programming, Information and Inference: A Journal of the IMA, 6 (2017), 389-415.  doi: 10.1093/imaiai/iax001.  Google Scholar [43] J. Murphy and M. Maggioni, Iterative active learning with diffusion geometry for hyperspectral images, in 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), IEEE, 2018, 1–5. doi: 10.1109/WHISPERS.2018.8747033.  Google Scholar [44] J. Murphy and M. Maggioni, Spectral-spatial diffusion geometry for hyperspectral image clustering, arXiv: 1902.05402. Google Scholar [45] J. Murphy and M. Maggioni, Unsupervised clustering and active learning of hyperspectral images with nonlinear diffusion, IEEE Transactions on Geoscience and Remote Sensing, 57 (2019), 1829-1845.  doi: 10.1109/TGRS.2018.2869723.  Google Scholar [46] B. Nadler and M. Galun, Fundamental limitations of spectral clustering, in Advances in Neural Information Processing Systems, 2007, 1017–1024. Google Scholar [47] A. Paoli, F. Melgani and E. Pasolli, Clustering of hyperspectral images based on multiobjective particle swarm optimization, IEEE Transactions on Geoscience and Remote Sensing, 47 (2009), 4175-4188.  doi: 10.1109/TGRS.2009.2023666.  Google Scholar [48] I. Pesenson, Sampling in paley-wiener spaces on combinatorial graphs, Transactions of the American Mathematical Society, 360 (2008), 5603-5627.  doi: 10.1090/S0002-9947-08-04511-X.  Google Scholar [49] I. Pesenson and M. Pesenson, Sampling, filtering and sparse approximations on combinatorial graphs, Journal of Fourier Analysis and Applications, 16 (2010), 921-942.  doi: 10.1007/s00041-009-9116-7.  Google Scholar [50] G. Puy and P. Pérez, Structured sampling and fast reconstruction of smooth graph signals, Information and Inference: A Journal of the IMA, 7 (2018), 657-688.  doi: 10.1093/imaiai/iax021.  Google Scholar [51] G. Puy, N. Tremblay, R. Gribonval and P. Vandergheynst, Random sampling of bandlimited signals on graphs, Applied and Computational Harmonic Analysis, 44 (2018), 446-475.  doi: 10.1016/j.acha.2016.05.005.  Google Scholar [52] S. Rajan, J. Ghosh and M. Crawford, An active learning approach to hyperspectral data classification, IEEE Transactions on Geoscience and Remote Sensing, 46 (2008), 1231-1242.  doi: 10.1109/TGRS.2007.910220.  Google Scholar [53] F. Ratle, G. Camps-Valls and J. Weston, Semisupervised neural networks for efficient hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 48 (2010), 2271-2282.  doi: 10.1109/TGRS.2009.2037898.  Google Scholar [54] G. Schiebinger, M. Wainwright and B. Yu, The geometry of kernelized spectral clustering, The Annals of Statistics, 43 (2015), 819-846.  doi: 10.1214/14-AOS1283.  Google Scholar [55] B. Settles, Active Learning Literature Survey, Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2009. Google Scholar [56] D. Shuman, S. Narang, P. Frossard, A. Ortega and P. Vandergheynst, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Processing Magazine, 3 (2013), 83-98.   Google Scholar [57] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. V. D. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam and M. Lanctot, Mastering the game of go with deep neural networks and tree search, nature, 529 (2016), 484-489.  doi: 10.1038/nature16961.  Google Scholar [58] S. Sun, P. Zhong, H. Xiao and R. Wang, Active learning with gaussian process classifier for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 53 (2015), 1746-1760.  doi: 10.1109/TGRS.2014.2347343.  Google Scholar [59] M. Tanner and W. Wong, The calculation of posterior distributions by data augmentation, Journal of the American statistical Association, 82 (1987), 528-540.  doi: 10.1080/01621459.1987.10478458.  Google Scholar [60] D. Tuia, F. Ratle, F. Pacifici, M. Kanevski and W. Emery, Active learning methods for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing, 47 (2009), 2218-2232.  doi: 10.1109/TGRS.2008.2010404.  Google Scholar [61] R. Urner, S. Wulff and S. Ben-David, PLALcluster-based active learning, in Conference on Learning Theory, 2013,376–397. Google Scholar [62] D. Van Dyk and X.-L. Meng, The art of data augmentation, Journal of Computational and Graphical Statistics, 10 (2001), 1-111.   Google Scholar [63] H. Zhai, H. Zhang, L. Zhang, P. Li and A. Plaza, A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery, IEEE Geoscience and Remote Sensing Letters, 14 (2017), 43-47.  doi: 10.1109/LGRS.2016.2625200.  Google Scholar [64] H. Zhang, H. Zhai and L. Z. P. Li, Spectral–spatial sparse subspace clustering for hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, 54 (2016), 3672-3684.  doi: 10.1109/TGRS.2016.2524557.  Google Scholar [65] Z. Zhang, E. Pasolli, M. Crawford and J. Tilton, An active learning framework for hyperspectral image classification using hierarchical segmentation, IEEE J-STARS, 9 (2016), 640-654.  doi: 10.1109/JSTARS.2015.2493887.  Google Scholar [66] W. Zhu, V. Chayes, A. Tiard, S. Sanchez, D. Dahlberg, A. Bertozzi, S. Osher, D. Zosso and D. Kuang, Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm, IEEE Transactions on Geoscience and Remote Sensing, 55 (2017), 2786-2798.  doi: 10.1109/TGRS.2017.2654486.  Google Scholar

show all references

##### References:
 [1] N. Acito, G. Corsini and M. Diani, An unsupervised algorithm for hyperspectral image segmentation based on the gaussian mixture model, in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 6 (2003), 3745–3747. doi: 10.1109/IGARSS.2003.1295256.  Google Scholar [2] A. Anis, A. Gadde and A. Ortega, Towards a sampling theorem for signals on arbitrary graphs, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2014, 3864–3868. doi: 10.1109/ICASSP.2014.6854325.  Google Scholar [3] A. Anis, A. Gadde and A. Ortega, Efficient sampling set selection for bandlimited graph signals using graph spectral proxies, IEEE Transactions on Signal Processing, 64 (2016), 3775-3789.  doi: 10.1109/TSP.2016.2546233.  Google Scholar [4] A. Anis, A. E. Gamal, S. Avestimehr and A. Ortega, Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, 5461–5465. doi: 10.1109/ICASSP.2015.7179015.  Google Scholar [5] E. Arias-Castro, Clustering based on pairwise distances when the data is of mixed dimensions, IEEE Transactions on Information Theory, 57 (2011), 1692–1706. doi: 10.1109/TIT.2011.2104630.  Google Scholar [6] E. Arias-Castro, G. Lerman and T. Zhang, Spectral clustering based on local PCA, Journal of Machine Learning Research, 18 (2017), 1-57.   Google Scholar [7] F. Aurenhammer, Voronoi diagrams—a survey of a fundamental geometric data structure, ACM Computing Surveys (CSUR), 23 (1991), 345-405.  doi: 10.1145/116873.116880.  Google Scholar [8] M.-F. Balcan, A. Beygelzimer and J. Langford, Agnostic active learning, Journal of Computer and System Sciences, 75 (2009), 78-89.  doi: 10.1016/j.jcss.2008.07.003.  Google Scholar [9] M.-F. Balcan, A. Broder and T. Zhang, Margin based active learning, in International Conference on Computational Learning Theory, Springer, 4359 (2007), 35–50. doi: 10.1007/978-3-540-72927-3_5.  Google Scholar [10] A. Beygelzimer, S. Kakade and J. Langford, Cover trees for nearest neighbor, in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 97–104. doi: 10.1145/1143844.1143857.  Google Scholar [11] N. Cahill, W. Czaja and D. Messinger, Schroedinger eigenmaps with nondiagonal potentials for spatial-spectral clustering of hyperspectral imagery, in Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XX, vol. 9088, International Society for Optics and Photonics, 2014, 908804. Google Scholar [12] G. Camps-Valls, T. Marsheva and D. Zhou, Semi-supervised graph-based hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 45 (2007), 3044-3054.  doi: 10.1109/TGRS.2007.895416.  Google Scholar [13] C. Cariou and K. Chehdi, Unsupervised nearest neighbors clustering with application to hyperspectral images, IEEE Journal of Selected Topics in Signal Processing, 9 (2015), 1105-1116.  doi: 10.1109/JSTSP.2015.2413371.  Google Scholar [14] R. Castro and R. Nowak, Minimax bounds for active learning, IEEE Transactions on Information Theory, 54 (2008), 2339-2353.  doi: 10.1109/TIT.2008.920189.  Google Scholar [15] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, vol. 1, Springer Science & Business Media, 2003. Google Scholar [16] O. Chapelle, B. Scholkopf and A. Zien, Semi-supervised Learning, MIT Press, 2006.   Google Scholar [17] S. Chen, R. Varma, A. Sandryhaila and J. Kovačević, Discrete signal processing on graphs: Sampling theory, IEEE Transactions on Signal Processing, 63 (2015), 6510-6523.  doi: 10.1109/TSP.2015.2469645.  Google Scholar [18] Y. Chen, Z. Lin, X. Zhao, G. Wang and Y. Gu, Deep learning-based classification of hyperspectral data, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7 (2014), 2094-2107.  doi: 10.1109/JSTARS.2014.2329330.  Google Scholar [19] Y. Chen, S. Ma, X. Chen and P. Ghamisi, Hyperspectral data clustering based on density analysis ensemble, Remote Sensing Letters, 8 (2017), 194-203.  doi: 10.1080/2150704X.2016.1249295.  Google Scholar [20] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20 (1960), 37-46.  doi: 10.1177/001316446002000104.  Google Scholar [21] D. Cohn, L. Atlas and R. Ladner, Improving generalization with active learning, Machine Learning, 15 (1994), 201-221.  doi: 10.1007/BF00993277.  Google Scholar [22] R. Coifman and S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis, 21 (2006), 5-30.  doi: 10.1016/j.acha.2006.04.006.  Google Scholar [23] R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proceedings of the National Academy of Sciences of the United States of America, 102 (2005), 7426-7431.   Google Scholar [24] S. Dasgupta, Two faces of active learning, Theoretical Computer Science, 412 (2011), 1767-1781.  doi: 10.1016/j.tcs.2010.12.054.  Google Scholar [25] S. Dasgupta and D. Hsu, Hierarchical sampling for active learning, in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008,208–215. doi: 10.1145/1390156.1390183.  Google Scholar [26] S. Dasgupta, D. Hsu and C. Monteleoni, A general agnostic active learning algorithm, in Advances in neural information processing systems, 2008,353–360. Google Scholar [27] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau and S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature, 542 (2017), 115-118.  doi: 10.1038/nature21056.  Google Scholar [28] J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer series in Statistics Springer, Berlin, 2001. doi: 10.1007/978-0-387-21606-5.  Google Scholar [29] N. Garcia Trillos, M. Gerlach, M. Hein and D. Slepcev, Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator, arXiv: 1801.10108. Google Scholar [30] N. Garcia Trillos, F. Hoffmann and B. Hosseini, Geometric structure of graph Laplacian embeddings, arXiv: 1901.10651. Google Scholar [31] M. Gavish and B. Nadler, Normalized cuts are approximately inverse exit times, SIAM Journal on Matrix Analysis and Applications, 34 (2013), 757-772.  doi: 10.1137/110826928.  Google Scholar [32] N. Gillis, D. Kuang and H. Park, Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization, IEEE Transactions on Geoscience and Remote Sensing, 53 (2015), 2066-2078.  doi: 10.1109/TGRS.2014.2352857.  Google Scholar [33] J. Ham, Y. Chen, M. Crawford and J. Ghosh, Investigation of the random forest framework for classification of hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, 43 (2005), 492-501.  doi: 10.1109/TGRS.2004.842481.  Google Scholar [34] S. Hanneke, Rates of convergence in active learning, The Annals of Statistics, 39 (2011), 333-361.  doi: 10.1214/10-AOS843.  Google Scholar [35] A. Krizhevsky, I. Sutskever and G. Hinton, Imagenet classification with deep convolutional neural networks, Communications of the ACM, 60 (2017), 84-90.  doi: 10.1145/3065386.  Google Scholar [36] S. Lafon and A. Lee, Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (2006), 1393-1403.   Google Scholar [37] J. Li, J. Bioucas-Dias and A. Plaza, Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning, IEEE Transactions on Geoscience and Remote Sensing, 48 (2010), 4085-4098.  doi: 10.1109/TGRS.2010.2060550.  Google Scholar [38] J. Li, J. Bioucas-Dias and A. Plaza, Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression, IEEE Geoscience and Remote Sensing Letters, 10 (2013), 318-322.   Google Scholar [39] A. Little, M. Maggioni and J. Murphy, Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms, arXiv: 1712.06206. Google Scholar [40] M. Maggioni and J. Murphy, Learning by unsupervised nonlinear diffusion, arXiv: 1810.06702. Google Scholar [41] F. Melgani and L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on geoscience and remote sensing, 42 (2004), 1778-1790.   Google Scholar [42] D. Mixon, S. Villar and R. Ward, Clustering subgaussian mixtures by semidefinite programming, Information and Inference: A Journal of the IMA, 6 (2017), 389-415.  doi: 10.1093/imaiai/iax001.  Google Scholar [43] J. Murphy and M. Maggioni, Iterative active learning with diffusion geometry for hyperspectral images, in 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), IEEE, 2018, 1–5. doi: 10.1109/WHISPERS.2018.8747033.  Google Scholar [44] J. Murphy and M. Maggioni, Spectral-spatial diffusion geometry for hyperspectral image clustering, arXiv: 1902.05402. Google Scholar [45] J. Murphy and M. Maggioni, Unsupervised clustering and active learning of hyperspectral images with nonlinear diffusion, IEEE Transactions on Geoscience and Remote Sensing, 57 (2019), 1829-1845.  doi: 10.1109/TGRS.2018.2869723.  Google Scholar [46] B. Nadler and M. Galun, Fundamental limitations of spectral clustering, in Advances in Neural Information Processing Systems, 2007, 1017–1024. Google Scholar [47] A. Paoli, F. Melgani and E. Pasolli, Clustering of hyperspectral images based on multiobjective particle swarm optimization, IEEE Transactions on Geoscience and Remote Sensing, 47 (2009), 4175-4188.  doi: 10.1109/TGRS.2009.2023666.  Google Scholar [48] I. Pesenson, Sampling in paley-wiener spaces on combinatorial graphs, Transactions of the American Mathematical Society, 360 (2008), 5603-5627.  doi: 10.1090/S0002-9947-08-04511-X.  Google Scholar [49] I. Pesenson and M. Pesenson, Sampling, filtering and sparse approximations on combinatorial graphs, Journal of Fourier Analysis and Applications, 16 (2010), 921-942.  doi: 10.1007/s00041-009-9116-7.  Google Scholar [50] G. Puy and P. Pérez, Structured sampling and fast reconstruction of smooth graph signals, Information and Inference: A Journal of the IMA, 7 (2018), 657-688.  doi: 10.1093/imaiai/iax021.  Google Scholar [51] G. Puy, N. Tremblay, R. Gribonval and P. Vandergheynst, Random sampling of bandlimited signals on graphs, Applied and Computational Harmonic Analysis, 44 (2018), 446-475.  doi: 10.1016/j.acha.2016.05.005.  Google Scholar [52] S. Rajan, J. Ghosh and M. Crawford, An active learning approach to hyperspectral data classification, IEEE Transactions on Geoscience and Remote Sensing, 46 (2008), 1231-1242.  doi: 10.1109/TGRS.2007.910220.  Google Scholar [53] F. Ratle, G. Camps-Valls and J. Weston, Semisupervised neural networks for efficient hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 48 (2010), 2271-2282.  doi: 10.1109/TGRS.2009.2037898.  Google Scholar [54] G. Schiebinger, M. Wainwright and B. Yu, The geometry of kernelized spectral clustering, The Annals of Statistics, 43 (2015), 819-846.  doi: 10.1214/14-AOS1283.  Google Scholar [55] B. Settles, Active Learning Literature Survey, Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2009. Google Scholar [56] D. Shuman, S. Narang, P. Frossard, A. Ortega and P. Vandergheynst, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Processing Magazine, 3 (2013), 83-98.   Google Scholar [57] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. V. D. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam and M. Lanctot, Mastering the game of go with deep neural networks and tree search, nature, 529 (2016), 484-489.  doi: 10.1038/nature16961.  Google Scholar [58] S. Sun, P. Zhong, H. Xiao and R. Wang, Active learning with gaussian process classifier for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, 53 (2015), 1746-1760.  doi: 10.1109/TGRS.2014.2347343.  Google Scholar [59] M. Tanner and W. Wong, The calculation of posterior distributions by data augmentation, Journal of the American statistical Association, 82 (1987), 528-540.  doi: 10.1080/01621459.1987.10478458.  Google Scholar [60] D. Tuia, F. Ratle, F. Pacifici, M. Kanevski and W. Emery, Active learning methods for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing, 47 (2009), 2218-2232.  doi: 10.1109/TGRS.2008.2010404.  Google Scholar [61] R. Urner, S. Wulff and S. Ben-David, PLALcluster-based active learning, in Conference on Learning Theory, 2013,376–397. Google Scholar [62] D. Van Dyk and X.-L. Meng, The art of data augmentation, Journal of Computational and Graphical Statistics, 10 (2001), 1-111.   Google Scholar [63] H. Zhai, H. Zhang, L. Zhang, P. Li and A. Plaza, A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery, IEEE Geoscience and Remote Sensing Letters, 14 (2017), 43-47.  doi: 10.1109/LGRS.2016.2625200.  Google Scholar [64] H. Zhang, H. Zhai and L. Z. P. Li, Spectral–spatial sparse subspace clustering for hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, 54 (2016), 3672-3684.  doi: 10.1109/TGRS.2016.2524557.  Google Scholar [65] Z. Zhang, E. Pasolli, M. Crawford and J. Tilton, An active learning framework for hyperspectral image classification using hierarchical segmentation, IEEE J-STARS, 9 (2016), 640-654.  doi: 10.1109/JSTARS.2015.2493887.  Google Scholar [66] W. Zhu, V. Chayes, A. Tiard, S. Sanchez, D. Dahlberg, A. Bertozzi, S. Osher, D. Zosso and D. Kuang, Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm, IEEE Transactions on Geoscience and Remote Sensing, 55 (2017), 2786-2798.  doi: 10.1109/TGRS.2017.2654486.  Google Scholar
Data colored by class label. Both the data in (a) and (b) exhibit cluster structure, which can guide active learning in the case that the labels are constant on these clusters. Indeed, on the left, using a simple clustering algorithm such as $K$-means suggests that only 3 labels are necessary to correctly label the entire dataset. For the data on the right, many more than three labels are necessary if $K$-means is used for the underlying clustering, since the clusters are highly elongated and nonlinear. Indeed, $K$-means will split the annular and elongated clusters. On the other hand, if pairwise comparisons are made with distances other than Euclidean distances, it may be possible that active learning achieves near perfect results with only 3 labels. The proposed active learning scheme gains robustness to class shape via diffusion geometry, and is suitable for data in both (a) and (b)
for $\log_{10}(t) = 2$ and $\log_{10}(t) = 9$, respectively. We see that for small values of $t$, the mode estimation incorrectly places the first three modes on the highly elongated cluster. For larger time values, the underlying random walk reaches a mesoscopic equilibrium and correct mode estimation is achieved. The emergence of mesoscopic equilibria is apparent in (c), (d), which show the matrix of diffusion distances at time $\log_{10}(t) = 2$ and $\log_{10}(t) = 9$, respectively. When $\log_{10}(t) = 2$, $P^{t}$ has not mixed, and there are still substantial within-cluster distances. For $\log_{10}(t) = 9$, $P^{t}$ has reached mesoscopic equilibria, so that within-cluster distances are quite small, yet between-cluster distances are still large [40]">Figure 2.  In (a) and (b), the values of $\log_{10}( \mathcal{D}_{t})$ are shown for synthetic geometric data from Figure 1 (b) for $\log_{10}(t) = 2$ and $\log_{10}(t) = 9$, respectively. We see that for small values of $t$, the mode estimation incorrectly places the first three modes on the highly elongated cluster. For larger time values, the underlying random walk reaches a mesoscopic equilibrium and correct mode estimation is achieved. The emergence of mesoscopic equilibria is apparent in (c), (d), which show the matrix of diffusion distances at time $\log_{10}(t) = 2$ and $\log_{10}(t) = 9$, respectively. When $\log_{10}(t) = 2$, $P^{t}$ has not mixed, and there are still substantial within-cluster distances. For $\log_{10}(t) = 9$, $P^{t}$ has reached mesoscopic equilibria, so that within-cluster distances are quite small, yet between-cluster distances are still large [40]
Top row: Three different synthetic datasets in two dimensions are shown, categorized as geometric, bottleneck and Gaussian. Bottom row: Plots of node purity for three different multiscale, hierarchical methods of clustering: average linkage clustering (ALC), single linkage clustering (SLC) and learning by unsupervised nonlinear diffusion (LUND). As the number of leaves/clusters increases, purity is non-decreasing. The purity of the LUND clusters converges more rapidly to the optimal value 1, indicating that high accuracy can be gained by correctly labeling a smaller number of clusters, compared to ALC and SLC
In (a), data with natural hierarchical structure is exhibited. The four Gaussians have means $(0, 0), (0, 2), \left(\frac{3}{2}, 0\right), \left(\frac{3}{2}, 2\right)$. While at one level of granularity there are 4 clusters (shown in (b)), at a coarser level of granularity the top 2 and bottom 2 Gaussians form clusters, leading to a clustering with only 2 clusters (shown in (c))
The matrix of pairwise diffusion distances for $\log_{10}(t) = 1.5$ and $\log_{10}(t) = 5$ are shown in (a) and (b), respectively, illustrating the hierarchical cluster structure in the data. This hierarchical structure introduces ambiguities into the estimation of the number of clusters $K$ in LUND, as shown (c). For small time, $\hat{K} = 4$, while for larger time $\hat{K} = 2$
LAND labelings of the data with four queries under two different scenarios: small diffusion time (top row) and large diffusion time (bottom row), and four latent clusters (first column) and two latent clusters (second column). The clusters are closer in the horizontal direction than in the vertical direction, from whence the hierarchical structure is derived. In all cases, LAND is able to to correctly label the data with just four queries, one for each Gaussian
. We see that LAND achieves rapid convergence to perfect labeling accuracy, compared to much slower convergence for the two comparison methods">Figure 7.  Experimental results on the synthetic datasets introduced in Figure 3. We see that LAND achieves rapid convergence to perfect labeling accuracy, compared to much slower convergence for the two comparison methods
The Salinas A dataset consists of $83 \times 86 = 7138$ pixels in $D = 224$ dimensions. The image has spatial resolution $3.7$m/pixel, and was recorded over Salinas, USA by the Aviris sensor. The six labelled classes are arranged in diagonal rows, and are quite spatially regular. The sum across all spectral bands is shown in (a), and the labeled ground truth is shown in (b), with pixels having the same class being given the same color
Pavia data consists of a $270\times 50 = 13500$ subset of the full Pavia data set. The image has spatial resolution $1.3$m/pixel, and was recorded over Pavia, Italy by the ROSIS sensor. It consists of 6 spatial classes, some of which are quite well-spread out in the image. The sum across all spectral bands is shown in (a), and the labeled ground truth is shown in (b), with pixels having the same class being given the same color
The active learning results for the Salinas A and Pavia datasets are shown in (a) and (b), respectively. In both cases, the LAND algorithm strongly outperforms the modified LAND variant using randomly selected training data, and the CBAL algorithm. In particular, LAND is able to achieve a significant improvement in accuracy with a very small number of labels
 [1] Émilie Chouzenoux, Henri Gérard, Jean-Christophe Pesquet. General risk measures for robust machine learning. Foundations of Data Science, 2019, 1 (3) : 249-269. doi: 10.3934/fods.2019011 [2] Ana Rita Nogueira, João Gama, Carlos Abreu Ferreira. Causal discovery in machine learning: Theories and applications. Journal of Dynamics & Games, 2021, 8 (3) : 203-231. doi: 10.3934/jdg.2021008 [3] G. Calafiore, M.C. Campi. A learning theory approach to the construction of predictor models. Conference Publications, 2003, 2003 (Special) : 156-166. doi: 10.3934/proc.2003.2003.156 [4] Jiang Xie, Junfu Xu, Celine Nie, Qing Nie. Machine learning of swimming data via wisdom of crowd and regression analysis. Mathematical Biosciences & Engineering, 2017, 14 (2) : 511-527. doi: 10.3934/mbe.2017031 [5] Mingbao Cheng, Shuxian Xiao, Guosheng Liu. Single-machine rescheduling problems with learning effect under disruptions. Journal of Industrial & Management Optimization, 2018, 14 (3) : 967-980. doi: 10.3934/jimo.2017085 [6] Andreas Chirstmann, Qiang Wu, Ding-Xuan Zhou. Preface to the special issue on analysis in machine learning and data science. Communications on Pure & Applied Analysis, 2020, 19 (8) : i-iii. doi: 10.3934/cpaa.2020171 [7] Alan Beggs. Learning in monotone bayesian games. Journal of Dynamics & Games, 2015, 2 (2) : 117-140. doi: 10.3934/jdg.2015.2.117 [8] Yangyang Xu, Wotao Yin, Stanley Osher. Learning circulant sensing kernels. Inverse Problems & Imaging, 2014, 8 (3) : 901-923. doi: 10.3934/ipi.2014.8.901 [9] Christian Soize, Roger Ghanem. Probabilistic learning on manifolds. Foundations of Data Science, 2020, 2 (3) : 279-307. doi: 10.3934/fods.2020013 [10] Nicolás M. Crisosto, Christopher M. Kribs-Zaleta, Carlos Castillo-Chávez, Stephen Wirkus. Community resilience in collaborative learning. Discrete & Continuous Dynamical Systems - B, 2010, 14 (1) : 17-40. doi: 10.3934/dcdsb.2010.14.17 [11] Gernot Holler, Karl Kunisch. Learning nonlocal regularization operators. Mathematical Control & Related Fields, 2021  doi: 10.3934/mcrf.2021003 [12] Ping Yan, Ji-Bo Wang, Li-Qiang Zhao. Single-machine bi-criterion scheduling with release times and exponentially time-dependent learning effects. Journal of Industrial & Management Optimization, 2019, 15 (3) : 1117-1131. doi: 10.3934/jimo.2018088 [13] Cai-Tong Yue, Jing Liang, Bo-Fei Lang, Bo-Yang Qu. Two-hidden-layer extreme learning machine based wrist vein recognition system. Big Data & Information Analytics, 2017, 2 (1) : 59-68. doi: 10.3934/bdia.2017008 [14] Xingong Zhang. Single machine and flowshop scheduling problems with sum-of-processing time based learning phenomenon. Journal of Industrial & Management Optimization, 2020, 16 (1) : 231-244. doi: 10.3934/jimo.2018148 [15] Kengo Nakai, Yoshitaka Saiki. Machine-learning construction of a model for a macroscopic fluid variable using the delay-coordinate of a scalar observable. Discrete & Continuous Dynamical Systems - S, 2021, 14 (3) : 1079-1092. doi: 10.3934/dcdss.2020352 [16] Marc Bocquet, Julien Brajard, Alberto Carrassi, Laurent Bertino. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science, 2020, 2 (1) : 55-80. doi: 10.3934/fods.2020004 [17] Minlong Lin, Ke Tang. Selective further learning of hybrid ensemble for class imbalanced increment learning. Big Data & Information Analytics, 2017, 2 (1) : 1-21. doi: 10.3934/bdia.2017005 [18] Ning Zhang, Qiang Wu. Online learning for supervised dimension reduction. Mathematical Foundations of Computing, 2019, 2 (2) : 95-106. doi: 10.3934/mfc.2019008 [19] D. Warren, K Najarian. Learning theory applied to Sigmoid network classification of protein biological function using primary protein structure. Conference Publications, 2003, 2003 (Special) : 898-904. doi: 10.3934/proc.2003.2003.898 [20] Shuhua Wang, Zhenlong Chen, Baohuai Sheng. Convergence of online pairwise regression learning with quadratic loss. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4023-4054. doi: 10.3934/cpaa.2020178

Impact Factor:

## Metrics

• PDF downloads (204)
• HTML views (987)
• Cited by (2)

## Other articlesby authors

• on AIMS
• on Google Scholar

[Back to Top]