# American Institute of Mathematical Sciences

doi: 10.3934/fods.2021012
Online First

Online First articles are published articles within a journal that have not yet been assigned to a formal issue. This means they do not yet have a volume number, issue number, or page numbers assigned to them, however, they can still be found and cited using their DOI (Digital Object Identifier). Online First publication benefits the research community by making new scientific discoveries known as quickly as possible.

Readers can access Online First articles via the “Online First” tab for the selected journal.

## A density-based approach to feature detection in persistence diagrams for firn data

 1 Program of Informatics and Analytics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA 2 Department of Mathematics, University of Maryland, College Park, MD 20742, USA 3 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA 4 Department of Geological Sciences and Engineering, University of Nevada, Reno, Reno, NV 89557, USA 5 Department of Mathematics, William & Mary, Williamsburg, VA 23185, USA

* Corresponding author: Austin Lawson

Received  January 2021 Revised  April 2021 Early access May 2021

Fund Project: Hoffman and Chung were partially supported by NSF Grant DMS-1950549. Chung, Keegan, and Day were partially supported by the Army Research Office under Grant Number W911NF-20-1-0131. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein

Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of $k$-dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-life features. These features may be used to provide a more accurate estimate of Betti numbers for the underlying space. This approach extends the established practice of using a lifespan cutoff on the features in order to take advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning features in the persistence diagram, resulting in greater accuracy in computed Betti numbers, especially in the case of high noise levels and varying image illumination. This work is motivated by 3-dimensional Micro-CT imaging of ice core samples, and is applicable for separating noise from robust signals in persistence diagrams from noisy data.

Citation: Austin Lawson, Tyler Hoffman, Yu-Min Chung, Kaitlin Keegan, Sarah Day. A density-based approach to feature detection in persistence diagrams for firn data. Foundations of Data Science, doi: 10.3934/fods.2021012
##### References:
 [1] R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag and S. Weinberger, Persistent homology for random fields and complexes, in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Inst. Math. Stat. (IMS) Collect., 6, Inst. Math. Statist., Beachwood, OH, 2010,124–143. doi: 10.1214/10-IMSCOLL609. [2] N. Atienza, R. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, J. Intelligent Information Systems, 52 (2019), 637-655.  doi: 10.1007/s10844-017-0473-4. [3] N. Atienza, R. Gonzalez-Diaz and M. Rucco, Separating topological noise from features using persistent entropy, in Software Technologies: Applications and Foundations, Lecture Notes in Computer Science, 9946, Springer, Cham, 2016, 3–12. doi: 10.1007/978-3-319-50230-4_1. [4] G. Carlsson, Topology and data, Bull. Amer. Math. Soc.(N.S.), 46 (2009), 255-308.  doi: 10.1090/S0273-0979-09-01249-X. [5] F. Chazal, V. de Silva and S. Oudot, Persistence stability for geometric complexes, Geom. Dedicata, 173 (2014), 193-214.  doi: 10.1007/s10711-013-9937-z. [6] F. Chazal and B. Michel, An introduction to Topological Data Analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. [7] Y.-M. Chung and S. Day, Topological fidelity and image thresholding: A persistent homology approach, J. Math. Imaging Vision, 60 (2018), 1167-1179.  doi: 10.1007/s10851-018-0802-4. [8] D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120.  doi: 10.1007/s00454-006-1276-5. [9] P. Dlotko, Cubical complex, in GUDHI User and Reference Manual, GUDHI Editorial Board, 3.3.0 edition, 2020. Available from: https://gudhi.inria.fr/. [10] H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. [11] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proceedings, 1996,226–231. [12] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339.  doi: 10.1214/14-AOS1252. [13] A. Garin and G. Tauzin, A topological "reading" lesson: Classification of MNIST using TDA, 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, 2019, 1551–1556. doi: 10.1109/ICMLA.2019.00256. [14] R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc., 45 (2008), 61-75.  doi: 10.1090/S0273-0979-07-01191-3. [15] J. F. Jardine, Data and homotopy types, preprint, arXiv: 1908.06323. [16] T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315. [17] K. Keegan, M. R. Albert and I. Baker, The impact of ice layers on gas transport through firn at the North Greenland Eemian Ice Drilling (NEEM) site, Greenland, The Cryosphere, 8 (2014), 1801-1806.  doi: 10.5194/tc-8-1801-2014. [18] A. Landais, J. M. Barnola, K. Kawamura, N. Caillon and M. Delmotte, Firn-air $\delta$15n in modern polar sites and glacial–interglacial ice: A model-data mismatch during glacial periods in Antarctica?, Quaternary Science Reviews, 25 (2006), 49-62.  doi: 10.1016/j.quascirev.2005.06.007. [19] M. Lesnick and M. Wright, Interactive visualization of 2-D persistence modules, preprint, arXiv: 1512.00180. [20] J. M. D. Lundin, C. M. Stevens, R. Arthern, C. Buizert and A. Orsi, Firn model intercomparison experiment (FirnMICE), J. Glaciology, 63 (2017), 401-422.  doi: 10.1017/jog.2016.114. [21] I. Obayashi, Y. Hiraoka and M. Kimura, Persistence diagrams with linear machine learning models, J. Appl. Comput. Topol., 1 (2018), 421-449.  doi: 10.1007/s41468-018-0013-5. [22] N. Otter, M. A. Porter, U. Tillmann, P. Grindrod and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0109-5. [23] A. Patania, F. Vaccarino and G. Petri, Topological analysis of data, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0104-x. [24] J. Schwander and B. Stauffer, Age difference between polar ice and the air trapped in its bubbles, Nature, 311 (1984), 45-47.  doi: 10.1038/311045a0. [25] L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501-535.  doi: 10.1146/annurev-statistics-031017-100045. [26] A. Zomorodian, Topological data analysis, in Advances in Applied and Computational Topology, Proc. Sympos. Appl. Math., 70, Amer. Math. Soc., Providence, RI, 2012, 1–39. doi: 10.1090/psapm/070/587.

show all references

##### References:
 [1] R. J. Adler, O. Bobrowski, M. S. Borman, E. Subag and S. Weinberger, Persistent homology for random fields and complexes, in Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Inst. Math. Stat. (IMS) Collect., 6, Inst. Math. Statist., Beachwood, OH, 2010,124–143. doi: 10.1214/10-IMSCOLL609. [2] N. Atienza, R. Gonzalez-Diaz and M. Rucco, Persistent entropy for separating topological features from noise in vietoris-rips complexes, J. Intelligent Information Systems, 52 (2019), 637-655.  doi: 10.1007/s10844-017-0473-4. [3] N. Atienza, R. Gonzalez-Diaz and M. Rucco, Separating topological noise from features using persistent entropy, in Software Technologies: Applications and Foundations, Lecture Notes in Computer Science, 9946, Springer, Cham, 2016, 3–12. doi: 10.1007/978-3-319-50230-4_1. [4] G. Carlsson, Topology and data, Bull. Amer. Math. Soc.(N.S.), 46 (2009), 255-308.  doi: 10.1090/S0273-0979-09-01249-X. [5] F. Chazal, V. de Silva and S. Oudot, Persistence stability for geometric complexes, Geom. Dedicata, 173 (2014), 193-214.  doi: 10.1007/s10711-013-9937-z. [6] F. Chazal and B. Michel, An introduction to Topological Data Analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. [7] Y.-M. Chung and S. Day, Topological fidelity and image thresholding: A persistent homology approach, J. Math. Imaging Vision, 60 (2018), 1167-1179.  doi: 10.1007/s10851-018-0802-4. [8] D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120.  doi: 10.1007/s00454-006-1276-5. [9] P. Dlotko, Cubical complex, in GUDHI User and Reference Manual, GUDHI Editorial Board, 3.3.0 edition, 2020. Available from: https://gudhi.inria.fr/. [10] H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. [11] M. Ester, H.-P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proceedings, 1996,226–231. [12] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339.  doi: 10.1214/14-AOS1252. [13] A. Garin and G. Tauzin, A topological "reading" lesson: Classification of MNIST using TDA, 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, 2019, 1551–1556. doi: 10.1109/ICMLA.2019.00256. [14] R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc., 45 (2008), 61-75.  doi: 10.1090/S0273-0979-07-01191-3. [15] J. F. Jardine, Data and homotopy types, preprint, arXiv: 1908.06323. [16] T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315. [17] K. Keegan, M. R. Albert and I. Baker, The impact of ice layers on gas transport through firn at the North Greenland Eemian Ice Drilling (NEEM) site, Greenland, The Cryosphere, 8 (2014), 1801-1806.  doi: 10.5194/tc-8-1801-2014. [18] A. Landais, J. M. Barnola, K. Kawamura, N. Caillon and M. Delmotte, Firn-air $\delta$15n in modern polar sites and glacial–interglacial ice: A model-data mismatch during glacial periods in Antarctica?, Quaternary Science Reviews, 25 (2006), 49-62.  doi: 10.1016/j.quascirev.2005.06.007. [19] M. Lesnick and M. Wright, Interactive visualization of 2-D persistence modules, preprint, arXiv: 1512.00180. [20] J. M. D. Lundin, C. M. Stevens, R. Arthern, C. Buizert and A. Orsi, Firn model intercomparison experiment (FirnMICE), J. Glaciology, 63 (2017), 401-422.  doi: 10.1017/jog.2016.114. [21] I. Obayashi, Y. Hiraoka and M. Kimura, Persistence diagrams with linear machine learning models, J. Appl. Comput. Topol., 1 (2018), 421-449.  doi: 10.1007/s41468-018-0013-5. [22] N. Otter, M. A. Porter, U. Tillmann, P. Grindrod and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0109-5. [23] A. Patania, F. Vaccarino and G. Petri, Topological analysis of data, EPJ Data Science, 6 (2017). doi: 10.1140/epjds/s13688-017-0104-x. [24] J. Schwander and B. Stauffer, Age difference between polar ice and the air trapped in its bubbles, Nature, 311 (1984), 45-47.  doi: 10.1038/311045a0. [25] L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), 501-535.  doi: 10.1146/annurev-statistics-031017-100045. [26] A. Zomorodian, Topological data analysis, in Advances in Applied and Computational Topology, Proc. Sympos. Appl. Math., 70, Amer. Math. Soc., Providence, RI, 2012, 1–39. doi: 10.1090/psapm/070/587.
Samples of firn Micro-CT images at different depths. Lighter grey regions represent ice-space and darker regions represent pore-space in each image. In this work, there are 14 firn samples (not columns) in total, the depths of which range from 7 to 78 m in the firn column. Only select samples from that range of depths are shown here. Each sample consists of over 900 cross-sectional slices of 2D grayscale images of size $400\times400$ pixels. In other words, each sample is a 3D image of dimension $400\times 400\times 900$ voxels. We refer readers to our github page (https://github.com/azlawson/Firn) for a visualization of these 3D images. Only the 300th, 450th, and 600th slices from each sample's stack of slices are shown here. Counts of contiguous ice and pore-space regions, respectively, as determined by an expert in firn, are shown in parentheses
Motivational example of sublevel set filtration and the corresponding persistence diagram. (a) The original grayscale image $f$. (b)-(i) Sublevel set of $f$ at different values of threshold $t$. We color the sublevel set as white. (j) Labels for the rice grains. We used GUDHI[9] to calculate persistence and track generators. Red points are the generators corresponding to the purple points in (k), and the green one is the infinite generator in (k). (k) 0-dimensional persistence diagram, $D_0$. The purple points correspond to features with red labels in (j). (l) Illustration of the lifespan cutoff with selected points in purple. (m) Illustration of the PD Thresholding method with selected points in purple
Illustration of the DPFD method. The yellow ellipses represent clusters that are labelled to be noise, while the purple cluster is labelled as features since its minimum lifespan is above the designated cluster cutoff
Application of DPFD to the rice example shown in Figure 2. The parameters are $M = 20$, $q = 0.1$, and $L_0 = 50$. Purple (darker) points are detected by the DPFD, and the exact 79 features are detected
Application of lifespan cutoff to the rice example shown in Figure 2. Purple (darker) points are $D_0^{83}$. $\#D_0^{83} = 78$. Missing and mislabeled features are highlighted by green circles
Precision and Recall plots for various $(q,M)$ pairs
Rice grain count errors for various $(q,M)$ pairs
The process for the DPFD model. From left to right we have the original image, the image resulting from the application of median blur and the morphological opening, the persistence diagram of the preprocessed image, and finally the application of DPFD
A plot of the computed optimal $q$ and $M$ over depth
An example of DPFD on a firn image. The image is tagged based on the selected diagram points. From left to right are: the persistence diagram computed on the original image, the persistence diagram computed on the preprocessed image, and the original image tagged by points corresponding to the features identified by DPFD
An example of DPFD over-labelling ice on a firn image
Left: The average count of ice regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of ice regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. Right: The difference between the true number of ice regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
Left: The average count of pore-space regions estimated by the DPFD model with a band of one standard deviation in green as a function of depth. Middle: The average count of pore-space regions estimated by the Lifespan model with a band of one standard deviation in green as a function of depth. These values represent the number of contiguous pore-space regions with depth. Right: The difference between the true number of pore-space regions in the test images, as defined by an expert, and the estimates given by the DPFD and Lifespan models
 [1] Antonio Rieser. A topological approach to spectral clustering. Foundations of Data Science, 2021, 3 (1) : 49-66. doi: 10.3934/fods.2021005 [2] Zhi Guo Feng, K. F. Cedric Yiu, K.L. Mak. Feature extraction of the patterned textile with deformations via optimal control theory. Discrete and Continuous Dynamical Systems - B, 2011, 16 (4) : 1055-1069. doi: 10.3934/dcdsb.2011.16.1055 [3] George Siopsis. Quantum topological data analysis with continuous variables. Foundations of Data Science, 2019, 1 (4) : 419-431. doi: 10.3934/fods.2019017 [4] Tyrus Berry, Timothy Sauer. Consistent manifold representation for topological data analysis. Foundations of Data Science, 2019, 1 (1) : 1-38. doi: 10.3934/fods.2019001 [5] Erik Carlsson, John Gunnar Carlsson, Shannon Sweitzer. Applying topological data analysis to local search problems. Foundations of Data Science, 2022  doi: 10.3934/fods.2022006 [6] Qiang Yin, Gongfa Li, Jianguo Zhu. Research on the method of step feature extraction for EOD robot based on 2D laser radar. Discrete and Continuous Dynamical Systems - S, 2015, 8 (6) : 1415-1421. doi: 10.3934/dcdss.2015.8.1415 [7] Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial and Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 [8] Daniel Mckenzie, Steven Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 2019, 1 (3) : 307-327. doi: 10.3934/fods.2019014 [9] Michael Herty, Lorenzo Pareschi, Giuseppe Visconti. Mean field models for large data–clustering problems. Networks and Heterogeneous Media, 2020, 15 (3) : 463-487. doi: 10.3934/nhm.2020027 [10] Jia Chen, Ioannis D. Schizas. Multimodal correlations-based data clustering. Foundations of Data Science, 2022  doi: 10.3934/fods.2022011 [11] Danuta Gaweł, Krzysztof Fujarewicz. On the sensitivity of feature ranked lists for large-scale biological data. Mathematical Biosciences & Engineering, 2013, 10 (3) : 667-690. doi: 10.3934/mbe.2013.10.667 [12] Yunmei Lu, Mingyuan Yan, Meng Han, Qingliang Yang, Yanqing Zhang. Privacy preserving feature selection and Multiclass Classification for horizontally distributed data. Mathematical Foundations of Computing, 2018, 1 (4) : 331-348. doi: 10.3934/mfc.2018016 [13] Jelena Grbić, Jie Wu, Kelin Xia, Guo-Wei Wei. Aspects of topological approaches for data science. Foundations of Data Science, 2022, 4 (2) : 165-216. doi: 10.3934/fods.2022002 [14] Evariste Sanchez-Palencia, Jean-Pierre Françoise. Topological remarks and new examples of persistence of diversity in biological dynamics. Discrete and Continuous Dynamical Systems - S, 2019, 12 (6) : 1775-1789. doi: 10.3934/dcdss.2019117 [15] Frédéric Grognard, Frédéric Mazenc, Alain Rapaport. Polytopic Lyapunov functions for persistence analysis of competing species. Discrete and Continuous Dynamical Systems - B, 2007, 8 (1) : 73-93. doi: 10.3934/dcdsb.2007.8.73 [16] Audun D. Myers, Firas A. Khasawneh, Brittany T. Fasy. ANAPT: Additive noise analysis for persistence thresholding. Foundations of Data Science, 2022, 4 (2) : 243-269. doi: 10.3934/fods.2022005 [17] Jiangchuan Fan, Xinyu Guo, Jianjun Du, Weiliang Wen, Xianju Lu, Brahmani Louiza. Analysis of the clustering fusion algorithm for multi-band color image. Discrete and Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1233-1249. doi: 10.3934/dcdss.2019085 [18] Junying Hu, Xiaofei Qian, Jun Pei, Changchun Tan, Panos M. Pardalos, Xinbao Liu. A novel quality prediction method based on feature selection considering high dimensional product quality data. Journal of Industrial and Management Optimization, 2021  doi: 10.3934/jimo.2021099 [19] Jian Zhao, Fang Deng, Jian Jia, Chunmeng Wu, Haibo Li, Yuan Shi, Shunli Zhang. A new face feature point matrix based on geometric features and illumination models for facial attraction analysis. Discrete and Continuous Dynamical Systems - S, 2019, 12 (4&5) : 1065-1072. doi: 10.3934/dcdss.2019073 [20] Daniel Roggen, Martin Wirz, Gerhard Tröster, Dirk Helbing. Recognition of crowd behavior from mobile sensors with pattern analysis and graph clustering methods. Networks and Heterogeneous Media, 2011, 6 (3) : 521-544. doi: 10.3934/nhm.2011.6.521

Impact Factor: