Article Contents
Article Contents

# ToFU: Topology functional units for deep learning

• * Corresponding authors: Christopher Oballe and Vasileios Maroulas
• We propose ToFU, a new trainable neural network unit with a persistence diagram dissimilarity function as its activation. Since persistence diagrams are topological summaries of structures, this new activation measures and learns the topology of data to leverage it in machine learning tasks. We showcase the utility of ToFU in two experiments: one involving the classification of discrete-time autoregressive signals, and another involving a variational autoencoder. In the former, ToFU yields competitive results with networks that use spectral features while outperforming CNN architectures. In the latter, ToFU produces topologically-interpretable latent space representations of inputs without sacrificing reconstruction fidelity.

Mathematics Subject Classification: Primary: 55N31, 68T07.

 Citation:

• Figure 1.  Here, we show a lower star filtration of image data. We model the image as a function $f$ on a cubical complex $\mathcal{K}$ whose constituent elementary cubes correspond to pixels. The value $f(Q)$ of a pixel $Q$ is its intensity. At the beginning of the filtration, $\mathcal{K}_0$, there are no cubes present. The four cubes with the lowest intensity appear in $\mathcal{K}_1$. During this time in the filtration, there is also a $1$-cycle directly in the center of the image. Since this $1$-cycle is not the boundary of any $2$-chains, it corresponds to a $1$-dimensional homological feature. More cubes are added in $\mathcal{K}_2$ and $\mathcal{K}_3$. Finally, the last cube is added at $\mathcal{K}_4$ and the $1$-dimensional feature that appeared at $\mathcal{K}_1$ is annihilated

Figure 2.  The minimal cost matching (Equation (10)) for two PDs, $\mathcal{D}$ and $\mathcal{D}'$

Figure 3.  One-point PDs learned by ToFU for classification. The learned diagrams are color coded by accuracy. Example PDs from both classes in the classification task are also shown. Class 1 consisted of two types of PDs, depicted as upright and inverted triangles, respectively. Learned PDs corresponding to high classification accuracy fell into two groups– those with birth times earlier and later than points in Class 1 and Class 2, respectively

Figure 4.  Gradient descent with ToFU layer using Equation (15). Because the gradient depends on a minimal$-$cost matching, initialization of weights determines the solution when there are fewer learnable points than those in data

Figure 5.  Gradient descent with ToFU layer that has two learnable points. Different initializations are shown in (a) and (b)

Figure 6.  Gradient descent to minimize the average of Equation (10) for two PDs using a ToFU layer that has three learnable points. Different initializations are shown in (a) and (b)

Figure 7.  A schematic of the encoder in our topological VAE architecture. Here, we choose C = 1 in $\phi$ for simplicity

Figure 8.  Examples from the AR signals dataset. Here, we show signals with damping factor $\beta$ = 4. The average log PSD for each class, estimated by averaging periodograms, is shown in the second column

Figure 9.  Shown above is a single example from each of the six classes in our synthetic dataset. We apply a nonlinear transformation to pixel values for visual clarity

Figure 10.  Latent space representations of the (a) typical VAE and (b) ToFU-VAE. The ToFU-VAE latent space shows clear separation based on the topology of each class

Figure 11.  A cubical complex, $\mathcal{K}$ that we use to demonstrate homological computations. The $0$-, $1$-, and $2$-dimensional cubes in $\mathcal{K}$ are labelled as $\{v_1, v_2, v_3, v_4, v_5, v_6\}, \{e_1, e_2, e_3, e_4, e_5, e_6, e_7\},$ and $\{f_1\}$, respectively

Table 1.  The ANN architecture we used for classification of simulated PDs

 Layer Description 1 ToFU. 1 unit. 1 learnable point. 2 Dense. 32 units. ReLU activations. 3 Dense. 16 units. ReLU activations. 4 Dense. 8 units. ReLU activations. 5 Dense. 1 unit. Sigmoid activation.

Table 2.  Test accuracies for each ANN trained for AR signal classification. Conv1 and Conv2 refer to the 8 and 64 channel networks, respectively, while PLs and PIs refer to the networks trained on persistence landscapes and persistence images

 Model Test Accuracy Welch 98.91 ToFU 98.12 PLs 96.41 PIs 95.94 Conv1 92.66 Conv2 88.12

Table 3.  Test reconstruction errors for both VAEs

 ANN Test Recon. Err. Typical VAE 0.0847 ToFU-VAE 0.0806
•  [1] H. Adams, T. Emerson, M. Kirby, R. Neville and C. Peterson, et al., Persistence images: A stable vector representation of persistent homology, J. Mach. Learn. Res., 18 (2017), 35pp. [2] R. J. Adler and S. Agami, Modelling persistence diagrams with planar point processes, and revealing topology with bagplots, J. Appl. Comput. Topol., 3 (2019), 139-183.  doi: 10.1007/s41468-019-00035-w. [3] J.-B. Bardin, G. Spreemann and K. Hess, Topological exploration of artificial neuronal network dynamics, Network Neuroscience, 3 (2019), 725-743.  doi: 10.1162/netn_a_00080. [4] E. Berry, Y.-C. Chen, J. Cisewski-Kehe and B. T. Fasy, Functional summaries of persistence diagrams, J. Appl. Comput. Topol., 4 (2020), 211-262.  doi: 10.1007/s41468-020-00048-w. [5] C. A. N. Biscio and J. Møller, The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications, J. Comput. Graph. Statist., 28 (2019), 671-681.  doi: 10.1080/10618600.2019.1573686. [6] R. Brüel-Gabrielsson, B. J. Nelson, A. Dwaraknath, P. Skraba, L. J. Guibas and G. Carlsson, A topology layer for machine learning, preprint, arXiv: 1905.12200. [7] P. Bubenik, Statistical topological data analysis using persistence landscapes, J. Mach. Learn. Res., 16 (2015), 77-102. [8] G. Buzsáki,  Rhythms of the Brain, Oxford University Press, Oxford, 2006.  doi: 10.1093/acprof:oso/9780195301069.001.0001. [9] G. Carlsson, Topology and data, Bull. Amer. Math. Soc. (N.S.), 46 (2009), 255-308.  doi: 10.1090/S0273-0979-09-01249-X. [10] M. Carriere, M. Cuturi and S. Oudot, Sliced Wasserstein kernel for persistence diagrams, International Conference on Machine Learning, PMLR, 2017,664–673. Preprint, arXiv: 1706.03358. [11] F. Chazal and V. Divol, The density of expected persistence diagrams and its kernel based estimation, J. Comput. Geom., 10 (2019), 127-153. [12] F. Chazal, B. T. Fasy, F. Lecci, A. Rinaldo and L. Wasserman, Stochastic convergence of persistence landscapes and silhouettes, in Computational Geometry (SoCG'14), ACM, New York, 2014,474–483. [13] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous and Y. LeCun, The loss surfaces of multilayer networks, preprint, arXiv: 1412.0233. [14] W. Crawley-Boevey, Decomposition of pointwise finite-dimensional persistence modules, J. Algebra Appl., 14 (2015), 8pp. doi: 10.1142/S0219498815500668. [15] M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, Proceedings of the 31st International Conference on Machine Learning, 32 (2014), 685-693. [16] D. S. Dummit and R. M. Foote, Abstract Algebra, 3$^{rd}$ edition, John Wiley & Sons, Inc., Hoboken, NJ, 2004. [17] H. Edelsbrunner and J. L. Harer, Computational Topology: An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. [18] H. Edelsbrunner,  D. Letscher and  A. Zomorodian,  Topological persistence and simplification, 41st Annual Symposium on Foundations of Computer Science, IEEE Comput. Soc. Press, Los Alamitos, CA, 2000.  doi: 10.1109/SFCS.2000.892133. [19] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42 (2014), 2301-2339.  doi: 10.1214/14-AOS1252. [20] P. J. Franaszczuk and K. J. Blinowska, Linear model of brain electrical activity-EEG as a superposition of damped oscillatory modes, Biological Cybernetics, 53 (1985), 19-25.  doi: 10.1007/BF00355687. [21] R. B. Gabrielsson and G. Carlsson, Exposition and interpretation of the topology of neural networks, 18th IEEE International Conference On Machine Learning And Applications (ICMLA), IEEE, Boca Raton, FL, 2019. doi: 10.1109/ICMLA.2019.00180. [22] R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc. (N.S.), 45 (2008), 61-75.  doi: 10.1090/S0273-0979-07-01191-3. [23] S. M. Gordon, P. J. Franaszczuk, W. D. Hairston, M. Vindiola and K. McDowell, Comparing parametric and nonparametric methods for detecting phase synchronization in EEG, J. Neuroscience Methods, 212 (2013), 247-258.  doi: 10.1016/j.jneumeth.2012.10.002. [24] K. Gurney,  An Introduction to Neural Networks, CRC Press, London, 1997.  doi: 10.1201/9781315273570. [25] W. H. Guss and R. Salakhutdinov, On characterizing the capacity of neural networks using algebraic topology, preprint, arXiv: 1802.04443. [26] T. Kaczynski, K. Mischaikow and M. Mrozek, Computational Homology, Applied Mathematical Sciences, 157, Springer-Verlag, New York, 2004. doi: 10.1007/b97315. [27] D. P. Kingma and M. Welling, An introduction to variational autoencoders, preprint, arXiv: 1906.02691. [28] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Res. Logist. Quart., 2 (1955), 83-97.  doi: 10.1002/nav.3800020109. [29] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), 2278-2324.  doi: 10.1109/5.726791. [30] A. Marchese and V. Maroulas, Signal classification with a point process distance on the space of persistence diagrams, Adv. Data Anal. Classif., 12 (2018), 657-682.  doi: 10.1007/s11634-017-0294-x. [31] V. Maroulas, J. L. Mike and C. Oballe, Nonparametric estimation of probability density functions of random persistence diagrams., J. Mach. Learn. Res., 20 (2019), 49pp. [32] V. Maroulas, F. Nasrin and C. Oballe, A Bayesian framework for persistent homology, SIAM J. Math. Data Sci., 2 (2020), 48-74.  doi: 10.1137/19M1268719. [33] V. Maroulas, C. Putman Micucci and A. Spannaus, A stable cardinality distance for topological classification, Adv. Data Anal. Classif., 14 (2020), 611-628.  doi: 10.1007/s11634-019-00378-3. [34] F. Mémoli, The Gromov–Wasserstein distance: A brief overview, Axioms, 3 (2014), 335-341.  doi: 10.3390/axioms3030335. [35] Y. Mileyko, S. Mukherjee and J. Harer, Probability measures on the space of persistence diagrams, Inverse Problems, 27 (2011), 22pp. doi: 10.1088/0266-5611/27/12/124007. [36] A. Monod, S. Kališnik, J. Á. Patiño-Galindo and L. Crawford, Tropical sufficient statistics for persistent homology, SIAM J. Appl. Algebra Geom., 3 (2019), 337-371.  doi: 10.1137/17M1148037. [37] M. Moor, M. Horn, B. Rieck and K. Borgwardt, Topological autoencoders, preprint, arXiv: 1906.00722. [38] E. Munch, K. Turner, P. Bendich, S. Mukherjee, J. Mattingly and J. Harer, Probabilistic Fréchet means for time varying persistence diagrams, Electron. J. Statist., 9 (2015), 1173-1204.  doi: 10.1214/15-EJS1030. [39] A. V. Oppenheim, J. R. Buck and R. W. Schafer, Discrete-Time Signal Processing. Vol. 2, Prentice Hall, Upper Saddle River, NJ, 2001. [40] A. Poulenard, P. Skraba and M. Ovsjanikov, Topological function optimization for continuous shape matching, Computer Graphics Forum, 37 (2018), 13-25.  doi: 10.1111/cgf.13487. [41] S. Shalev-Shwartz and  S. Ben-David,  Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.  doi: 10.1017/CBO9781107298019. [42] P. Skraba and K. Turner, Wasserstein stability for persistence diagrams, preprint, arXiv: 2006.16824. [43] O. Solomon Jr., PSD computations using Welch's method, Technical report, Office of Scientific and Technical Information, Department of Energy, 1991. doi: 10.2172/5688766. [44] I. Tolstikhin, O. Bousquet, S. Gelly and B. Schölkopf, Wasserstein auto-encoders, preprint, arXiv: 1711.01558. [45] K. Turner, Y. Mileyko, S. Mukherjee and J. Harer, Fréchet means for distributions of persistence diagrams, Discrete Comput. Geom., 52 (2014), 44-70.  doi: 10.1007/s00454-014-9604-7. [46] K. Turner, S. Mukherjee and D. M. Boyer, Persistent homology transform for modeling shapes and surfaces, Inf. Inference, 3 (2014), 310-344.  doi: 10.1093/imaiai/iau011. [47] H. Wagner, C. Chen and E. Vuçini, Efficient computation of persistent homology for cubical data, in Topological Methods in Data Analysis and Visualization II, Math. Vis., Springer, Heidelberg, 2012, 91–106. doi: 10.1007/978-3-642-23175-9_7. [48] P. J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, John Wiley & Sons, 1994. [49] B. Zieliński, M. Lipiński, M. Juda, M. Zeppelzauer and P. Dłotko, Persistence codebooks for topological data analysis, Artificial Intelligence Review, 54 (2021), 1969-2009.  doi: 10.1007/s10462-020-09897-4. [50] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), 249-274.  doi: 10.1007/s00454-004-1146-y.

Figures(11)

Tables(3)