# American Institute of Mathematical Sciences

February  2021, 15(1): 147-158. doi: 10.3934/ipi.2020045

## A new initialization method based on normed statistical spaces in deep networks

1. Department of Mathematics, Yeung Kin Man Academic Building, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong, China
2. Department of Mathematics, School of Science, Shanghai University, Shanghai 200444, China
3. HISILICON Technologies Co., Ltd., Huawei Base, Bantian, Longgang District, Shenzhen 518129, China
4. Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China

*Corresponding author: Tieyong Zeng (zeng@math.cuhk.edu.hk)

Received November 2019; Revised April 2020; Published August 2020

Fund Project: Raymond Chan's research is supported by HKRGC Grants No. CUHK 14306316 and CUHK 14301718, CityU Grant 9380101, CRF Grant C1007-15G, AoE/M-05/12. Tieyong Zeng's research is supported by National Science Foundation of China No. 11671002, CUHK start-up and CUHK DAG 4053342, RGC 14300219, and NSFC/RGC N_CUHK 415/19

Training deep neural networks can be difficult. For classical neural networks, the initialization method of Glorot and Bengio (often called Xavier initialization), later generalized by He, Zhang, Ren and Sun, can facilitate stable training. However, with the recent development of new layer types, we find that these initialization methods may fail to lead to successful training. Building on these two methods, we propose a new initialization by studying the parameter space of a network. Our principle is to constrain the growth of parameters in different layers in a consistent way. To do so, we introduce a norm on the parameter space and use it to measure the growth of parameters. Our new method is suitable for a wide range of layer types, especially for layers with parameter-sharing weight matrices.
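For reference, the two baseline schemes the abstract builds on can be sketched as follows (a minimal NumPy sketch for fully-connected weight matrices; the function names are ours, not from the paper):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Glorot/Bengio ("Xavier") initialization: uniform on [-l, l],
    with l chosen so that Var(W) = 2 / (fan_in + fan_out)."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=None):
    """He-Zhang-Ren-Sun initialization for ReLU networks:
    zero-mean Gaussian with Var(W) = 2 / fan_in."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```

Both schemes control the variance of activations layer by layer; the paper's point is that this per-layer variance argument breaks down for newer layer types (e.g. parameter-sharing weight matrices), motivating a norm-based constraint instead.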

Citation: Hongfei Yang, Xiaofeng Ding, Raymond Chan, Hui Hu, Yaxin Peng, Tieyong Zeng. A new initialization method based on normed statistical spaces in deep networks. Inverse Problems & Imaging, 2021, 15 (1) : 147-158. doi: 10.3934/ipi.2020045
Figure 1. (a) Losses of the network summarized in Table 1. (b) Losses of the network summarized in Table 2. (c) Losses of the network summarized in Table 3. (d) Losses of the network summarized in Table 4. Mean and standard deviation of the last smoothed loss values: ours (a) $0.070\pm 0.005$, (b) $0.111\pm 0.006$, (c) $0.088\pm 0.003$, (d) $0.083\pm 0.004$; Xavier/He (a) $0.069\pm 0.001$, (b) $0.206\pm 0.012$, (c) $0.221\pm 0.016$, (d) $0.164\pm 0.012$. Evaluation accuracies on the test set, ours versus Xavier/He: (a) $98.06\%$ vs. $98.13\%$, (b) $95.66\%$ vs. $93.94\%$, (c) $97.17\%$ vs. $95.07\%$, (d) $98.01\%$ vs. $96.22\%$
Table 2. Network structure of Figure 1(b). For the last convolution layer with kernel size $55\times 55$ we use periodic padding on the input images to make sure the conditions on $T$ in (6) are satisfied
| Layer | Output channel | Number of parameters |
| --- | --- | --- |
| Conv2d+Maxpool | $32$ | $3\times 3 \times 1 \times 32$ |
| Conv2d+Maxpool | $64$ | $3\times 3 \times 32\times 64$ |
| Reshape to $56\times 56$ | | |
| Conv2d | $1$ | $55\times 55 \times 1\times 1$ |
| Fc | $10$ | $3136\times 10$ |
Table 1. Network structure of Figure 1(a)
| Layer | Output channel | Number of parameters |
| --- | --- | --- |
| Conv2d+MaxPool | $32$ | $3\times 3 \times 1 \times 32$ |
| Conv2d+MaxPool | $64$ | $3\times 3 \times 32\times 64$ |
| Fc | $64$ | $3136\times 64$ |
| Fc | $10$ | $64\times 10$ |
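As a sanity check, the parameter counts in the table above for the Figure 1(a) network can be tallied directly (a small sketch, assuming $28\times 28$ MNIST-style inputs, so two $2\times$ poolings leave $7\times 7\times 64 = 3136$ flattened features; biases are ignored for simplicity):

```python
# Per-layer weight counts for the Figure 1(a) network.
layers = [
    ("Conv2d+MaxPool", 3 * 3 * 1 * 32),    # 3x3 kernels, 1 -> 32 channels
    ("Conv2d+MaxPool", 3 * 3 * 32 * 64),   # 3x3 kernels, 32 -> 64 channels
    ("Fc",             3136 * 64),         # 7 * 7 * 64 = 3136 flattened features
    ("Fc",             64 * 10),           # 10-class output
]
total = sum(n for _, n in layers)
print(total)  # 220064
```

The fully-connected layers dominate the count, which is what makes them the natural target for the circulant compression used in the later tables.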
Table 3. Network structure of Figure 1(c). For a compression ratio $B> 1$, we use circulant implementation with block size $B$. The number of parameters for a CirCNN implementation should be divided by $B$
| Layer | Out channel | Number of parameters | Compression ratio |
| --- | --- | --- | --- |
| Conv2d+Maxpool | $32$ | $3\times 3 \times 1 \times 32$ | $1$ |
| Conv2d+Maxpool | $64$ | $3\times 3 \times 32\times 64$ | $1$ |
| Fc | $1568$ | $3136\times 1568$ | $1568$ |
| Fc | $10$ | $1568\times 10$ | $1$ |
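The circulant (CirCNN-style) compression referenced in the table stores only a length-$B$ vector in place of a dense $B\times B$ block, and the corresponding matrix-vector product can be evaluated with the FFT in $O(B\log B)$. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply x by the circulant matrix whose first column is c,
    via the convolution theorem, without forming the B x B matrix."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
```

This weight sharing is exactly the kind of parameter-sharing weight matrix for which the abstract notes that Xavier/He initialization can fail, since one stored parameter appears in $B$ positions of the effective weight matrix.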
Table 4. Network structure of Figure 1(d). For a compression ratio $B> 1$, we use circulant implementation with block size $B$. The number of parameters for a CirCNN implementation should be divided by $B$
| Layer | Out channel | Number of parameters | Compression ratio |
| --- | --- | --- | --- |
| Conv2d+Maxpool | $32$ | $3\times 3 \times 1 \times 32$ | $1$ |
| Conv2d+Maxpool | $64$ | $3\times 3 \times 32\times 64$ | $1$ |
| Conv2d | $256$ | $3\times 3 \times 64 \times 256$ | $1$ |
| Conv2d | $256$ | $3\times 3 \times 256 \times 256$ | $256$ |
| Conv2d | $256$ | $3\times 3 \times 256 \times 256$ | $256$ |
| Fc | $64$ | $12544\times 64$ | $1$ |
| Fc | $10$ | $64\times 10$ | $1$ |
