# American Institute of Mathematical Sciences

August  2021, 4(3): 145-165. doi: 10.3934/mfc.2021009

## Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation

 Qufu Normal University, Qufu, China

* Corresponding author: nijch@163.com

Received: September 2020 | Revised: March 2021 | Early access: June 2021 | Published: August 2021

The recent progress in learning image feature representations has opened the way for tasks such as label-to-image and text-to-image synthesis. However, one particular challenge widely observed in existing methods is the difficulty of synthesizing fine-grained textures and small-scale instances. In this paper, we propose a novel Global-Affine and Local-Specific Generative Adversarial Network (GALS-GAN) to explicitly construct global semantic layouts and learn distinct instance-level features. To achieve this, we adopt a graph convolutional network to calculate instance locations and spatial relationships from scene graphs, which allows our model to obtain high-fidelity semantic layouts. In addition, a local-specific generator, in which we introduce a feature filtering mechanism to separately learn semantic maps for different categories, is utilized to disentangle and generate specific visual features. Moreover, we apply a weight map predictor to better combine the global and local pathways, given the high complementarity between these two generation sub-networks. Extensive experiments on the COCO-Stuff and Visual Genome datasets demonstrate the superior generation performance of our model over previous methods; our approach is more capable of capturing photo-realistic local characteristics and rendering small-sized entities with finer detail.
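The abstract does not give the fusion equation, but a predicted per-pixel weight map that blends two generation pathways is commonly implemented as a sigmoid-gated convex combination. The sketch below is a minimal illustration of that idea only, with hypothetical inputs `global_feat`, `local_feat`, and `weight_logits`; it is not the authors' exact weight map predictor.

```python
import numpy as np

def fuse_pathways(global_feat, local_feat, weight_logits):
    """Blend global-affine and local-specific feature maps with a
    per-pixel weight map (an illustrative sketch, not the paper's
    exact predictor).

    global_feat, local_feat: (C, H, W) feature maps from the two pathways.
    weight_logits: (1, H, W) raw predictions, squashed to [0, 1] by a sigmoid.
    """
    w = 1.0 / (1.0 + np.exp(-weight_logits))         # sigmoid -> per-pixel weight
    return w * global_feat + (1.0 - w) * local_feat  # convex combination

# Toy example: 2-channel 4x4 maps; zero logits give an even 0.5/0.5 blend.
g = np.ones((2, 4, 4))
l = np.zeros((2, 4, 4))
out = fuse_pathways(g, l, np.zeros((1, 4, 4)))
print(out[0, 0, 0])  # 0.5
```

Because the weights are spatial, such a scheme can favor the local pathway on small instances and the global pathway on background regions, which is consistent with the complementarity the abstract describes.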

Citation: Susu Zhang, Jiancheng Ni, Lijun Hou, Zili Zhou, Jie Hou, Feng Gao. Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation. Mathematical Foundations of Computing, 2021, 4 (3) : 145-165. doi: 10.3934/mfc.2021009
##### Figures:
- Overview of the proposed GALS-GAN
- Illustration of a single graph convolution layer
- Architecture of the MLP
- Inferring process of the mask predictor
- Architecture of the local-specific generator
- Architecture of the multi-scale discriminators
- Images generated by different-level generators
- Qualitative examples generated by our GALS-GAN on the COCO-Stuff dataset
- Qualitative examples generated by our GALS-GAN on the Visual Genome dataset
- Qualitative comparison of different models
- An example of manipulating the synthesized image
- Example results of different image manipulation types
- Ablation study of the global-affine generator
- Ablation study of the local-specific generator
Statistics of COCO-Stuff and Visual Genome datasets
| datasets | train | val | test | categories | max | min |
| --- | --- | --- | --- | --- | --- | --- |
| COCO-Stuff | 74121 | 1024 | 2048 | 171 | 8 | 3 |
| Visual Genome | 62565 | 5506 | 5088 | 178 | 30 | 3 |
Quantitative comparison of images generated by different methods on the COCO-Stuff dataset
| Methods | IS $\uparrow$ (64$\times$64) | IS $\uparrow$ (128$\times$128) | FID $\downarrow$ (64$\times$64) | FID $\downarrow$ (128$\times$128) |
| --- | --- | --- | --- | --- |
| sg2im [10] | 6.7$\pm$0.1 | 5.99$\pm$0.27 | 67.99 | 95.18 |
| stacking-GANs [36] | 9.1$\pm$0.20 | 12.01$\pm$0.40 | 50.94 | 39.78 |
| PasteGAN [19] | 9.2$\pm$0.32 | - | 42.30 | - |
| PasteGAN (GT layout) [19] | 10.20$\pm$0.20 | - | 34.30 | - |
| ours | 9.85$\pm$0.15 | 13.82$\pm$0.30 | 38.29 | 29.62 |
Quantitative comparison of images generated by different methods on the Visual Genome dataset
| Methods | IS $\uparrow$ (64$\times$64) | IS $\uparrow$ (128$\times$128) | FID $\downarrow$ (64$\times$64) | FID $\downarrow$ (128$\times$128) |
| --- | --- | --- | --- | --- |
| sg2im [10] | 5.5$\pm$0.10 | 4.78$\pm$0.15 | 73.79 | 70.40 |
| stacking-GANs [36] | 6.90$\pm$0.20 | 9.24$\pm$0.41 | 59.53 | 50.19 |
| PasteGAN [19] | 7.97$\pm$0.30 | - | 58.37 | - |
| PasteGAN (GT layout) [19] | 9.15$\pm$0.20 | - | 34.91 | - |
| ours | 8.87$\pm$0.15 | 11.20$\pm$0.55 | 39.25 | 29.94 |
Comparison of classification accuracy
| Methods | COCO-Stuff (64$\times$64) | COCO-Stuff (128$\times$128) | Visual Genome (64$\times$64) | Visual Genome (128$\times$128) |
| --- | --- | --- | --- | --- |
| sg2im [10] | 28.8 | 24.1 | 26.7 | 23.4 |
| stacking-GANs [36] | 33.9 | 31.2 | 32.7 | 30.3 |
| PasteGAN [19] | 40.3 | - | 38.7 | - |
| ours | 46.1 | 44.6 | 45.4 | 43.5 |
Quantitative comparison of predicted semantic layouts
| Methods | R@0.3 (COCO-Stuff) | R@0.3 (Visual Genome) | R@0.5 (COCO-Stuff) | R@0.5 (Visual Genome) |
| --- | --- | --- | --- | --- |
| sg2im [10] | 52.4 | 21.9 | 32.2 | 10.6 |
| stacking-GANs [36] | 65.3 | 35.0 | 49.1 | 23.2 |
| PasteGAN [19] | 71.2 | 45.2 | 62.4 | 33.8 |
| ours | 80.7 | 48.4 | 66.2 | 36.5 |
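The R@0.3 and R@0.5 columns report the fraction of predicted bounding boxes whose intersection-over-union (IoU) with the corresponding ground-truth box exceeds the threshold. A minimal sketch of that recall computation, assuming boxes in `(x1, y1, x2, y2)` form and a one-to-one pairing by index (the paper's exact matching protocol may differ):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at(pred_boxes, gt_boxes, threshold):
    """Fraction of predicted boxes whose IoU with the paired
    ground-truth box meets the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)

preds = [(0, 0, 10, 10), (0, 0, 5, 5)]
gts   = [(1, 1, 11, 11), (20, 20, 25, 25)]
print(recall_at(preds, gts, 0.5))  # 0.5: only the first box overlaps enough
```

The first pair overlaps with IoU $= 81/119 \approx 0.68$, clearing both thresholds; the second pair is disjoint, so recall at 0.5 is 1/2.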
Ablation study of different GALS-GAN architectures
| Architectures | IS $\uparrow$ | FID $\downarrow$ |
| --- | --- | --- |
| w/o $G_{g-a}$ | 7.52$\pm$0.40 | 78.94 |
| w/o $G_{l-s}$ | 11.30$\pm$0.12 | 46.83 |
| full model | 13.82$\pm$0.30 | 29.62 |
