
Generative imaging and image processing via generative encoder

Abstract
  • This paper introduces a novel generative encoder (GE) framework for generative imaging and image processing tasks such as image reconstruction, compression, denoising, inpainting, deblurring, and super-resolution. GE unifies the generative capacity of GANs and the stability of AEs in a single optimization framework, instead of stacking GANs and AEs into one network or combining their loss functions as in the existing literature. GE also provides a novel approach to visualizing relationships between the latent spaces and the data space. The GE framework consists of a pre-training phase and a solving phase. In the former, a GAN with generator $ G $ that captures the data distribution of a given image set, and an AE with encoder $ E $ that compresses images following the distribution estimated by $ G $, are trained separately, resulting in two latent representations of the data, called the generative and encoding latent spaces respectively. In the solving phase, given a noisy image $ x = \mathcal{P}(x^*) $, where $ x^* $ is the unknown target image and $ \mathcal{P} $ is an operator applying additive, multiplicative, or convolutional noise, or equivalently given such an image $ x $ in the compressed domain, i.e., given $ m = E(x) $, the two latent spaces are unified by solving the optimization problem

    $ z^* = \underset{z}{\mathrm{argmin}} \|E(G(z))-m\|_2^2+\lambda\|z\|_2^2 $

    and the image $ x^* $ is recovered in a generative way via $ \hat{x} := G(z^*)\approx x^* $, where $ \lambda>0 $ is a hyperparameter. The unification of the two spaces yields improved performance over the corresponding GAN and AE networks, while revealing interesting properties of each latent space.

    Mathematics Subject Classification: Primary: 68U10; Secondary: 68T07.


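The solving-phase optimization above can be sketched numerically. The following is a minimal NumPy sketch, assuming linear stand-ins for $ G $ and $ E $ (random matrices `A` and `B`, rather than the pretrained networks of the paper) and plain gradient descent with an illustrative $ \lambda $; it is an illustration of the objective, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 8))    # stand-in generator: G(z) = A z
B = rng.standard_normal((16, 32))   # stand-in encoder:   E(x) = B x
lam = 0.1                           # regularization weight lambda (illustrative)

def solve_latent(m, steps=2000):
    """Gradient descent on the GE objective ||E(G(z)) - m||^2 + lam * ||z||^2."""
    BA = B @ A
    lr = 0.5 / (np.linalg.norm(BA, 2) ** 2 + lam)  # step size below 1/L for stability
    z = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = 2 * BA.T @ (BA @ z - m) + 2 * lam * z
        z -= lr * grad
    return z

# Simulate a target image x* = G(z_true), observe its compressed form m = E(x*),
# then recover the image generatively via x_hat = G(z*).
z_true = rng.standard_normal(8)
m = B @ (A @ z_true)
z_star = solve_latent(m)
x_hat = A @ z_star
```

In the linear case the minimizer has the closed form $ z^* = (M^\top M + \lambda I)^{-1} M^\top m $ with $ M = E \circ G $; gradient descent is used here only to mirror the iterative solving phase.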
  • Figure 1.  Flow of the training process in GE. Steps 1 and 2 form the pre-training phase, while the remaining steps form the solving phase

    Figure 2.  Reconstruction results on CelebA dataset

    Figure 3.  Reconstruction results on Digital Rock dataset

    Figure 4.  Reconstruction results on LSUN church dataset

    Figure 5.  Denoising results on CelebA dataset

    Figure 6.  Deblurring results on CelebA dataset

    Figure 7.  Super-resolution results on CelebA dataset

    Figure 8.  Inpainting results on CelebA dataset

    Figure 9.  Plot of the log of the average MSE versus the number of iterations in the solving phase

    Figure 10.  Comparison of image reconstruction of detail region (red box) for original image (left). In order of comparison, from left to right, we have Original, GE, invertGAN, ConvAE

    Figure 11.  Comparison of image reconstruction of detail region (red box) for original image (left). In order of comparison, from top to bottom, we have Original, GE, invertGAN, ConvAE

    Figure 12.  Additional pore sample result on Digital Rock dataset

    Figure 13.  Missing spectacles sample results on CelebA dataset

    Figure 14.  Reconstruction results for $ 64\times 64\times 3 $ images in CelebA with GE using BEGAN instead of pGAN
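The corruptions behind the denoising and deblurring experiments (Figures 5 and 6) are instances of the operator $ \mathcal{P} $ from the abstract. Below is a minimal NumPy sketch of additive, multiplicative, and convolutional noise; the noise level `sigma`, kernel size `k`, and separable box-blur kernel are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def additive(x, sigma=0.1):
    """Additive Gaussian noise: P(x) = x + n."""
    return x + sigma * rng.standard_normal(x.shape)

def multiplicative(x, sigma=0.1):
    """Multiplicative noise: P(x) = x * (1 + n)."""
    return x * (1.0 + sigma * rng.standard_normal(x.shape))

def convolutional(x, k=5):
    """Convolutional noise (blur): separable k-tap box filter applied to rows, then columns."""
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, x)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode='same'), 0, blurred)

x = rng.random((16, 16))   # toy grayscale image
noisy = additive(x)
blurry = convolutional(x)
```

In each case the solving phase receives only $ \mathcal{P}(x^*) $ (or its encoding $ m $) and recovers $ \hat{x} = G(z^*) $ without knowing $ \mathcal{P} $ explicitly.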

    Table 1.  Structure of $ E $. The decoder $ DC $ is a mirror of $ E $ using conv_transpose and upsample

    layer type | parameters
    conv2d | $ k=[3,3,3,f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    conv2d | $ k=[3,3,f,2*f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    conv2d | $ k=[3,3,2*f,4*f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    conv2d | $ k=[3,3,4*f,8*f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    conv2d | $ k=[3,3,8*f,16*f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    conv2d | $ k=[3,3,16*f,32*f], s=[1,1], a=ReLU $
    maxpool2d | $ k=[1,2,2,1], s=[2,2] $
    fullyconnected | $ h=256 $
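The shapes flowing through the encoder of Table 1 can be traced directly: each 3x3 convolution multiplies the channel count while each 2x2, stride-2 max-pool halves the spatial size, until the features are flattened into the fully connected layer with $ h=256 $. The sketch below assumes 'same' padding for the convolutions, a $ 128\times128\times3 $ input, and a base filter count of `f=16`; these are illustrative assumptions, as Table 1 does not fix `f` or the padding.

```python
def encoder_shapes(h, w, c, f):
    """Trace (height, width, channels) through the conv+pool blocks of Table 1."""
    shapes = [(h, w, c)]                   # input shape
    for mult in (1, 2, 4, 8, 16, 32):      # channel multiplier of each conv2d block
        h, w = h // 2, w // 2              # maxpool2d k=[1,2,2,1], s=[2,2] halves H and W
        shapes.append((h, w, f * mult))    # conv2d (3x3, stride 1, 'same') sets channels
    return shapes

blocks = encoder_shapes(128, 128, 3, f=16)
# The final feature map is flattened into the fully connected layer (h=256),
# producing the 256-dimensional encoding latent vector m = E(x).
flat_dim = blocks[-1][0] * blocks[-1][1] * blocks[-1][2]
```

Under these assumptions the spatial size shrinks 128 → 64 → 32 → 16 → 8 → 4 → 2, ending at a $ 2\times2\times512 $ feature map (2048 values) before the fully connected layer.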

    Table 2.  Quantitative results comparing models on CelebA. FID scores reported by recent GAN papers using CelebA $ 128\times128\times3 $ images are also shown for comparison, labelled with *

    Model | MSE | SSIM | FID
    CRGAN* | – | – | 16.97
    SSGAN* | – | – | 24.36
    Our pGAN | – | – | 22.13
    ConvAE | 0.03386 | 0.6823 $ \pm $ 0.051 | 87.71
    AEGAN | 0.03317 | 0.6907 $ \pm $ 0.050 | 34.53
    invertGAN | 0.03529 | 0.7203 $ \pm $ 0.038 | 19.19
    GE | 0.03262 | 0.7329 $ \pm $ 0.025 | 17.42

    Table 3.  Quantitative results comparing models on digital rocks. The number in brackets is the size of the latent vector of the pGAN that the model was trained on. Models with the same latent size are solved with the same pGAN weights. The same AE is used for all models

    Model | MSE | PSNR
    ConvAE | 0.009271 | 20.32
    invertGAN (512) | 0.008185 | 20.86
    GE (512) | 0.007470 | 21.26
    GE (256) | 0.007741 | 21.11
    GE (128) | 0.007839 | 21.05
    GE (64) | 0.008499 | 20.70

    Table 4.  Results of invertGAN and GE on spectacles. T denotes samples that reproduced the spectacles; F denotes samples that did not. The remaining samples are invalid reconstructions

     | invertGAN, F | invertGAN, T
    GE, F | 289 | 32
    GE, T | 157 | 469



