TY - JOUR
T1 - Processing Simple Geometric Attributes with Autoencoders
AU - Newson, Alasdair
AU - Almansa, Andrés
AU - Gousseau, Yann
AU - Ladjal, Saïd
N1 - Publisher Copyright:
© 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and generative adversarial networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This results in a wide range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalization problems. In this paper, we propose to analyze the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first property, we analyze the case of images of centered disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of smallest possible dimension, a scalar. In particular, we describe both the encoding process and a closed-form solution to the decoding training problem in a network without biases, and we show that, during training, the network indeed finds this solution. We then investigate the best regularization approaches that yield networks that generalize well. For the second property, position, we look at the encoding and decoding of Dirac delta functions, also known as “one-hot” vectors. We describe a handcrafted filter that achieves encoding perfectly and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner. We hope that the insights given here will provide a better understanding of the precise mechanisms used by generative networks and will ultimately contribute to producing more robust and generalizable networks.
AB - Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and generative adversarial networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This results in a wide range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalization problems. In this paper, we propose to analyze the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first property, we analyze the case of images of centered disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of smallest possible dimension, a scalar. In particular, we describe both the encoding process and a closed-form solution to the decoding training problem in a network without biases, and we show that, during training, the network indeed finds this solution. We then investigate the best regularization approaches that yield networks that generalize well. For the second property, position, we look at the encoding and decoding of Dirac delta functions, also known as “one-hot” vectors. We describe a handcrafted filter that achieves encoding perfectly and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner. We hope that the insights given here will provide a better understanding of the precise mechanisms used by generative networks and will ultimately contribute to producing more robust and generalizable networks.
KW - Autoencoders
KW - Deep learning
KW - Generative models
KW - Image synthesis
U2 - 10.1007/s10851-019-00924-w
DO - 10.1007/s10851-019-00924-w
M3 - Article
AN - SCOPUS:85075125386
SN - 0924-9907
VL - 62
SP - 293
EP - 312
JO - Journal of Mathematical Imaging and Vision
JF - Journal of Mathematical Imaging and Vision
IS - 3
ER -