This post is a visualization page for my recent on generative modelling, Density estimation using Real NVP, done in collaboration with Google Brain that I will present during the NIPS Deep Learning Symposium 2016.
The generative procedure from the model is very similar to the one used in Variational Auto-Encoders and Generative Adversarial Networks: sample a vector from a simple distribution (here a Gaussian) and pass it through the generator network to obtain a sample . From the generated samples, it seems the model was able to capture the statistics from the original data distribution. For example, the samples are in general relatively sharp and coherent and therefore suggest that the models understands something more than mere correlation between neighboring pixels. This is due to not relying on fixed form reconstruction cost like squared loss on the data level. The models seems also to understand to some degree the notion of foreground/background, and volume, lighting and shadows.
The generator network has been built in the paper according to a convolutional architecture, making it relatively easy to reuse the model to generate bigger images. As the model is convolutional, the model is trying to generate a “texture” of the dataset rather than an upsampled version of the images it was trained on. This explains why the model is most successful when trained on background datasets like LSUN different subcategories. This sort of behaviour can also be observed in other models like Deep Convolutional Generative Adversarial Networks.
The inference model has been trained to reorganize an input data point into a set of feature vectors from simpler, local features to more complex and global features. Inspired by the paper Towards Conceptual Compression, we try to observe the influence of those feature vectors, we choose to see what happens when we replace those vectors with resampled version . For images datasets, from left to right:
We see that the degradation of the original image is very gradual and, depending on the dataset, the feature vector is able to retain important high level information about the image.
The model has been trained to reorganize the data distribution into a Gaussian one. Since this distribution respect some degree of smoothness, we choose to build something resembling a sphere in the latent space . This sphere will be built using eight examples and their latent representation which will be parametrized by three angles with equation:
We then remap this sphere into the input space using the generator network to obtain the figure in the last column.