Denoising Criterion for Variational Auto-Encoding Framework
Authors: Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: We conducted empirical studies of DVAE under the denoising variational lower bound, as discussed in Section . To assess whether adding a denoising criterion to variational auto-encoding models enhances performance, we tested the denoising criterion on VAE and IWAE throughout the experiments. As mentioned in Section , since the choice of the corruption distribution is crucial, we compare different corruption distributions at various noise levels. We consider two datasets, the binarized MNIST dataset and the Frey Face dataset. (A corruption-distribution sketch follows the table.) |
| Researcher Affiliation | Academia | Montreal Institute for Learning Algorithms, University of Montreal, Montreal, QC, H3C 3J7. {imdaniel, ahnsungj, memisevr, findme}@iro.umontreal.ca |
| Pseudocode | No | The paper contains mathematical formulations and descriptions of procedures, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We consider two datasets, the binarized MNIST dataset and the Frey Face dataset. The MNIST dataset contains 60,000 images for training and 10,000 images for testing, and each image is a 28×28-pixel handwritten digit from 0 to 9 (LeCun et al. 1998). The Frey Face dataset (available at http://www.cs.nyu.edu/~roweis/data.html) consists of 2000 images of Brendan Frey's face. We split the images into 1572 training data, 295 validation data, and 200 test data. |
| Dataset Splits | Yes | Out of the 60,000 training examples, we used 10,000 as a validation set to tune the hyper-parameters of our model. We split the images into 1572 training data, 295 validation data, and 200 test data. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions ADAM as an optimization algorithm, but does not specify any software libraries or their version numbers used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Throughout the experiments, we used the same neural network architectures for VAE and IWAE, with a single stochastic layer of 50 latent variables for both. For the generation network, we used a neural network with two hidden layers of 200 units each. For the inference network, we tested two architectures, one with a single hidden layer and the other with two hidden layers, each with 200 hidden units. We used softplus activations for VAE and tanh activations for IWAE, following the configurations of the original papers (Kingma and Welling 2014; Burda, Grosse, and Salakhutdinov 2015). For binarized MNIST, the last layer of the generation network was a sigmoid and the usual cross-entropy term was used. For the Frey Face dataset, where the inputs are real-valued, we used Gaussian stochastic units for the output layer of the generation network. We ran 10-fold experiments for all our results. We optimized all models with ADAM (Kingma and Ba 2014b), set the batch size to 100, and selected the learning rate from a discrete range based on the validation set. We used 1 and 5 samples of z per update for VAE and 5 samples for IWAE. (An architecture sketch follows the table.) |
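
Since the choice of corruption distribution is central to the denoising criterion, here is a minimal sketch of two common corruption distributions (additive Gaussian noise and salt-and-pepper masking) applied to flattened images in [0, 1]. This is an illustration in PyTorch, not the paper's implementation; the noise levels `sigma` and `rate` are placeholder values rather than the paper's settings.

```python
import torch

def corrupt_gaussian(x, sigma=0.25):
    """Additive Gaussian corruption: x_tilde ~ N(x, sigma^2 I), clamped to [0, 1]."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)

def corrupt_salt_and_pepper(x, rate=0.1):
    """Salt-and-pepper corruption: each pixel is independently reset to 0 or 1
    with probability `rate`; remaining pixels are left unchanged."""
    hit = torch.rand_like(x) < rate               # pixels selected for corruption
    noise = (torch.rand_like(x) < 0.5).float()    # 0 or 1 with equal probability
    return torch.where(hit, noise, x)
```

Under the denoising criterion, the encoder sees the corrupted input `x_tilde` while the reconstruction term is still evaluated against the clean `x`.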
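The MNIST split quoted above is straightforward to reproduce mechanically. A sketch assuming torchvision's MNIST loader; the 0.5 binarization threshold and the fixed seed are illustrative choices, as the paper does not state its exact binarization or seeding.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Binarize pixels at 0.5; the paper's exact binarization scheme is not stated.
to_binary = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x > 0.5).float()),
])
full_train = datasets.MNIST("data", train=True, download=True, transform=to_binary)

# Hold out 10,000 of the 60,000 training images as the validation set.
train_set, valid_set = random_split(
    full_train, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)
```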
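Finally, a sketch of the VAE configuration from the setup row, assuming PyTorch: a single stochastic layer with 50 latent variables, 200-unit hidden layers with softplus activations, a Bernoulli (sigmoid/cross-entropy) output for binarized MNIST, and ADAM. The learning rate here is a placeholder, since the paper selects it from a discrete range on the validation set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=50):
        super().__init__()
        # Inference network: one 200-unit hidden layer (the paper also tests two).
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Softplus())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Generation network: two 200-unit hidden layers, sigmoid (logit) output.
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.Softplus(),
            nn.Linear(h_dim, h_dim), nn.Softplus(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        logits = self.dec(z)
        # Negative ELBO: Bernoulli cross-entropy plus analytic KL to N(0, I).
        recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (recon + kl) / x.size(0)

model = VAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is illustrative
```

For the denoising variant, one would feed a corrupted input to the encoder while keeping the clean `x` in the reconstruction term; note that the paper's denoising variational lower bound additionally accounts for the corruption distribution in the approximate posterior, which this plain ELBO sketch does not.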