Generalized Doubly Reparameterized Gradient Estimators
Authors: Matthias Bauer, Andriy Mnih
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we empirically evaluate the hierarchical extension of DREGs and its generalization to GDREGs, and compare them to the naive IWAE gradient estimator (labelled as IWAE) as well as STL (Roeder et al., 2017). We evaluate the proposed DREGs and GDREGs estimators on several conditional and unconditional unsupervised learning problems and find that they outperform the regular IWAE estimator. |
| Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Matthias Bauer <msbauer@deepmind.com>, Andriy Mnih <andriy@deepmind.com>. |
| Pseudocode | Yes | We provide full derivations and a discussion of this special case in App. H as well as an example implementation in terms of (pseudo-)code in App. F. ... In Listing 1 we provide a commented example of how to implement the GDREGs estimator for the cross-entropy objective given in Eq. (21) using JAX. (An illustrative JAX sketch of a related doubly reparameterized surrogate appears after the table.) |
| Open Source Code | Yes | We provide full derivations and a discussion of this special case in App. H as well as an example implementation in terms of (pseudo-)code in App. F. ... In Listing 1 we provide a commented example of how to implement the GDREGs estimator for the cross-entropy objective given in Eq. (21) using JAX. |
| Open Datasets | Yes | In the remainder of this paper we consider image modelling tasks with VAEs on several standard benchmark datasets: MNIST (LeCun & Cortes, 2010), Omniglot (Lake et al., 2015), and Fashion MNIST (Xiao et al., 2017). |
| Dataset Splits | No | The paper states 'We split the data into training and test sets as in previous work' but does not provide specific percentages for training, validation, or test sets, nor does it explicitly mention a validation split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions using automatic differentiation frameworks like TensorFlow and JAX, but does not provide specific version numbers for these software dependencies or any other relevant libraries. |
| Experiment Setup | Yes | Unless stated otherwise, we train all models for 1000 epochs using the Adam optimizer (Kingma & Ba, 2015) with the default learning rate of 3 × 10⁻⁴, a batch size of 64, and K = 64 importance samples; see App. G for details. In Appendix G, it is further stated: 'We use the Adam optimizer with a learning rate of 3e-4, β1 = 0.9, β2 = 0.999, and ϵ = 1e-7. All latent spaces have 50 dimensions. Every conditional distribution in Eq. (23) is parameterized by an MLP with two hidden layers of 300 tanh units each.' (A minimal configuration sketch appears after the table.) |
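
The Pseudocode and Open Source Code rows point to the paper's Listing 1, a JAX implementation of the GDREGs estimator for the cross-entropy objective in Eq. (21). That listing is not reproduced here; the snippet below is only a minimal sketch of the closely related doubly reparameterized (DREG) surrogate of Tucker et al. (2019) for a factorized Gaussian encoder. The helper names `encode` and `log_joint`, the shapes, and the single-example treatment are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch (NOT the paper's Listing 1): a DREG-style doubly
# reparameterized IWAE surrogate for a factorized Gaussian q(z|x).
# `encode` and `log_joint` are assumed callables supplied by the user.
import jax
import jax.numpy as jnp

def dreg_surrogate(phi, theta, x, rng, encode, log_joint, num_samples=64):
    """Scalar surrogate whose gradient w.r.t. phi is the DREG estimator."""
    mu, log_sigma = encode(phi, x)                      # variational parameters of q(z|x)
    eps = jax.random.normal(rng, (num_samples,) + mu.shape)
    z = mu + jnp.exp(log_sigma) * eps                   # reparameterized samples, shape (K, D)

    # Stop gradients through the distribution parameters so that phi enters
    # only through the samples z (the "double" reparameterization).
    mu_sg, log_sigma_sg = jax.lax.stop_gradient((mu, log_sigma))
    log_q = jnp.sum(
        -0.5 * ((z - mu_sg) / jnp.exp(log_sigma_sg)) ** 2
        - log_sigma_sg - 0.5 * jnp.log(2.0 * jnp.pi),
        axis=-1)
    log_w = jax.vmap(lambda zk: log_joint(theta, x, zk))(z) - log_q  # log importance weights

    # Squared self-normalized weights, treated as constants in the gradient.
    # Model parameters theta would typically use a separate surrogate.
    w_tilde = jax.nn.softmax(jax.lax.stop_gradient(log_w), axis=0)
    return jnp.sum(w_tilde ** 2 * log_w)
```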
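
For the Experiment Setup row, the reported hyperparameters (Adam with learning rate 3e-4, β1 = 0.9, β2 = 0.999, ϵ = 1e-7; batch size 64; K = 64 importance samples; 1000 epochs; 50-dimensional latents; MLPs with two hidden layers of 300 tanh units) can be collected into a training configuration. The sketch below uses Optax and Haiku purely as an assumption; the paper mentions TensorFlow and JAX but does not name these libraries or pin versions.

```python
# Minimal configuration sketch matching the reported hyperparameters.
# Optax/Haiku are assumptions; the paper does not name specific libraries.
import haiku as hk
import jax.numpy as jnp
import optax

LATENT_DIM = 50              # "All latent spaces have 50 dimensions."
NUM_IMPORTANCE_SAMPLES = 64  # K = 64
BATCH_SIZE = 64
NUM_EPOCHS = 1000

# Adam with the reported settings.
optimizer = optax.adam(learning_rate=3e-4, b1=0.9, b2=0.999, eps=1e-7)

def encoder(x):
    """MLP with two hidden layers of 300 tanh units, outputting Gaussian params."""
    h = hk.nets.MLP([300, 300], activation=jnp.tanh, activate_final=True)(x)
    mu = hk.Linear(LATENT_DIM)(h)
    log_sigma = hk.Linear(LATENT_DIM)(h)
    return mu, log_sigma
```

In this sketch, `encoder` would still need to be wrapped with `hk.transform` to obtain initializable and applicable parameters before training.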