Generalized Doubly Reparameterized Gradient Estimators

Authors: Matthias Bauer, Andriy Mnih

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we empirically evaluate the hierarchical extension of DREGs and its generalization to GDREGs, and compare them to the naive IWAE gradient estimator (labelled as IWAE) as well as STL (Roeder et al., 2017). We evaluate the proposed DREGs and GDREGs estimators on several conditional and unconditional unsupervised learning problems and find that they outperform the regular IWAE estimator.
Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Matthias Bauer <msbauer@deepmind.com>, Andriy Mnih <andriy@deepmind.com>.
Pseudocode | Yes | We provide full derivations and a discussion of this special case in App. H as well as an example implementation in terms of (pseudo-)code in App. F. ... In Listing 1 we provide a commented example of how to implement the GDREGs estimator for the cross-entropy objective given in Eq. (21) using JAX. (An illustrative JAX sketch of the underlying doubly reparameterized surrogate is given after the table.)
Open Source Code | Yes | We provide full derivations and a discussion of this special case in App. H as well as an example implementation in terms of (pseudo-)code in App. F. ... In Listing 1 we provide a commented example of how to implement the GDREGs estimator for the cross-entropy objective given in Eq. (21) using JAX.
Open Datasets | Yes | In the remainder of this paper we consider image modelling tasks with VAEs on several standard benchmark datasets: MNIST (LeCun & Cortes, 2010), Omniglot (Lake et al., 2015), and Fashion MNIST (Xiao et al., 2017).
Dataset Splits | No | The paper states 'We split the data into training and test sets as in previous work' but does not give specific training/validation/test percentages, nor does it explicitly mention a validation split.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions using automatic differentiation frameworks such as TensorFlow and JAX, but does not provide version numbers for these or any other libraries.
Experiment Setup | Yes | Unless stated otherwise, we train all models for 1000 epochs using the Adam optimizer (Kingma & Ba, 2015) with a default learning rate of 3 × 10^-4, a batch size of 64, and K = 64 importance samples; see App. G for details. Appendix G further states: 'We use the Adam optimizer with a learning rate of 3e-4, β1 = 0.9, β2 = 0.999, and ϵ = 1e-7. All latent spaces have 50 dimensions. Every conditional distribution in Eq. (23) is parameterized by an MLP with two hidden layers of 300 tanh units each.' (A configuration sketch of these settings follows the table.)
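The paper's own Listing 1 (App. F) is not reproduced here. As a rough illustration of the mechanism these estimators build on, below is a minimal JAX sketch of the standard doubly reparameterized (DREGs) surrogate for the inference-network gradient of the IWAE bound, which GDREGs generalizes; `encode`, `log_joint_fn`, and the diagonal-Gaussian form of q are assumptions, not the authors' code.

```python
# Minimal sketch of the DREGs surrogate (Tucker et al., 2019), assuming a
# diagonal-Gaussian q_phi(z|x); `encode` and `log_joint_fn` are hypothetical.
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm


def dreg_surrogate(phi, theta, x, key, encode, log_joint_fn, K=64):
    """Surrogate whose gradient w.r.t. `phi` is the DREGs estimator."""
    mu, log_sigma = encode(phi, x)                 # parameters of q_phi(z|x)
    sigma = jnp.exp(log_sigma)
    eps = jax.random.normal(key, (K,) + mu.shape)
    z = mu + sigma * eps                           # reparameterized samples, (K, D)

    # Stop gradients through q's parameters so that phi only enters log w via z;
    # this removes the score-function term from the gradient.
    log_q = norm.logpdf(z, jax.lax.stop_gradient(mu),
                        jax.lax.stop_gradient(sigma)).sum(-1)
    log_w = log_joint_fn(theta, x, z) - log_q      # log importance weights, (K,)

    # Self-normalized weights, held fixed; squaring them gives the DREGs weighting.
    w_tilde = jax.lax.stop_gradient(jax.nn.softmax(log_w))
    return jnp.sum(w_tilde ** 2 * log_w)


# Usage (phi gradient only; the model parameters theta use the usual IWAE surrogate):
# phi_grad = jax.grad(dreg_surrogate)(phi, theta, x, key, encode, log_joint_fn)
```

Differentiating this surrogate with respect to the encoder parameters gives the squared-weight pathwise gradient of Tucker et al. (2019); the paper's GDREGs extends the same double-reparameterization idea beyond the inference network, e.g. to model-parameter gradients.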
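To make the quoted training configuration concrete, here is a minimal sketch of the reported hyperparameters; the use of optax for Adam and the specific MLP initializer are assumptions, since the paper does not release this code.

```python
# Training configuration quoted from the paper and its Appendix G; optax usage
# and the MLP initializer are assumptions for illustration.
import jax
import jax.numpy as jnp
import optax

NUM_EPOCHS = 1000
BATCH_SIZE = 64
NUM_IMPORTANCE_SAMPLES = 64   # K = 64
LATENT_DIM = 50               # "All latent spaces have 50 dimensions."
HIDDEN_UNITS = 300            # two hidden layers of 300 tanh units each

# Adam with the hyperparameters stated in Appendix G.
optimizer = optax.adam(learning_rate=3e-4, b1=0.9, b2=0.999, eps=1e-7)


def init_mlp(key, in_dim, out_dim):
    """Two-hidden-layer tanh MLP parameters (hypothetical 1/sqrt(fan-in) init)."""
    sizes = [in_dim, HIDDEN_UNITS, HIDDEN_UNITS, out_dim]
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]


def mlp_apply(params, x):
    """Apply the MLP: tanh on hidden layers, linear output."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b
```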