Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Authors: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris Maddison

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models.
Researcher Affiliation | Collaboration | University of Toronto and Vector Institute; Google Research, Brain Team.
Pseudocode | Yes | Algorithm 1: Gibbs With Gradients (an illustrative sketch of this proposal appears after the table).
Open Source Code | Yes | We describe some simple extensions in Appendix D and code to replicate our experiments is available here.
Open Datasets | Yes | We train an RBM with 500 hidden units on the MNIST dataset... We train Potts models on 2 large proteins: OPSD BOVIN and CADH1 HUMAN... We train deep EBMs parameterized by Residual Networks... on small binary and continuous image datasets... Static MNIST, Dynamic MNIST, Omniglot, Caltech Silhouettes, Frey Faces, Histopathology.
Dataset Splits | No | The paper mentions using a 'test-set' for evaluation (Table 2) and states that 'Full experimental details can be found in Appendix I', but it does not explicitly describe training, validation, and test splits with specific percentages, counts, or a clear methodology in the main text.
Hardware Specification | No | The paper discusses computational efficiency and cost (e.g., 'the run-time of GWG is most comparable to Gibbs-2') and mentions a general 'compute budget' and 'GPU' in the context of deep EBMs, but it does not specify exact hardware such as GPU models (e.g., NVIDIA A100), CPU models, or memory used for the experiments.
Software Dependencies | No | The paper mentions 'Tensorflow Probability' in a footnote but does not give a version number, and no other key software components are listed with version numbers.
Experiment Setup | No | The paper describes some training aspects (e.g., 'using contrastive divergence', 'PCD with a replay buffer') but defers full details to the appendices ('Full experimental details can be found in Appendix I'). It does not provide specific hyperparameter values such as learning rates, batch sizes, or epochs in the main text.
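The paper's Algorithm 1 (Gibbs With Gradients) is not reproduced on this page. For reference, below is a minimal sketch of a gradient-informed flip proposal with a Metropolis-Hastings correction for a binary model; the quadratic energy f(x) = x·Wx + b·x, the toy parameters W and b, and the helper names (grad_f, flip_probs, gwg_step) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a gradient-informed flip proposal with an MH correction,
# for a binary model p(x) ∝ exp(f(x)) with f(x) = x·Wx + b·x.
# W, b, and the helper names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 16
W = rng.normal(scale=0.1, size=(D, D))
W = (W + W.T) / 2.0            # symmetric pairwise couplings
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=D)

def f(x):
    """Unnormalized log-probability of a binary state x in {0,1}^D."""
    return x @ W @ x + b @ x

def grad_f(x):
    """Gradient of f with x treated as continuous (closed form here)."""
    return 2.0 * (W @ x) + b

def flip_probs(x):
    """Proposal over which coordinate to flip, built from a first-order
    estimate of the change in f when bit i is flipped."""
    d = -(2.0 * x - 1.0) * grad_f(x)   # estimated f(flip_i(x)) - f(x)
    logits = d / 2.0
    q = np.exp(logits - logits.max())
    return q / q.sum()

def gwg_step(x):
    """One Metropolis-Hastings step with the gradient-informed proposal."""
    q = flip_probs(x)
    i = rng.choice(len(x), p=q)        # choose a coordinate to flip
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    q_rev = flip_probs(x_new)
    # Accept with prob min(1, exp(f(x') - f(x)) * q(i|x') / q(i|x)).
    log_accept = f(x_new) - f(x) + np.log(q_rev[i]) - np.log(q[i])
    return x_new if np.log(rng.uniform()) < log_accept else x

x = rng.integers(0, 2, size=D).astype(float)
for _ in range(1000):
    x = gwg_step(x)
```

In the paper's experiments the same proposal structure is applied to Ising models, Potts models, RBMs, and deep EBMs, where the gradient of the log-probability comes from automatic differentiation rather than a closed form.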