Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

Authors: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris Maddison

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models.
Researcher Affiliation | Collaboration | University of Toronto and Vector Institute; Google Research, Brain Team.
Pseudocode | Yes | Algorithm 1: Gibbs With Gradients (an illustrative sketch of this proposal appears after the table).
Open Source Code | Yes | We describe some simple extensions in Appendix D and code to replicate our experiments is available here.
Open Datasets | Yes | We train an RBM with 500 hidden units on the MNIST dataset... We train Potts models on 2 large proteins: OPSD BOVIN and CADH1 HUMAN... We train deep EBMs parameterized by Residual Networks... on small binary and continuous image datasets... Static MNIST, Dynamic MNIST, Omniglot, Caltech Silhouettes, Frey Faces, Histopathology.
Dataset Splits | No | The paper mentions using a 'test-set' for evaluation (Table 2) and states that 'Full experimental details can be found in Appendix I', but it does not explicitly describe training, validation, and test splits with specific percentages, counts, or a clear methodology in the main text.
Hardware Specification | No | The paper discusses computational efficiency and cost (e.g., 'the run-time of GWG is most comparable to Gibbs-2') and mentions a general 'compute budget' and 'GPU' in the context of deep EBMs, but it does not specify exact hardware such as GPU models (e.g., NVIDIA A100), CPU models, or memory used for the experiments.
Software Dependencies | No | The paper mentions 'Tensorflow Probability' in a footnote but does not give a version number, and no other key software components are listed with version numbers.
Experiment Setup | No | The paper describes some training aspects (e.g., 'using contrastive divergence', 'PCD with a replay buffer') but defers full details to the appendices ('Full experimental details can be found in Appendix I'). It does not provide specific hyperparameter values such as learning rates, batch sizes, or epochs in the main text.
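The paper's Algorithm 1 (Gibbs With Gradients) is not reproduced on this page. For reference, below is a minimal sketch of a gradient-informed flip proposal with a Metropolis-Hastings correction for a binary model; the quadratic energy f(x) = x·Wx + b·x, the toy parameters W and b, and the helper names (grad_f, flip_probs, gwg_step) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a gradient-informed flip proposal with an MH correction,
# for a binary model p(x) ∝ exp(f(x)) with f(x) = x·Wx + b·x.
# W, b, and the helper names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 16
W = rng.normal(scale=0.1, size=(D, D))
W = (W + W.T) / 2.0            # symmetric pairwise couplings
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=D)

def f(x):
    """Unnormalized log-probability of a binary state x in {0,1}^D."""
    return x @ W @ x + b @ x

def grad_f(x):
    """Gradient of f with x treated as continuous (closed form here)."""
    return 2.0 * (W @ x) + b

def flip_probs(x):
    """Proposal over which coordinate to flip, built from a first-order
    estimate of the change in f when bit i is flipped."""
    d = -(2.0 * x - 1.0) * grad_f(x)   # estimated f(flip_i(x)) - f(x)
    logits = d / 2.0
    q = np.exp(logits - logits.max())
    return q / q.sum()

def gwg_step(x):
    """One Metropolis-Hastings step with the gradient-informed proposal."""
    q = flip_probs(x)
    i = rng.choice(len(x), p=q)        # choose a coordinate to flip
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    q_rev = flip_probs(x_new)
    # Accept with prob min(1, exp(f(x') - f(x)) * q(i|x') / q(i|x)).
    log_accept = f(x_new) - f(x) + np.log(q_rev[i]) - np.log(q[i])
    return x_new if np.log(rng.uniform()) < log_accept else x

x = rng.integers(0, 2, size=D).astype(float)
for _ in range(1000):
    x = gwg_step(x)
```

In the paper's experiments the same proposal structure is applied to Ising models, Potts models, RBMs, and deep EBMs, where the gradient of the log-probability comes from automatic differentiation rather than a closed form.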