Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
Authors: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris Maddison
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. |
| Researcher Affiliation | Collaboration | ¹University of Toronto and Vector Institute; ²Google Research, Brain Team. |
| Pseudocode | Yes | Algorithm 1 Gibbs With Gradients (a hedged code sketch of this proposal follows the table). |
| Open Source Code | Yes | We describe some simple extensions in Appendix D and code to replicate our experiments is available here. |
| Open Datasets | Yes | We train an RBM with 500 hidden units on the MNIST dataset... We train Potts models on 2 large proteins: OPSD BOVIN, and CADH1 HUMAN... We train deep EBMs parameterized by Residual Networks... on small binary and continuous image datasets... Static MNIST, Dynamic MNIST, Omniglot, Caltech Silhouettes, Frey Faces, Histopathology. |
| Dataset Splits | No | The paper mentions using a 'test-set' for evaluation (Table 2) and states that 'Full experimental details can be found in Appendix I', but it does not explicitly describe training, validation, and test dataset splits with specific percentages, counts, or a clear methodology in the main text. |
| Hardware Specification | No | The paper discusses computational efficiency and cost (e.g., 'the run-time of GWG is most comparable to Gibbs-2') and mentions general 'compute budget' and 'GPU' in the context of deep EBMs, but it does not specify exact hardware components such as GPU models (e.g., NVIDIA A100), CPU models, or memory details used for experiments. |
| Software Dependencies | No | The paper mentions 'Tensorflow Probability' in a footnote, but does not provide a specific version number. No other key software components are listed with their version numbers. |
| Experiment Setup | No | The paper describes some training aspects (e.g., 'using contrastive divergence', 'PCD with a replay buffer') but defers 'Full experimental details' to appendices (e.g., 'Full experimental details can be found in Appendix I'). It does not provide specific hyperparameter values like learning rates, batch sizes, or epochs in the main text. |
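As noted in the Pseudocode row above, the paper provides Algorithm 1 (Gibbs With Gradients). For orientation only, below is a minimal, unofficial sketch of that proposal for binary variables, written in PyTorch. It assumes a differentiable unnormalized log-probability `f` and batched binary states; the function and variable names (`gwg_step`, the toy coupling matrix `W`) are illustrative and not taken from the authors' released code.

```python
import torch
import torch.nn.functional as F

def gwg_step(f, x):
    """One Gibbs-With-Gradients step for binary x of shape [batch, dim].

    f: callable mapping x to its unnormalized log-probability, one value per row.
    Assumes f is differentiable in x (e.g. an Ising model or RBM free energy).
    """
    x = x.detach().requires_grad_(True)
    fx = f(x)
    grad = torch.autograd.grad(fx.sum(), x)[0]

    # First-order estimate of the change in f from flipping each coordinate.
    d = grad * (1.0 - 2.0 * x)
    q_fwd = torch.distributions.Categorical(logits=d / 2.0)
    idx = q_fwd.sample()

    # Propose flipping the sampled coordinate.
    flip = F.one_hot(idx, x.shape[-1]).to(x.dtype)
    x_prop = (x + flip * (1.0 - 2.0 * x)).detach().requires_grad_(True)

    # Reverse-proposal probability evaluated at the proposed state.
    fx_prop = f(x_prop)
    grad_prop = torch.autograd.grad(fx_prop.sum(), x_prop)[0]
    d_prop = grad_prop * (1.0 - 2.0 * x_prop)
    q_rev = torch.distributions.Categorical(logits=d_prop / 2.0)

    # Metropolis-Hastings acceptance test.
    log_alpha = (fx_prop - fx) + q_rev.log_prob(idx) - q_fwd.log_prob(idx)
    accept = (torch.rand_like(fx).log() < log_alpha).to(x.dtype).unsqueeze(-1)
    return (accept * x_prop + (1.0 - accept) * x).detach()

# Usage on a toy quadratic energy model (coupling matrix W is illustrative only).
W = torch.randn(16, 16) * 0.1
f = lambda x: ((x @ W) * x).sum(-1)
x = torch.bernoulli(torch.full((4, 16), 0.5))
for _ in range(100):
    x = gwg_step(f, x)
```

This sketch mirrors the structure of the paper's Algorithm 1 (gradient-informed index proposal followed by a Metropolis-Hastings correction), but it is a simplified reading under the stated assumptions, not a substitute for the authors' implementation.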