The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Authors: Chris J. Maddison, Andriy Mnih, Yee Whye Teh
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks. In Section 5 we present results on a density estimation task and a structured prediction task on the MNIST and Omniglot datasets. |
| Researcher Affiliation | Collaboration | Chris J. Maddison1,2, Andriy Mnih1, & Yee Whye Teh1 1DeepMind, London, United Kingdom 2University of Oxford, Oxford, United Kingdom |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about, or link to, the availability of its source code. |
| Open Datasets | Yes | We performed the experiments using the MNIST and Omniglot datasets. For MNIST we used the fixed binarization of Salakhutdinov & Murray (2008)... For Omniglot we sampled a fixed binarization and used the standard 24,345/8,070 split into training/testing sets. |
| Dataset Splits | Yes | For MNIST we used the fixed binarization of Salakhutdinov & Murray (2008) and the standard 50,000/10,000/10,000 split into training/validation/testing sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'automatic differentiation (AD) libraries' and cites 'Abadi et al., 2015' (TensorFlow) and 'Theano Development Team, 2016' (Theano), but does not provide specific version numbers for these or other software dependencies used in their experiments. |
| Experiment Setup | Yes | All models were initialized with the heuristic of Glorot & Bengio (2010) and optimized using Adam (Kingma & Ba, 2014) with parameters β1 = 0.9, β2 = 0.999 for 10^7 steps on minibatches of size 64. ... Learning rates were chosen from {10^−4, 3 × 10^−4, 10^−3} and weight decay from {0, 10^−2, 10^−1, 1}. ... For density estimation, the Concrete relaxation hyperparameters were (weight decay = 0, learning rate = 3 × 10^−4) for linear and (weight decay = 0, learning rate = 10^−4) for non-linear. |
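Since the paper provides no pseudocode or source code, a minimal sketch of its central technique may help readers attempting reproduction. The snippet below samples from a Concrete (Gumbel-softmax) distribution via the reparameterization described in the paper: add Gumbel(0, 1) noise to the logits, divide by a temperature, and apply a softmax. The function name and NumPy implementation are illustrative assumptions, not the authors' code; the paper's experiments were run with automatic-differentiation libraries such as TensorFlow or Theano.

```python
import numpy as np

def sample_concrete(logits, temperature, rng=None):
    """Draw one sample from a Concrete (Gumbel-softmax) distribution.

    Illustrative sketch, not the authors' implementation. The sample is
    a point on the probability simplex; as temperature -> 0 it approaches
    a discrete one-hot categorical sample.
    """
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via -log(-log(U)), U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    # Numerically stable softmax
    y = np.exp(y - np.max(y))
    return y / np.sum(y)

# Relaxed sample for a 3-way categorical with probabilities (0.2, 0.3, 0.5)
sample = sample_concrete(np.log([0.2, 0.3, 0.5]), temperature=0.5)
print(sample)  # non-negative entries summing to 1
```

Because the transformation from noise to sample is differentiable in the logits, gradients can flow through the sampling step, which is what enables the density-estimation and structured-prediction experiments reported in Section 5.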