Gradient Estimation with Stochastic Softmax Tricks

Authors: Max Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our goal in these experiments was to evaluate the use of SSTs for learning distributions over structured latent spaces in deep structured models.
Researcher Affiliation | Collaboration | Max B. Paulus (ETH Zürich, max.paulus@inf.ethz.ch); Dami Choi (University of Toronto, choidami@cs.toronto.edu); Daniel Tarlow (Google Research, Brain Team, dtarlow@google.com); Andreas Krause (ETH Zürich, krausea@ethz.ch); Chris J. Maddison (University of Toronto & DeepMind, cmaddis@cs.toronto.edu)
Pseudocode | No | The paper describes its methods and algorithms in prose but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/choidami/sst.
Open Datasets | Yes | We used a simplified variant of the ListOps dataset [62], which contains sequences of prefix arithmetic expressions, e.g., max[ 3 min[ 8 2 ]], that evaluate to an integer in [0, 9].
Dataset Splits | No | We selected models on a validation set according to the best objective value obtained during training. All reported values are measured on a test set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions TensorFlow, PyTorch, and JAX as modern software packages it builds on, but it does not provide version numbers for these or any other ancillary software dependencies.
Experiment Setup | Yes | We optimized hyperparameters (including fixed training temperature t) using random search over multiple independent runs. We selected models on a validation set according to the best objective value obtained during training.
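For context on the method under review (the "Research Type" row above): the simplest instance of a stochastic softmax trick is the Gumbel-softmax relaxation of a single categorical variable, which the paper generalizes to structured spaces such as subsets and spanning trees. The snippet below is a minimal NumPy sketch of that categorical case only; it is not the authors' implementation, which lives in the repository linked above.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng=None):
    """Draw a relaxed one-hot sample from a categorical distribution.

    Perturb the logits with Gumbel noise, then apply a temperature-scaled
    softmax. At low temperature the sample approaches the hard one-hot
    argmax of the perturbed logits; at finite temperature it is a smooth,
    differentiable function of the logits, which is what enables
    reparameterization gradients.
    """
    rng = rng or np.random.default_rng()
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scores = (logits + gumbel_noise) / temperature
    scores -= scores.max()              # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Example: a relaxed sample over five categories at temperature 0.5.
logits = np.log(np.array([0.1, 0.2, 0.4, 0.2, 0.1]))
print(gumbel_softmax_sample(logits, temperature=0.5))
```

The "Open Datasets" row quotes the ListOps format. As a concrete illustration of how such prefix expressions evaluate to an integer in [0, 9], here is a small, self-contained sketch; the operator set below (max, min, median, sum modulo 10) follows the original ListOps dataset and is an assumption here, since the paper's simplified variant may restrict or alter it.

```python
# Hypothetical evaluator for ListOps-style prefix expressions; the operator
# set is taken from the original ListOps dataset and may differ from the
# simplified variant used in the paper.
OPS = {
    "max": max,
    "min": min,
    "med": lambda xs: sorted(xs)[len(xs) // 2],   # median (upper middle for even lengths)
    "sm": lambda xs: sum(xs) % 10,                # sum modulo 10
}

def evaluate_listops(expression):
    """Evaluate an expression such as "max[ 3 min[ 8 2 ]]" to an integer in [0, 9]."""
    tokens = expression.replace("[", "[ ").replace("]", " ] ").split()
    stack = [[]]
    for token in tokens:
        if token.endswith("["):            # an operator opens a new argument list
            stack.append([OPS[token[:-1]]])
        elif token == "]":                 # closing bracket: apply the operator to its arguments
            op, *args = stack.pop()
            stack[-1].append(op(args))
        else:                              # a literal digit
            stack[-1].append(int(token))
    return stack[0][0]

print(evaluate_listops("max[ 3 min[ 8 2 ]]"))  # -> 3
```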
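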