Learning by Directional Gradient Descent
Authors: David Silver, Anirudh Goyal, Ivo Danihelka, Matteo Hessel, Hado van Hasselt
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): We now report and discuss the results of an empirical study that analyses the performance of the proposed estimator using different tasks, as well as using different ways to approximate the expected gradient. We use JAX (Bradbury et al., 2018) to implement all experiments. |
| Researcher Affiliation | Collaboration | 1 DeepMind, London, UK; 2 University College London; 3 Mila, University of Montreal. |
| Pseudocode | Yes | Listing 1: DODGE implemented in JAX. (A hedged re-implementation sketch of the listing's idea is given after the table.) |
| Open Source Code | No | The paper provides an example implementation in Listing 1 and states, "We use JAX (Bradbury et al., 2018) to implement all experiments." However, it does not explicitly provide a link to the authors' full source code for the methodology or state that their code is being released. |
| Open Datasets | Yes | We evaluate the proposed DODGE update on different problems. We first give a brief description of the different problems... Copy task. The copy problem defined in Graves et al. (2014)... MNIST classification task. It is a database of handwritten digits (LeCun, 1998)... Influence Balancing task. This task was introduced by Tallec & Ollivier (2017)... Image regression NeRF task. This task trains the initial parameters of a 2D-NeRF model (Mildenhall et al., 2020)... We build upon the experimental setup proposed by Tancik et al. (2021). |
| Dataset Splits | No | The paper states, "For each method, we choose the best learning rate from {0.003, 0.001, 0.0003, 0.0001, 0.00003, 0.00001}, based on the final performance." This implies some form of validation for hyperparameter tuning, but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing for any of the datasets used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. It only mentions using JAX for implementation. |
| Software Dependencies | No | The paper mentions using "JAX (Bradbury et al., 2018)", the "Adam optimizer (Kingma & Ba, 2014)", and an "LSTM network (Hochreiter & Schmidhuber, 1997)". However, it does not specify version numbers for these libraries and components, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | On sequence modeling tasks, we use an LSTM network (Hochreiter & Schmidhuber, 1997) with 128 units and a batch size of 32. We optimize the log-likelihood using the Adam optimizer (Kingma & Ba, 2014). For each method, we choose the best learning rate from {0.003, 0.001, 0.0003, 0.0001, 0.00003, 0.00001}, based on the final performance. We repeat each experiment 5 times with 5 different random seeds. (A sketch of this sweep also follows the table.) |
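
The paper's Listing 1 is not reproduced here, but the following is a minimal sketch of the directional gradient estimator it describes, written against JAX's public API. The function names (`dodge_update`, `loss_fn`), the standard-normal choice of direction, and the plain SGD step are illustrative assumptions rather than the authors' exact code.

```python
# Minimal sketch of a directional (forward-mode) gradient update in JAX.
# Assumptions (not from the paper's listing): `loss_fn` maps a parameter
# pytree to a scalar loss, directions are standard normal, and the step is
# plain SGD rather than the Adam optimizer used in the experiments.
import jax


def dodge_update(loss_fn, params, key, learning_rate=1e-3):
    # Sample a random direction with the same pytree structure as the parameters.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    direction = jax.tree_util.tree_unflatten(
        treedef,
        [jax.random.normal(k, leaf.shape) for k, leaf in zip(keys, leaves)])

    # Forward-mode pass: loss value and its directional derivative along `direction`.
    loss, dir_deriv = jax.jvp(loss_fn, (params,), (direction,))

    # Gradient estimate: the sampled direction scaled by the directional derivative.
    grad_estimate = jax.tree_util.tree_map(lambda v: dir_deriv * v, direction)

    # Descend on the estimate.
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g, params, grad_estimate)
    return new_params, loss
```

For standard-normal directions the estimate is unbiased (E[(∇f·v)v] = ∇f), and it requires only a single forward-mode pass, which is the property the directional update relies on.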
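
The learning-rate sweep quoted in the Experiment Setup row can be expressed compactly. The sketch below assumes the `optax` library for Adam and a hypothetical `run_experiment(optimizer, seed)` trainer; neither is specified in the paper.

```python
# Sketch of the reported hyperparameter sweep: Adam over a fixed learning-rate
# grid, five seeds per setting. `optax` and `run_experiment` are assumptions.
import optax

LEARNING_RATES = (3e-3, 1e-3, 3e-4, 1e-4, 3e-5, 1e-5)
NUM_SEEDS = 5


def sweep(run_experiment):
    """`run_experiment(optimizer, seed)` is a hypothetical trainer that
    returns the final performance of a single run (higher is better here)."""
    results = {}
    for lr in LEARNING_RATES:
        optimizer = optax.adam(lr)  # Adam optimizer (Kingma & Ba, 2014)
        results[lr] = [run_experiment(optimizer, seed) for seed in range(NUM_SEEDS)]
    # Pick the best learning rate by mean final performance.
    best_lr = max(results, key=lambda lr: sum(results[lr]) / NUM_SEEDS)
    return best_lr, results
```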