Kronecker-factored Curvature Approximations for Recurrent Neural Networks
Authors: James Martens, Jimmy Ba, Matt Johnson
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in experiments that our method significantly outperforms general purpose state-of-the-art optimizers like SGD with momentum and Adam on several challenging RNN training tasks. |
| Researcher Affiliation | Collaboration | James Martens, DeepMind (jamesmartens@google.com); Jimmy Ba, Department of Computer Science, University of Toronto, Toronto, Canada (jimmy@psi.toronto.edu); Matthew Johnson, Google Brain (mattjj@google.com) |
| Pseudocode | Yes | Full pseudo-code for the resulting algorithms is given in Section C.3. |
| Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | word-level language modeling task on the Penn-Tree Bank (PTB) dataset (Marcus et al., 1993) |
| Dataset Splits | Yes | following the experimental setup in Zaremba et al. (2014). We employ the same data partition as in Mikolov et al. (2012). |
| Hardware Specification | Yes | We used a single machine with 16 CPU cores and a Nvidia K40 GPU for all the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The truncation length used in the experiments is 35 timesteps. The learning rate is given by a carefully tuned decaying schedule (whose base value we tune along with the other hyperparameters). All the methods used a mini-batch size of 200. (See the hedged configuration sketch after this table.) |
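
The sketch below is a minimal illustration, not the authors' code, of how the reported experiment setup (truncation length 35, mini-batch size 200, decaying learning-rate schedule) could be wired into a word-level PTB training loop. The model sizes, the exact decay schedule, and the `WordLM`/`train_epoch` names are assumptions made for illustration; plain SGD with momentum stands in for the paper's K-FAC optimizer, since no reference implementation is provided.

```python
# Hedged sketch of the reported training setup (PyTorch); not the authors' code.
import torch
import torch.nn as nn

TRUNCATION_LEN = 35   # BPTT truncation length reported in the paper
BATCH_SIZE = 200      # mini-batch size reported in the paper
BASE_LR = 1.0         # base value of the decay schedule: assumed, tuned in the paper


class WordLM(nn.Module):
    """Small word-level LSTM language model; layer sizes are illustrative only."""

    def __init__(self, vocab_size=10000, embed_dim=200, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state


model = WordLM()
# The paper compares K-FAC against SGD with momentum and Adam; SGD with momentum
# is used here as a stand-in baseline optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR, momentum=0.9)
# Assumed decay schedule: halve the learning rate once per epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
criterion = nn.CrossEntropyLoss()


def train_epoch(batches, hidden=None):
    """`batches` yields (inputs, targets), each of shape [BATCH_SIZE, TRUNCATION_LEN]."""
    for inputs, targets in batches:
        optimizer.zero_grad()
        # Detach the carried state so gradients stop at the truncation boundary.
        if hidden is not None:
            hidden = tuple(h.detach() for h in hidden)
        logits, hidden = model(inputs, hidden)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        loss.backward()
        optimizer.step()
    scheduler.step()
    return hidden
```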