Kronecker-factored Curvature Approximations for Recurrent Neural Networks

Authors: James Martens, Jimmy Ba, Matt Johnson

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate in experiments that our method significantly outperforms general purpose state-of-the-art optimizers like SGD with momentum and Adam on several challenging RNN training tasks.
Researcher Affiliation | Collaboration | James Martens (DeepMind, jamesmartens@google.com); Jimmy Ba (Department of Computer Science, University of Toronto, Toronto, Canada, jimmy@psi.toronto.edu); Matthew Johnson (Google Brain, mattjj@google.com)
Pseudocode | Yes | Full pseudo-code for the resulting algorithms is given in Section C.3.
Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that the code is publicly available.
Open Datasets | Yes | Word-level language modeling task on the Penn Treebank (PTB) dataset (Marcus et al., 1993).
Dataset Splits | Yes | Following the experimental setup in Zaremba et al. (2014); we employ the same data partition as in Mikolov et al. (2012).
Hardware Specification | Yes | We used a single machine with 16 CPU cores and an Nvidia K40 GPU for all the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The truncation length used in the experiments is 35 timesteps. The learning rate is given by a carefully tuned decaying schedule (whose base value we tune along with the other hyperparameters). All the methods used a mini-batch size of 200. (A minimal configuration sketch based on these values appears below.)
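The following is a minimal sketch of a truncated-BPTT training configuration consistent with the values reported above (truncation length 35, mini-batch size 200, decaying learning-rate schedule). It is illustrative only: the base learning rate, decay factor, decay start epoch, model sizes, and epoch count are hypothetical placeholders (the paper tunes these), and plain SGD with momentum stands in for the paper's K-FAC-style optimizer.

    # Sketch of the reported experiment setup; values marked "hypothetical"
    # are illustrative placeholders, not numbers reported in the paper.
    import torch
    import torch.nn as nn

    TRUNCATION_LENGTH = 35   # BPTT truncation length reported in the paper
    BATCH_SIZE = 200         # mini-batch size reported in the paper
    BASE_LR = 1.0            # hypothetical: base value was tuned in the paper
    DECAY_FACTOR = 0.8       # hypothetical decay multiplier
    DECAY_START_EPOCH = 6    # hypothetical epoch at which decay begins

    # Hypothetical RNN sizes; the paper's PTB model details are not listed here.
    model = nn.LSTM(input_size=650, hidden_size=650, num_layers=2)
    optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR, momentum=0.9)

    def lr_for_epoch(epoch: int) -> float:
        """Simple decaying schedule standing in for the paper's tuned schedule."""
        decay_steps = max(0, epoch - DECAY_START_EPOCH)
        return BASE_LR * (DECAY_FACTOR ** decay_steps)

    for epoch in range(40):  # hypothetical number of epochs
        for group in optimizer.param_groups:
            group["lr"] = lr_for_epoch(epoch)
        # ... iterate over the corpus in chunks of TRUNCATION_LENGTH timesteps,
        #     with BATCH_SIZE sequences per mini-batch, and call optimizer.step().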