Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Authors: Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention. |
| Researcher Affiliation | Collaboration | Nan Rosemary Ke (1,2), Anirudh Goyal (1), Olexa Bilaniuk (1), Jonathan Binas (1), Michael C. Mozer (3), Chris Pal (1,2,4), Yoshua Bengio (1, CIFAR Senior Fellow). Affiliations: (1) Mila, Université de Montréal; (2) Mila, Polytechnique Montréal; (3) University of Colorado, Boulder; (4) Element AI. |
| Pseudocode | Yes | Algorithm 1 SAB-augmented LSTM |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing source code, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Copying and adding problems defined in Hochreiter & Schmidhuber (1997); Character-level Penn TreeBank (PTB) (Q1): we follow the setup in Cooijmans et al. (2016); Text8 (Q1): we follow the setup of Mikolov et al. (2012), using the first 90M characters for training, the next 5M for validation and the final 5M characters for testing; Permuted pixel-by-pixel MNIST (Q1): a sequential version of the MNIST classification dataset; CIFAR10 classification (Q1, Q3): we test our model's performance on pixel-by-pixel CIFAR10 (no permutation). |
| Dataset Splits | Yes | We use the first 90M characters for training, the next 5M for validation and the final 5M characters for testing (see the split sketch after this table). |
| Hardware Specification | No | The paper mentions "Compute Canada and NVIDIA for computing resources" in the acknowledgements, but does not specify particular hardware details like GPU/CPU models, memory, or specific machine configurations used for running experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Theano (a deep learning library), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | All models have 128 hidden units and use the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 1e-3 (a minimal configuration sketch follows the table). |
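The Text8 split quoted in the Dataset Splits row is a straightforward character-level slicing of the 100M-character corpus. The sketch below illustrates it; the file name, loading code, and `load_text8_splits` helper are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of the quoted Text8 split: first 90M characters for
# training, next 5M for validation, final 5M for testing.
def load_text8_splits(path="text8"):
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()  # text8 is a single ~100M-character string
    train = data[:90_000_000]
    valid = data[90_000_000:95_000_000]
    test = data[95_000_000:]
    return train, valid, test
```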
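For the Experiment Setup row, the only hyperparameters quoted are 128 hidden units and Adam with a learning rate of 1e-3. The sketch below shows that configuration in PyTorch for illustration only; it is not the authors' SAB-augmented LSTM, and the input size and use of `nn.LSTM` are assumptions.

```python
# Minimal sketch of the reported hyperparameters (128 hidden units,
# Adam optimizer, learning rate 1e-3). PyTorch chosen here for brevity.
import torch
import torch.nn as nn

input_size = 1  # assumed; depends on the task (e.g., one pixel or character per step)
model = nn.LSTM(input_size=input_size, hidden_size=128, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```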