Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Authors: Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention. |
| Researcher Affiliation | Collaboration | Nan Rosemary Ke (1,2), Anirudh Goyal (1), Olexa Bilaniuk (1), Jonathan Binas (1), Michael C. Mozer (3), Chris Pal (1,2,4), Yoshua Bengio (1, CIFAR Senior Fellow). Affiliations: (1) Mila, Université de Montréal; (2) Mila, Polytechnique Montréal; (3) University of Colorado, Boulder; (4) Element AI. |
| Pseudocode | Yes | Algorithm 1 SAB-augmented LSTM |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing source code, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Copying and adding problems defined in Hochreiter & Schmidhuber (1997); Character-level Penn TreeBank (PTB) (Q1): we follow the setup in Cooijmans et al. (2016); Text8 (Q1): we follow the setup of Mikolov et al. (2012), using the first 90M characters for training, the next 5M for validation and the final 5M characters for testing; Permuted pixel-by-pixel MNIST (Q1): a sequential version of the MNIST classification dataset; CIFAR10 classification (Q1, Q3): we test our model's performance on pixel-by-pixel CIFAR10 (no permutation). |
| Dataset Splits | Yes | We use the first 90M characters for training, the next 5M for validation and the final 5M characters for testing (see the split sketch after this table). |
| Hardware Specification | No | The paper mentions "Compute Canada and NVIDIA for computing resources" in the acknowledgements, but does not specify particular hardware details like GPU/CPU models, memory, or specific machine configurations used for running experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Theano (a deep learning library), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | All models have 128 hidden units and use the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 1e-3 (a minimal configuration sketch follows the table). |
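The Text8 split quoted in the Dataset Splits row is a straightforward character-level slicing of the 100M-character corpus. The sketch below illustrates it; the file name, loading code, and `load_text8_splits` helper are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of the quoted Text8 split: first 90M characters for
# training, next 5M for validation, final 5M for testing.
def load_text8_splits(path="text8"):
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()  # text8 is a single ~100M-character string
    train = data[:90_000_000]
    valid = data[90_000_000:95_000_000]
    test = data[95_000_000:]
    return train, valid, test
```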
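For the Experiment Setup row, the only hyperparameters quoted are 128 hidden units and Adam with a learning rate of 1e-3. The sketch below shows that configuration in PyTorch for illustration only; it is not the authors' SAB-augmented LSTM, and the input size and use of `nn.LSTM` are assumptions.

```python
# Minimal sketch of the reported hyperparameters (128 hidden units,
# Adam optimizer, learning rate 1e-3). PyTorch chosen here for brevity.
import torch
import torch.nn as nn

input_size = 1  # assumed; depends on the task (e.g., one pixel or character per step)
model = nn.LSTM(input_size=input_size, hidden_size=128, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```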