Online learning of long-range dependencies

Authors: Nicolas Zucchet, Robert Meier, Simon Schug, Asier Mujika, João Sacramento

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on synthetic memory problems and on the challenging long-range arena benchmark suite reveal that our algorithm performs competitively, establishing a new standard for what can be achieved through online learning.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zürich {nzucchet,romeier,sschug,asierm,rjoao}@ethz.ch
Pseudocode | No | The paper provides detailed mathematical derivations of the algorithm's updates in Appendix A.2, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block (an illustrative sketch of such an online update follows the table).
Open Source Code | No | The paper states 'We base our implementation on the S5 [22] code base4.' with footnote 4 linking to 'https://github.com/lindermanlab/S5'. This is a third-party codebase used as a starting point, not a release of the authors' own code for the method presented in this paper.
Open Datasets | Yes | Finally, we evaluate our method on three tasks of the Long Range Arena [24]: a sequential version of CIFAR [41], LISTOPS [42] and IMDB [43]
Dataset Splits | Yes | For additional experimental details and hyperparameter configurations, we refer to Appendix B. (Tables 4, 5 and 6 in Appendix B list hyperparameters such as 'Training samples', 'Batch-size', 'Epochs' and 'Warmup', which implicitly define how the data are split and used for training and validation.)
Hardware Specification | Yes | The training time for our online learning rule on a single Nvidia RTX3090 GPU for SCIFAR, IMDB and LISTOPS was respectively 36, 10 and 40 hours.
Software Dependencies | No | The paper mentions using the 'AdamW optimizer' and basing its implementation on the 'S5 [22] code base', but it does not specify version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | For additional experimental details and hyperparameter configurations, we refer to Appendix B. (Tables 4, 5 and 6 provide detailed hyperparameters such as 'Batch-size', 'Base learning rate', 'Epochs', 'Dropout probability', etc.)
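
On the Pseudocode point: the paper derives its online update rule analytically rather than stating it as an algorithm block. For readers who want a concrete picture, the sketch below shows exact online (RTRL-style) gradient computation for a single diagonal linear recurrence with a scalar readout. The real-valued single-layer setting, the squared-error loss, and all variable names (lam, B, e_lam, e_B, w) are assumptions chosen for illustration; this is not the paper's exact learning rule.

```python
# Hypothetical sketch: exact online gradients for a diagonal linear recurrence
#   h_t = lam * h_{t-1} + B @ x_t
# using eligibility traces. Illustrative only; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 8, 4, 100                       # state size, input size, sequence length

lam = rng.uniform(0.9, 0.99, size=n)      # diagonal recurrent weights
B = rng.normal(size=(n, d)) / np.sqrt(d)  # input projection
w = rng.normal(size=n)                    # readout for a scalar prediction

h = np.zeros(n)
e_lam = np.zeros(n)                       # trace of d h_t / d lam
e_B = np.zeros((n, d))                    # trace of d h_t / d B
lr = 1e-3

for t in range(T):
    x = rng.normal(size=d)                # placeholder input
    target = rng.normal()                 # placeholder target

    # Propagate eligibility traces using the previous state, then update the state.
    e_lam = lam * e_lam + h                # d h_t / d lam (element-wise)
    e_B = lam[:, None] * e_B + x[None, :]  # d h_t[i] / d B[i, j]
    h = lam * h + B @ x

    # Instantaneous squared-error loss and its gradient w.r.t. the hidden state.
    y = w @ h
    dL_dh = (y - target) * w

    # Online parameter updates from locally available quantities only.
    lam -= lr * dL_dh * e_lam
    B -= lr * dL_dh[:, None] * e_B
    w -= lr * (y - target) * h
```

The point the sketch illustrates is that, for a diagonal recurrence, the sensitivities of the state with respect to lam and B factorize element-wise, so the traces can be carried forward online at roughly the cost of the forward pass, without storing past activations.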