Traveling Waves Encode The Recent Past and Enhance Sequence Learning

Authors: T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental In this section we aim to leverage the model introduced in Section 2 to test the computational hypothesis that traveling waves may serve as a mechanism to encode the recent past in a wavefield short-term memory. To do this, we first leverage a suite of frequently used synthetic memory tasks designed to precisely measure the ability of sequence models to store information and learn dependencies over variable length timescales. Following this, we use a suite of standard sequence modeling benchmarks to measure if the demonstrated short-term memory benefits of wRNNs persist in a more complex regime. For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables.
Researcher Affiliation Academia T. Anderson Keller, The Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, USA; Lyle Muller, Department of Mathematics, Western University, CA; Terrence Sejnowski, Computational Neurobiology Lab, Salk Institute for Biological Studies, USA; Max Welling, Amsterdam Machine Learning Lab, University of Amsterdam, NL
Pseudocode Yes Pseudocode. Below we include an example implementation of the wRNN cell in PyTorch (Paszke et al., 2019):
Open Source Code Yes All code for reproducing the results can be found at the following repository: https://github.com/akandykeller/Wave_RNNs.
Open Datasets Yes In this work we specifically experiment with three sequential image tasks: sequential MNIST (sMNIST), permuted sequential MNIST (psMNIST), and noisy sequential CIFAR10 (nsCIFAR10).
Dataset Splits Yes For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables.
Hardware Specification Yes The total training time for these sweeps was roughly 1,900 GPU hours, with models being trained on individual NVIDIA 1080Ti GPUs.
Software Dependencies No The paper mentions 'PyTorch (Paszke et al., 2019)' and 'Weights & Biases (Biewald, 2020)' but does not provide specific version numbers (e.g., PyTorch 1.9).
Experiment Setup Yes For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables. In Appendix B we include the full ranges of each grid search as well as exact hyperparameters for the best performing models in each category.
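To make the "wavefield short-term memory" idea quoted above concrete, the recurrence of a wave-RNN-style cell can be sketched as follows. This is a minimal NumPy illustration, not the authors' PyTorch code (see the linked repository for that): it assumes a hidden state laid out as a 1-D ring, updated by a circular 1-D convolution plus an input projection; all names (`wrnn_cell_step`, `u_kernel`, `V`, `b`) are illustrative.

```python
import numpy as np

def wrnn_cell_step(h, x, u_kernel, V, b):
    """One step of a wave-RNN-style cell (illustrative sketch).

    h        : (n_h,)  hidden wavefield, treated as a 1-D ring
    x        : (n_x,)  input at the current time step
    u_kernel : (k,)    1-D kernel applied circularly to h (k odd)
    V        : (n_h, n_x) input projection
    b        : (n_h,)  bias
    """
    pad = len(u_kernel) // 2
    # Circular padding: the wavefield wraps around the ring of hidden units.
    h_pad = np.concatenate([h[-pad:], h, h[:pad]])
    recur = np.convolve(h_pad, u_kernel, mode="valid")
    # ReLU nonlinearity on (recurrent drive + projected input + bias).
    return np.maximum(0.0, recur + V @ x + b)

# With a pure shift kernel and no input, activity propagates one cell
# per step -- a traveling wave that carries past activity forward in
# the hidden state, which is the encoding mechanism the paper tests.
h = np.zeros(8)
h[0] = 1.0
shift = np.array([0.0, 0.0, 1.0])         # shifts the field by one cell
V = np.zeros((8, 2))
b = np.zeros(8)
h = wrnn_cell_step(h, np.zeros(2), shift, V, b)
```

In a trained model the kernel and projection are learned parameters; the shift kernel above is just the limiting case that makes the wave propagation visible.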