Traveling Waves Encode The Recent Past and Enhance Sequence Learning
Authors: T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we aim to leverage the model introduced in Section 2 to test the computational hypothesis that traveling waves may serve as a mechanism to encode the recent past in a wavefield short-term memory. To do this, we first leverage a suite of frequently used synthetic memory tasks designed to precisely measure the ability of sequence models to store information and learn dependencies over variable length timescales. Following this, we use a suite of standard sequence modeling benchmarks to measure if the demonstrated short-term memory benefits of wRNNs persist in a more complex regime. For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables. |
| Researcher Affiliation | Academia | T. Anderson Keller, The Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, USA; Lyle Muller, Department of Mathematics, Western University, CA; Terrence Sejnowski, Computational Neurobiology Lab, Salk Institute for Biological Studies, USA; Max Welling, Amsterdam Machine Learning Lab, University of Amsterdam, NL |
| Pseudocode | Yes | Pseudocode. Below we include an example implementation of the wRNN cell in PyTorch (Paszke et al., 2019). (A hedged sketch of such a cell is included after this table.) |
| Open Source Code | Yes | All code for reproducing the results can be found at the following repository: https://github.com/akandykeller/Wave_RNNs. |
| Open Datasets | Yes | In this work we specifically experiment with three sequential image tasks: sequential MNIST (sMNIST), permuted sequential MNIST (psMNIST), and noisy sequential CIFAR10 (nsCIFAR10). (An illustrative construction of the sMNIST and psMNIST sequences appears after this table.) |
| Dataset Splits | Yes | For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables. |
| Hardware Specification | Yes | The total training time for these sweeps was roughly 1,900 GPU hours, with models being trained on individual NVIDIA 1080Ti GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' and 'Weights & Biases (Biewald, 2020)' but does not provide specific version numbers (e.g., PyTorch 1.9). |
| Experiment Setup | Yes | For each task we perform a grid search over learning rates, learning rate schedules, and gradient clip magnitudes, presenting the best performing models from each category on a held-out validation set in the figures and tables. In Appendix B we include the full ranges of each grid search as well as exact hyperparameters for the best performing models in each category. |
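
The "Pseudocode" row notes that the paper includes a PyTorch implementation of the wRNN cell; the authoritative version is in the linked repository and the paper's appendix. As a rough illustration only, here is a minimal sketch of a wRNN-style cell, assuming a circular 1-D convolutional recurrence initialized as a shift operator, a ReLU nonlinearity, and a dense input projection. The class and parameter names, kernel size, and channel count are assumptions made for this sketch, not the authors' exact design.

```python
import torch
import torch.nn as nn


class WaveRNNCell(nn.Module):
    """Illustrative wRNN-style cell (not the authors' code): the hidden state
    is a 1-D wave field, and a circular convolution propagates activity across
    hidden units each step, so recent inputs persist as traveling waves."""

    def __init__(self, input_size, hidden_size, channels=1, kernel_size=3):
        super().__init__()
        self.hidden_size = hidden_size      # spatial extent of the wave field
        self.channels = channels            # number of wave channels
        # Recurrent connectivity: circular 1-D convolution over hidden units.
        self.u = nn.Conv1d(channels, channels, kernel_size,
                           padding=kernel_size // 2,
                           padding_mode='circular', bias=False)
        # Input projection onto the wave field, plus a bias.
        self.v = nn.Linear(input_size, channels * hidden_size)
        self.b = nn.Parameter(torch.zeros(channels, hidden_size))
        # Assumption: initialize the recurrent kernel as a per-channel shift,
        # so that before training activity travels one unit per time step.
        with torch.no_grad():
            self.u.weight.zero_()
            for c in range(channels):
                self.u.weight[c, c, 0] = 1.0

    def forward(self, x, h):
        # x: (batch, input_size); h: (batch, channels, hidden_size)
        propagated = self.u(h)                                  # wave step
        driven = self.v(x).view(-1, self.channels, self.hidden_size)
        return torch.relu(propagated + driven + self.b)


# Example usage on a pixel-by-pixel sequence (e.g. an sMNIST-style input):
cell = WaveRNNCell(input_size=1, hidden_size=256)
h = torch.zeros(32, 1, 256)             # batch of 32, zero-initialized field
for x_t in torch.randn(784, 32, 1):     # 784 one-pixel time steps
    h = cell(x_t, h)
```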
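
Similarly, for the "Open Datasets" row, the sequential image tasks are standard benchmarks built by reading each image out pixel by pixel, with psMNIST applying one fixed random permutation to the pixel order and nsCIFAR10 appending noise to the sequence. Below is a minimal sketch of the sMNIST/psMNIST construction using `torchvision`, assuming the usual 784-step formulation; the exact preprocessing, splits, and nsCIFAR10 noise scheme are specified in the paper and are not reproduced here.

```python
import torch
from torchvision import datasets, transforms

# One fixed permutation, shared across every example, defines psMNIST.
PIXEL_PERM = torch.randperm(28 * 28)


def to_pixel_sequence(img, permuted=False):
    """Flatten a (1, 28, 28) MNIST image into a (784, 1) pixel sequence."""
    seq = img.view(-1, 1)            # 784 time steps, 1 feature per step
    if permuted:
        seq = seq[PIXEL_PERM]        # fixed random pixel order -> psMNIST
    return seq


train_set = datasets.MNIST(root='./data', train=True, download=True,
                           transform=transforms.ToTensor())
image, label = train_set[0]
print(to_pixel_sequence(image).shape)                  # torch.Size([784, 1])
print(to_pixel_sequence(image, permuted=True).shape)   # torch.Size([784, 1])
```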