HiPPO: Recurrent Memory with Optimal Polynomial Projections
Authors: Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Empirical Validation: The HiPPO dynamics are simple recurrences that can be easily incorporated into various models. We validate three claims that suggest that, when incorporated into a simple RNN, these methods (especially HiPPO-LegS) yield a recurrent architecture with improved memory capability. In Section 4.1, the HiPPO-LegS RNN outperforms other RNN approaches on benchmark long-term dependency tasks for RNNs. Section 4.2 shows that the HiPPO-LegS RNN is much more robust to timescale shifts than other RNN and neural ODE models. Section 4.3 validates the distinct theoretical advantages of the HiPPO-LegS memory mechanism, allowing fast and accurate online function reconstruction over millions of time steps. (A minimal sketch of this recurrence follows the table.) |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Department of Computer Science and Engineering, University at Buffalo, SUNY. {albertgu,trid}@stanford.edu, ermon@cs.stanford.edu, atri@buffalo.edu, chrismre@cs.stanford.edu |
| Pseudocode | No | The paper includes mathematical equations for continuous and discrete time dynamics and an illustration of the framework, but it does not contain explicit pseudocode or algorithm blocks labeled as such. |
| Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/HazyResearch/hippo-code. |
| Open Datasets | Yes | On the benchmark permuted MNIST dataset, our hyperparameter-free HiPPO-LegS method achieves a new state-of-the-art accuracy of 98.3%. |
| Dataset Splits | No | Table 1 reports "Val. acc. (%)" for the pMNIST task, indicating a validation set was used. However, the main text does not explicitly state the percentages or sample counts for the training, validation, and test splits needed to reproduce the data partitioning. |
| Hardware Specification | No | Section 4.3 mentions that the HiPPO-LegS operator can perform updates "on a single CPU core," but it does not specify the model or type of CPU, nor does it provide details about other hardware components, such as the GPUs used for training. |
| Software Dependencies | No | The paper states, "We implement the fast update in C++ with Pytorch binding." However, it does not provide specific version numbers for PyTorch, C++, or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | No | The paper describes its model architecture by stating that "All methods have the same hidden size in our experiments" and that "HiPPO variants tie the memory size N to the hidden state dimension d." It also mentions a "classification head trained with cross-entropy." However, specific numerical values for hyperparameters such as the hidden size, learning rate, batch size, or optimizer settings are not given in the main text; the reader is referred to Appendix F.1 for the full architecture. |
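
To illustrate the "simple recurrences" quoted in the Research Type row, below is a minimal NumPy sketch of the HiPPO-LegS update, assuming a plain forward-Euler discretization of the continuous-time dynamics. The function names are illustrative only; the paper's released code instead uses a generalized bilinear transform and a fast C++ kernel with PyTorch bindings.

```python
import numpy as np

def hippo_legs_matrices(N):
    """Build the HiPPO-LegS state matrices (A, B) as defined in the paper:
    A[n, k] = sqrt(2n+1)*sqrt(2k+1) if n > k, n+1 if n == k, 0 otherwise;
    B[n] = sqrt(2n+1)."""
    q = np.sqrt(2 * np.arange(N) + 1)
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = q[n] * q[k]
            elif n == k:
                A[n, k] = n + 1
    B = q.copy()
    return A, B

def legs_online_update(f, N=32):
    """Run the discretized LegS recurrence
        c_{k+1} = (I - A/(k+1)) c_k + (1/(k+1)) B f_k
    over a 1-D input sequence f (forward-Euler sketch, not the paper's exact
    discretization). Returns the final Legendre projection coefficients."""
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, f_k in enumerate(f, start=1):
        c = (I - A / k) @ c + (B / k) * f_k
    return c

# Example: project a random length-1000 signal onto N=32 Legendre coefficients.
coeffs = legs_online_update(np.random.randn(1000), N=32)
```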