Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Authors: Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. (See the architecture sketch below the table.) |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, Pittsburgh, USA; ²FAIR, Meta; ³New York University, NYC, USA. |
| Pseudocode | No | The paper describes models and architectures in text and diagrams (e.g., Figure 4) but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code, datasets and videos can be found on https://hiss-csp.github.io |
| Open Datasets | Yes | We release CSP-Bench, the largest publicly accessible benchmark for continuous sequence-to-sequence prediction for multiple sensor datasets. ... Code, datasets and videos can be found on https://hiss-csp.github.io |
| Dataset Splits | Yes | For all tactile datasets and VECtor, we use an 80-20 train-validation split. For the RoNIN dataset, we use the first four minutes of every trajectory for our analysis, and use a validation set consisting of trajectories from unseen subjects. For TotalCapture, we use the train-validation split proposed by Trumble et al. (2017). (See the split sketch below the table.) |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used for the experiments. |
| Experiment Setup | Yes | All our models are trained end-to-end to minimize MSE loss as explained in Section 3.1. ... All models are trained for 600 epochs at a constant learning rate of 1e-3. ... Hyperparameter sweep ranges for each of our models and baselines, along with the resulting range of parameter counts are listed in Appendix B. ... Table 5. Hyperparameters for flat architectures ... Table 6. Hyperparameters for low-level models used in hierarchical architectures (see the training-loop sketch below the table) |
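
The paper's core idea is a two-level hierarchy: a low-level sequence model summarizes fixed-size chunks of the raw sensor stream, and a high-level model runs over the chunk summaries at a lower rate. Below is a minimal PyTorch sketch of that structure, not the authors' implementation: `nn.GRU` stands in for the S4/Mamba state space blocks, and the chunk length and layer widths are illustrative.

```python
# Minimal sketch of a two-level hierarchical sequence model (HiSS-style).
# nn.GRU is a stand-in for the paper's SSM blocks; all sizes are illustrative.
import torch
import torch.nn as nn

class HierarchicalSeqModel(nn.Module):
    def __init__(self, in_dim, out_dim, chunk_len=10, low_dim=32, high_dim=64):
        super().__init__()
        self.chunk_len = chunk_len
        self.low = nn.GRU(in_dim, low_dim, batch_first=True)     # low-level model, runs within chunks
        self.high = nn.GRU(low_dim, high_dim, batch_first=True)  # high-level model, runs over chunk summaries
        self.head = nn.Linear(high_dim, out_dim)                 # per-chunk prediction head

    def forward(self, x):                       # x: (batch, seq_len, in_dim)
        b, t, d = x.shape
        assert t % self.chunk_len == 0, "sequence length must be a multiple of chunk_len"
        chunks = x.reshape(b * t // self.chunk_len, self.chunk_len, d)
        feats, _ = self.low(chunks)             # process each chunk independently
        feats = feats[:, -1]                    # last hidden state as the chunk summary
        feats = feats.reshape(b, t // self.chunk_len, -1)
        out, _ = self.high(feats)               # lower-rate sequence over chunk summaries
        return self.head(out)                   # (batch, seq_len / chunk_len, out_dim)

model = HierarchicalSeqModel(in_dim=6, out_dim=3)
y = model(torch.randn(4, 100, 6))
print(y.shape)  # torch.Size([4, 10, 3])
```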
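
The 80-20 train-validation split reported for the tactile datasets and VECtor could be reproduced along these lines. The dataset tensors and seed here are stand-ins: the paper does not state a split seed, and RoNIN/TotalCapture use subject-level or previously published splits instead of a random one.

```python
# Sketch of an 80-20 random train-validation split; data and seed are stand-ins.
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 100, 6),   # sensor sequences (placeholder)
                        torch.randn(1000, 10, 3))    # per-chunk targets (placeholder)
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(
    dataset, [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(0),      # fixed seed for reproducibility
)
print(len(train_set), len(val_set))  # 800 200
```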
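
Finally, a minimal training loop matching the stated setup: end-to-end MSE loss, 600 epochs, constant learning rate of 1e-3, reusing `model` and `train_set` from the sketches above. The optimizer (Adam) and batch size are assumptions; the quoted excerpt does not name them.

```python
# Sketch of the reported training setup: MSE loss, 600 epochs, constant lr=1e-3.
# Optimizer choice and batch size are assumptions, not stated in the excerpt.
import torch
from torch.utils.data import DataLoader

loader = DataLoader(train_set, batch_size=32, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # constant LR, no schedule
loss_fn = torch.nn.MSELoss()

for epoch in range(600):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)  # sequence-to-sequence MSE, end to end
        loss.backward()
        opt.step()
```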