Facing Off World Model Backbones: RNNs, Transformers, and S4
Authors: Fei Deng, Junyeong Park, Sungjin Ahn
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we extensively compare RNN-, Transformer-, and S4-based world models across four sets of environments, which we have tailored to assess crucial memory capabilities of world models, including long-term imagination, context-dependent recall, reward prediction, and memory-based reasoning. |
| Researcher Affiliation | Academia | Fei Deng, Rutgers University, fei.deng@rutgers.edu; Junyeong Park, KAIST, jyp10987@kaist.ac.kr; Sungjin Ahn, KAIST, sungjin.ahn@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 S4WM Training; Algorithm 2 S4WM Imagination (see the structural sketch after this table) |
| Open Source Code | No | https://fdeng18.github.io/s4wm is provided as a project page, but the paper contains neither an explicit statement such as 'We release our code' nor a direct link to a code repository for the described methodology. |
| Open Datasets | No | For each 3D environment (i.e., Two Rooms, Four Rooms, and Ten Rooms), we generate 30K trajectories using a scripted policy... For each 2D environment (i.e., Distracting Memory and Multi Doors Keys), we generate 10K trajectories using a scripted policy... The paper states it generated its own datasets from environments but does not provide access to these collected datasets. |
| Dataset Splits | Yes | For each 3D environment (i.e., Two Rooms, Four Rooms, and Ten Rooms), we generate 30K trajectories using a scripted policy, of which 28K are used for training, 1K for validation, and 1K for testing. For each 2D environment (i.e., Distracting Memory and Multi Doors Keys), we generate 10K trajectories using a scripted policy, of which 8K are used for training, 1K for validation, and 1K for testing. (See the split sketch after this table.) |
| Hardware Specification | Yes | All results in Figure 3 are obtained on a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and SiLU nonlinearity, and states that the implementation is based on the S4 [21] and DreamerV3 [31] code, but it does not specify version numbers for any software components or libraries. |
| Experiment Setup | Yes | Hyperparameters and further implementation details can be found in Appendix J. Tables 10 and 11 provide the hyperparameters used for the 3D and 2D environments, respectively, listing specific values for the optimizer, batch size, learning rate, weight decay, gradient clipping, and various model architecture parameters (see the configuration sketch after this table). |
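The paper's Algorithms 1 and 2 are not reproduced on this page. For orientation, below is a minimal sketch of what the training and imagination steps of a sequence world model of this kind typically look like. It is not S4WM itself: a GRU stands in for the S4 backbone, the loss is plain next-step reconstruction rather than the paper's objective, and all module names and sizes (`TinyWorldModel`, `obs_dim`, `hid_dim`, ...) are assumptions for illustration.

```python
# Minimal sketch of a sequence world model's training and imagination steps.
# NOT the paper's S4WM: a GRU stands in for the S4 backbone, and the
# reconstruction-only loss omits the variational terms a model in the
# Dreamer family would use. All sizes are illustrative.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, obs_dim=64, act_dim=4, hid_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hid_dim)            # o_t -> embedding
        self.backbone = nn.GRU(hid_dim + act_dim, hid_dim,    # stand-in for S4
                               batch_first=True)
        self.decoder = nn.Linear(hid_dim, obs_dim)            # state -> o_{t+1}

    def forward(self, obs, act):
        # Teacher-forced training pass: predict o_{t+1} from (o_<=t, a_<=t).
        emb = self.encoder(obs)
        states, _ = self.backbone(torch.cat([emb, act], dim=-1))
        return self.decoder(states)

    @torch.no_grad()
    def imagine(self, obs_ctx, act_ctx, act_future):
        # Consume a context, then roll forward on actions alone,
        # feeding each predicted observation back into the model.
        emb = self.encoder(obs_ctx)
        _, h = self.backbone(torch.cat([emb, act_ctx], dim=-1))
        obs = self.decoder(h[-1])
        preds = []
        for t in range(act_future.size(1)):
            step = torch.cat([self.encoder(obs), act_future[:, t]], dim=-1)
            out, h = self.backbone(step.unsqueeze(1), h)
            obs = self.decoder(out[:, -1])
            preds.append(obs)
        return torch.stack(preds, dim=1)

model = TinyWorldModel()
obs = torch.randn(8, 16, 64)   # batch of 16-step trajectories
act = torch.randn(8, 16, 4)
pred = model(obs[:, :-1], act[:, :-1])
loss = nn.functional.mse_loss(pred, obs[:, 1:])   # next-step reconstruction
loss.backward()
dream = model.imagine(obs[:, :8], act[:, :8], act[:, 8:])  # (8, 8, 64)
```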
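The quoted split is a simple partition by trajectory count. The sketch below applies the 3D-environment numbers (28K train / 1K validation / 1K test out of 30K); the `trajectories` placeholder and the shuffling step are assumptions, not details from the paper.

```python
import random

def split_trajectories(trajectories, n_train=28_000, n_val=1_000, n_test=1_000):
    # Partition generated trajectories into train/val/test by count,
    # matching the 3D-environment split reported in the paper.
    assert len(trajectories) == n_train + n_val + n_test
    random.Random(0).shuffle(trajectories)  # fixed seed; shuffling is assumed
    train = trajectories[:n_train]
    val = trajectories[n_train:n_train + n_val]
    test = trajectories[n_train + n_val:]
    return train, val, test

train, val, test = split_trajectories(list(range(30_000)))
print(len(train), len(val), len(test))  # 28000 1000 1000
```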
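The paper confirms AdamW, weight decay, and gradient clipping among its hyperparameters, but their values live in Tables 10 and 11, which are not quoted here. The snippet below only illustrates how those pieces fit together in PyTorch; every numeric value is a placeholder, not a setting from the paper.

```python
import torch

model = torch.nn.Linear(128, 128)  # stands in for the world model
# AdamW is named in the paper; the learning rate and weight decay below
# are placeholders, not values from Tables 10 and 11.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

loss = model(torch.randn(32, 128)).pow(2).mean()
loss.backward()
# Gradient clipping is listed among the hyperparameters; max_norm is a
# placeholder value.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100.0)
optimizer.step()
optimizer.zero_grad()
```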