Simplifying Latent Dynamics with Softly State-Invariant World Models
Authors: Tankred Saanum, Peter Dayan, Eric Schulz
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the PLSM's effect on planning algorithms' ability to learn policies in continuous control tasks. |
| Researcher Affiliation | Academia | Tankred Saanum^1, Peter Dayan^1,2, Eric Schulz^1,3; 1: Max Planck Institute for Biological Cybernetics; 2: University of Tübingen; 3: Helmholtz Institute for Human-Centered AI, Helmholtz Center Munich, Neuherberg, Germany; tankred.saanum@tuebingen.mpg.de |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We will release code with model implementation upon publication. |
| Open Datasets | Yes | We evaluated the efficacy of parsimonious dynamics for control in five state-based continuous control tasks from the DeepMind Control Suite (DMC) [11]. ... We trained the PLSM-augmented SPR algorithm on 100k environment steps across 5 seeds on all 26 Atari games. ... Additionally we created an environment based on the dSprites dataset [18]... Lastly, we evaluate PLSM on a dynamic object interaction dataset with realistic textures and physics without actions, MOVi-E [19] |
| Dataset Splits | No | No explicit statement detailing specific training/validation/test splits (e.g., percentages, sample counts, or explicit predefined splits) was found. |
| Hardware Specification | Yes | We ran all experiments reported in the paper on compute nodes with 2 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like PyTorch (implicitly via `torch` import), ReLU activation, and Adam optimizer, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 2: Contrastive model hyperparameters. Hidden units: 512; Batch size: 512; MLP hidden layers: 2; Latent dimensions \|z_t\|: 50; Query dimensions \|h_t\|: 50; Regularization coefficient β: 0.1; Margin λ: 1; Learning rate: 0.001; Activation function: ReLU [50]; Optimizer: Adam [51] |
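The hyperparameters in Table 2 map onto a small latent dynamics MLP. Below is a minimal PyTorch sketch that instantiates those reported values (2 hidden layers of 512 units, ReLU, 50-dimensional latents, Adam with learning rate 0.001); the class name, the action dimensionality, and the overall architecture are assumptions for illustration, since the paper's actual PLSM implementation is not released. The regularization coefficient β = 0.1 and margin λ = 1 belong to the contrastive training objective, which is not sketched here.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a latent dynamics MLP using the Table 2 hyperparameters.
# LatentDynamicsMLP and action_dim are illustrative assumptions, not the paper's code.
class LatentDynamicsMLP(nn.Module):
    def __init__(self, latent_dim=50, action_dim=6, hidden_units=512, hidden_layers=2):
        super().__init__()
        layers = []
        in_dim = latent_dim + action_dim
        for _ in range(hidden_layers):  # 2 hidden layers of 512 units, ReLU activations
            layers += [nn.Linear(in_dim, hidden_units), nn.ReLU()]
            in_dim = hidden_units
        layers.append(nn.Linear(in_dim, latent_dim))  # predict the next latent z_{t+1}
        self.net = nn.Sequential(*layers)

    def forward(self, z_t, a_t):
        # Condition the transition on the current latent state and action
        return self.net(torch.cat([z_t, a_t], dim=-1))

model = LatentDynamicsMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, learning rate 0.001
```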