Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
Authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results in offline-RL datasets demonstrate the effectiveness of our approach for reward and state prediction over a long horizon. |
| Researcher Affiliation | Collaboration | Arnab Kumar Mondal (Mila, McGill University); Siba Smarak Panigrahi (Mila, McGill University); Sai Rajeswar (ServiceNow Research); Kaleem Siddiqi (Mila, McGill University); Siamak Ravanbakhsh (Mila, McGill University) |
| Pseudocode | Yes | Algorithm 1 Diagonal Koopman Dynamics model (a hedged sketch follows the table) |
| Open Source Code | Yes | Our code can be found at https://github.com/arnab39/koopman-dynamica. |
| Open Datasets | Yes | For the forward dynamics modeling experiments, we use the D4RL Fu et al. (2020) dataset, which is a popular offline-RL environment. |
| Dataset Splits | No | The paper states 'We divide the dataset of 1M samples into 80:20 splits for training and testing, respectively.', but does not mention a separate validation split or explain how validation is handled if one is used implicitly (e.g., for hyperparameter tuning). |
| Hardware Specification | Yes | Each iteration consists of one gradient update of the entire model using a mini-batch of 256 on an A100 GPU. |
| Software Dependencies | No | The paper provides code snippets in JAX and Flax (Appendix J) but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To train the dynamics model, we randomly sample trajectories of length τ from the training data, where τ is the horizon specified during training. We test our learned dynamics model for a horizon length of 100 by randomly sampling 50,000 trajectories of length 100 from the test set. (A sketch of this split-and-sampling protocol follows the table.) |
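Since the paper's Algorithm 1 is only named in the table, here is a minimal JAX/Flax sketch of what one step of a diagonal Koopman dynamics model looks like. The module name `DiagonalKoopmanDynamics`, the parameter names `mu`/`omega`/`B`, and the choice of `latent_dim` are illustrative assumptions, not the authors' implementation; the state encoder and decoder that wrap this latent step are omitted.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class DiagonalKoopmanDynamics(nn.Module):
    """One latent step z_{t+1} = Lambda * z_t + B a_t with a diagonal,
    complex-valued Koopman operator Lambda (sketch, not the authors' code)."""
    latent_dim: int  # size of the Koopman latent space (illustrative)

    @nn.compact
    def __call__(self, z, action):
        # Per-coordinate eigenvalues lambda_i = exp(mu_i + i * omega_i):
        # mu_i sets decay/growth, omega_i the rotation frequency.
        mu = self.param('mu', nn.initializers.normal(0.01), (self.latent_dim,))
        omega = self.param('omega', nn.initializers.normal(1.0), (self.latent_dim,))
        lam = jnp.exp(mu + 1j * omega)
        # The action enters the latent dynamics through a learned linear map B.
        Ba = nn.Dense(self.latent_dim, name='B')(action)
        return lam * z + Ba

# Usage: roll the latent state forward one step.
model = DiagonalKoopmanDynamics(latent_dim=32)
z0 = jnp.zeros((32,), dtype=jnp.complex64)
a0 = jnp.ones((6,))
params = model.init(jax.random.PRNGKey(0), z0, a0)
z1 = model.apply(params, z0, a0)
```

Because the operator is diagonal, an n-step rollout reduces to elementwise powers of `lam`, which is what makes long-horizon prediction cheap relative to a dense transition matrix.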
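The dataset-split and trajectory-sampling protocol quoted in the table can likewise be sketched. The function names, the flat `transitions` array, and the batch size of 256 (taken from the hardware row) are assumptions rather than the authors' pipeline; in particular, this sketch ignores episode boundaries, which a real D4RL data loader would need to respect.

```python
import jax
import jax.numpy as jnp

def split_dataset(transitions, train_frac=0.8):
    # 80:20 train/test split of the 1M-sample buffer; the paper reports
    # no separate validation split.
    n_train = int(transitions.shape[0] * train_frac)
    return transitions[:n_train], transitions[n_train:]

def sample_windows(key, data, tau, batch_size=256):
    # Draw random contiguous windows of length tau: tau = training horizon
    # during training, and tau = 100 (50,000 windows) at test time.
    starts = jax.random.randint(key, (batch_size,), 0, data.shape[0] - tau)
    take = lambda s: jax.lax.dynamic_slice_in_dim(data, s, tau)
    return jax.vmap(take)(starts)
```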