Learning Object-Oriented Dynamics for Planning from Text
Authors: Guiliang Liu, Ashutosh Adhikari, Amir-massoud Farahmand, Pascal Poupart
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our OOTD-based planner significantly outperforms model-free baselines in terms of sample efficiency and running scores. |
| Researcher Affiliation | Collaboration | Guiliang Liu1,3, Ashutosh Adhikari1,4, Amir-massoud Farahmand2,3, Pascal Poupart1,3 — 1University of Waterloo, 2University of Toronto, 3Vector Institute, 4Microsoft |
| Pseudocode | Yes | Algorithm 1: Dyna-Q Training Input ... (Appendix B) Algorithm 2: MCTS Testing Input ... (Appendix B) |
| Open Source Code | Yes | Code and Dataset. Please find our implementation on Github1. 1https://github.com/Guiliang/OORL-public |
| Open Datasets | Yes | Environments. We divide the games in the TextWorld benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The FTWP dataset is a public dataset that supports pre-training the dynamics model. Trischler et al. (2019) provided the First Text World Problems (FTWP) dataset. |
| Dataset Splits | Yes | Environments. We divide the games in the TextWorld benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The model hyper-parameters are tuned with the games in the validation set. |
| Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 24 GB of memory for pre-training the object extractor and 64 GB for training the OOTD model. |
| Software Dependencies | No | The paper mentions software components like "transformers" but does not specify exact version numbers for any libraries, frameworks, or operating systems used in the experiments. |
| Experiment Setup | Yes | We set the number of candidate objects K to 99 and the number of candidate relations to 10... The learning rate of policy training and dynamics model training is set to 0.0001. The discount factor γ is set to 0.9. The size of the hidden layers and the size of z in our OOTD model are set to 32. The sizes of word embedding and node embedding are set to 300 and 100 respectively. ... The random seeds we select are 123, 321, and 666. |
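The Experiment Setup row above can be collected into a small configuration sketch. This is illustrative only: the dictionary keys and the `discounted_return` helper are hypothetical names, not identifiers from the authors' released code; only the numeric values come from the paper.

```python
import random

# Hypothetical configuration mirroring the hyper-parameters reported in the
# paper (names are illustrative, values are from the Experiment Setup row).
CONFIG = {
    "num_candidate_objects": 99,    # K
    "num_candidate_relations": 10,
    "learning_rate": 1e-4,          # policy and dynamics-model training
    "discount_factor": 0.9,         # gamma
    "hidden_size": 32,              # hidden layers and latent z
    "word_embedding_size": 300,
    "node_embedding_size": 100,
    "seeds": [123, 321, 666],       # random seeds used across runs
}

def discounted_return(rewards, gamma=CONFIG["discount_factor"]):
    """Discounted return sum_t gamma^t * r_t with the paper's gamma = 0.9."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def seed_run(run_index):
    """Seed Python's RNG with one of the three reported seeds."""
    seed = CONFIG["seeds"][run_index]
    random.seed(seed)
    return seed
```

For example, `discounted_return([1.0, 1.0])` evaluates to 1.0 + 0.9 * 1.0 = 1.9, showing how the reported discount factor weights a two-step reward sequence.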