Learning Object-Oriented Dynamics for Planning from Text

Authors: Guiliang Liu, Ashutosh Adhikari, Amir-massoud Farahmand, Pascal Poupart

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our OOTD-based planner significantly outperforms model-free baselines in terms of sample efficiency and running scores.
Researcher Affiliation | Collaboration | Guiliang Liu (1,3), Ashutosh Adhikari (1,4), Amir-massoud Farahmand (2,3), Pascal Poupart (1,3); 1: University of Waterloo, 2: University of Toronto, 3: Vector Institute, 4: Microsoft
Pseudocode | Yes | Algorithm 1: Dyna-Q Training Input ... (Appendix B); Algorithm 2: MCTS Testing Input ... (Appendix B)
Open Source Code | Yes | Code and Dataset. Please find our implementation on GitHub: https://github.com/Guiliang/OORL-public
Open Datasets | Yes | Environments. We divide the games in the Text-World benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The FTWP dataset is a public dataset that supports pre-training the dynamics model. Trischler et al. (2019) provided the First Text World Problems (FTWP) dataset.
Dataset Splits | Yes | Environments. We divide the games in the Text-World benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The model hyper-parameters are tuned with the games in the validation set.
Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 24 GB of memory for pre-training the object extractor and 64 GB for training the OOTD model.
Software Dependencies | No | The paper mentions software components such as "transformers" but does not specify exact version numbers for any libraries, frameworks, or operating systems used in the experiments.
Experiment Setup | Yes | We set the number of candidate objects K to 99 and the number of candidate relations to 10. ... The learning rate of policy training and dynamics model training is set to 0.0001. The discount factor γ is set to 0.9. The size of the hidden layers and the size of z in our OOTD model are set to 32. The sizes of word embedding and node embedding are set to 300 and 100 respectively. ... The random seeds we select are 123, 321, and 666.
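
For reference, the hyper-parameters quoted in the Experiment Setup row and the 100/20/20 game split quoted above can be gathered into a single configuration. The sketch below is only an illustration of those reported values; the key names and the run loop are assumptions and are not taken from the OORL-public code.

```python
# Illustrative configuration only; key names are hypothetical, values are the
# ones quoted in the reproducibility table above.
config = {
    "num_candidate_objects": 99,        # K
    "num_candidate_relations": 10,
    "learning_rate": 1e-4,              # policy and dynamics-model training
    "discount_factor": 0.9,             # gamma
    "hidden_size": 32,                  # hidden layers and latent z in the OOTD model
    "word_embedding_size": 300,
    "node_embedding_size": 100,
    "games_per_difficulty_level": {"train": 100, "valid": 20, "test": 20},
    "random_seeds": [123, 321, 666],
}

if __name__ == "__main__":
    # One run per reported seed, e.g. to reproduce the three-seed averages.
    for seed in config["random_seeds"]:
        print(f"run with seed {seed}: lr={config['learning_rate']}, "
              f"gamma={config['discount_factor']}")
```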