Learning Object-Oriented Dynamics for Planning from Text

Authors: Guiliang Liu, Ashutosh Adhikari, Amir-massoud Farahmand, Pascal Poupart

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our OOTD-based planner significantly outperforms model-free baselines in terms of sample efficiency and running scores.
Researcher Affiliation | Collaboration | Guiliang Liu (1,3), Ashutosh Adhikari (1,4), Amir-massoud Farahmand (2,3), Pascal Poupart (1,3); 1: University of Waterloo, 2: University of Toronto, 3: Vector Institute, 4: Microsoft
Pseudocode | Yes | Algorithm 1: Dyna-Q Training Input ... (Appendix B); Algorithm 2: MCTS Testing Input ... (Appendix B)
Open Source Code | Yes | Code and Dataset. Please find our implementation on GitHub: https://github.com/Guiliang/OORL-public
Open Datasets | Yes | Environments. We divide the games in the Text-World benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The FTWP dataset is a public dataset that supports pre-training the dynamics model. Trischler et al. (2019) provided the First Text World Problems (FTWP) dataset.
Dataset Splits | Yes | Environments. We divide the games in the Text-World benchmark (Côté et al., 2018) into five subsets according to their difficulty levels. Each subset contains 100 training, 20 validation, and 20 testing games. ... The model hyper-parameters are tuned with the games in the validation set.
Hardware Specification | Yes | The cluster has multiple kinds of GPUs, including Tesla T4 with 16 GB memory, Tesla P100 with 12 GB memory, and RTX 6000 with 24 GB memory. We used machines with 24 GB of memory for pre-training the object extractor and 64 GB for training the OOTD model.
Software Dependencies | No | The paper mentions software components such as "transformers" but does not specify exact version numbers for any libraries, frameworks, or operating systems used in the experiments.
Experiment Setup | Yes | We set the number of candidate objects K to 99 and the number of candidate relations to 10. ... The learning rate of policy training and dynamics model training is set to 0.0001. The discount factor γ is set to 0.9. The size of the hidden layers and the size of z in our OOTD model are set to 32. The sizes of word embedding and node embedding are set to 300 and 100 respectively. ... The random seeds we select are 123, 321, and 666.
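
For reference, the hyper-parameters quoted in the Experiment Setup row and the 100/20/20 game split quoted above can be gathered into a single configuration. The sketch below is only an illustration of those reported values; the key names and the run loop are assumptions and are not taken from the OORL-public code.

```python
# Illustrative configuration only; key names are hypothetical, values are the
# ones quoted in the reproducibility table above.
config = {
    "num_candidate_objects": 99,        # K
    "num_candidate_relations": 10,
    "learning_rate": 1e-4,              # policy and dynamics-model training
    "discount_factor": 0.9,             # gamma
    "hidden_size": 32,                  # hidden layers and latent z in the OOTD model
    "word_embedding_size": 300,
    "node_embedding_size": 100,
    "games_per_difficulty_level": {"train": 100, "valid": 20, "test": 20},
    "random_seeds": [123, 321, 666],
}

if __name__ == "__main__":
    # One run per reported seed, e.g. to reproduce the three-seed averages.
    for seed in config["random_seeds"]:
        print(f"run with seed {seed}: lr={config['learning_rate']}, "
              f"gamma={config['discount_factor']}")
```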