Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Authors: Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander Miller, Arthur Szlam, Douwe Kiela, Jason Weston
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we set up variants of MTD along with a baseline of static data collection on Mechanical Turk. We show that for agents, either parameterized as standard Seq2Seq models with attention (Sutskever et al., 2014; Bahdanau et al., 2014), or as Action-Centric Seq2Seq (AC-Seq2Seq) models specially designed to take advantage of Graph World's structure, learning with MTD outperforms static training. |
| Researcher Affiliation | Industry | Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela & Jason Weston (Facebook AI Research) |
| Pseudocode | No | The paper describes the MTD algorithm steps in plain text and with a diagram (Figure 1), but does not include structured pseudocode or an algorithm block (a hedged reconstruction is sketched below the table). |
| Open Source Code | Yes | We employ the environment and MTD settings described in Section 4.1, code and data for which will be made available online.³ (...) ³https://github.com/facebookresearch/ParlAI/tree/master/projects/mastering_the_dungeon |
| Open Datasets | Yes | We employ the environment and MTD settings described in Section 4.1, code and data for which will be made available online.³ (...) ³https://github.com/facebookresearch/ParlAI/tree/master/projects/mastering_the_dungeon |
| Dataset Splits | No | The paper describes training on D_all^train and evaluating on D_all^test (which includes components such as D_cur^test and parts of other Turkers' data), and later on a combined held-out test set. It does not explicitly define a separate 'validation' or 'dev' split with specific percentages or counts for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models (e.g., NVIDIA A100, Tesla V100) or CPU models (e.g., Intel Xeon, AMD Ryzen) used for running the experiments. |
| Software Dependencies | No | The paper mentions general model architectures like Seq2Seq and GRU, and refers to the ParlAI framework, but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We employ 30 Turkers on each round, and consider two settings: (i) ask them to create 10 examples each round; or (ii) ask them to create at least 10 examples each round (but they can create more) with a maximum time of 40 minutes. The length of the action sequence is constrained to be at most 4. For all the results in this section, we train the agents for 10 runs and report the mean and standard deviation. (See the configuration sketch below the table.) |
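
Because the paper gives MTD only as prose and a diagram, the sketch below reconstructs one MTD round in Python from the descriptions quoted in this table (30 Turkers per round, per-Turker data creation, training on the shared pool D_all^train, and evaluation on a held-out D_all^test built partly from other Turkers' examples). Every name in it (`collect_examples`, `train_agent`, `evaluate`, `mtd_round`) and the toy memorizing "agent" are hypothetical stand-ins, not the authors' implementation; the real code is in the ParlAI project linked above.

```python
# Hedged reconstruction of one Mechanical Turker Descent (MTD) round,
# based only on the prose quoted in this table. All names and the toy
# memorizing "agent" are hypothetical stand-ins; the real implementation
# lives in the ParlAI project linked in the Open Source Code row.

def collect_examples(turker_id, n=10):
    """Stand-in for a Turker writing n (instruction, action-sequence) pairs."""
    return [(f"instruction-{turker_id}-{i}", f"actions-{turker_id}-{i}")
            for i in range(n)]

def train_agent(train_data):
    """Toy 'agent' that memorizes instruction -> action-sequence pairs.
    The paper trains Seq2Seq / AC-Seq2Seq models at this step instead."""
    return dict(train_data)

def evaluate(agent, test_data):
    """Fraction of test instructions mapped to the correct action sequence."""
    return sum(agent.get(x) == y for x, y in test_data) / max(len(test_data), 1)

def mtd_round(turker_ids, d_train_all, d_test_all):
    """One MTD round: each Turker adds data, an agent is trained on the
    shared pool plus that Turker's new data and scored on a held-out set
    containing other Turkers' examples; all new data is then pooled."""
    scores, contributions = {}, {}
    for tid in turker_ids:
        d_cur = collect_examples(tid)              # (i) data creation
        agent = train_agent(d_train_all + d_cur)   # (ii) per-Turker training
        scores[tid] = evaluate(agent, d_test_all)  # (iii) competitive eval
        contributions[tid] = d_cur
    # (iv) Bonuses for the best-scoring Turkers are omitted here; merge data.
    for d_cur in contributions.values():
        d_train_all = d_train_all + d_cur
    return scores, d_train_all

if __name__ == "__main__":
    turkers = [f"turker{i}" for i in range(30)]    # 30 Turkers per round
    held_out = collect_examples("held-out", n=50)  # stand-in for D_all^test
    scores, pool = mtd_round(turkers, [], held_out)
    print(f"round mean accuracy: {sum(scores.values()) / len(scores):.3f}")
```

The competitive structure is the point of the sketch: each Turker's score depends on how well an agent trained with their contribution generalizes to other Turkers' data, which is what incentivizes useful (rather than trivially easy or impossibly hard) examples.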
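The experiment-setup row fixes a handful of concrete knobs: 30 Turkers per round, 10 examples per round (or at least 10 within a 40-minute cap in setting (ii)), action sequences of length at most 4, and results reported as mean ± standard deviation over 10 runs. Below is a minimal sketch of how a reproduction might pin these down and aggregate per-run scores; the dataclass fields and the placeholder accuracies are our own, not values from the paper.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class MTDSetup:
    # Values from the paper's experiment-setup description; field names ours.
    n_turkers_per_round: int = 30
    min_examples_per_round: int = 10
    max_minutes_per_round: int = 40   # applies only in setting (ii)
    max_action_length: int = 4
    n_runs: int = 10

def report(per_run_scores):
    """Mean and standard deviation over runs, matching the paper's
    '10 runs, report mean and standard deviation' protocol."""
    return mean(per_run_scores), stdev(per_run_scores)

setup = MTDSetup()
# Placeholder per-run accuracies purely to make the sketch runnable;
# they are NOT results from the paper.
runs = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.74, 0.70, 0.71, 0.69]
assert len(runs) == setup.n_runs
m, s = report(runs)
print(f"accuracy over {setup.n_runs} runs: {m:.3f} ± {s:.3f}")
```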