Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Authors: Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander Miller, Arthur Szlam, Douwe Kiela, Jason Weston
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we set up variants of MTD along with a baseline of static data collection on Mechanical Turk. We show that for agents, either parameterized as standard Seq2Seq models with attention (Sutskever et al., 2014; Bahdanau et al., 2014), or as Action-Centric Seq2Seq (AC-Seq2Seq) models specially designed to take advantage of Graph World's structure, learning with MTD outperforms static training. |
| Researcher Affiliation | Industry | Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela & Jason Weston (Facebook AI Research) |
| Pseudocode | No | The paper describes the MTD algorithm steps in plain text and with a diagram (Figure 1), but does not include structured pseudocode or an algorithm block (a hedged reconstruction is sketched below the table). |
| Open Source Code | Yes | We employ the environment and MTD settings described in Section 4.1, code and data for which will be made available online.³ (...) ³https://github.com/facebookresearch/ParlAI/tree/master/projects/mastering_the_dungeon |
| Open Datasets | Yes | We employ the environment and MTD settings described in Section 4.1, code and data for which will be made available online.³ (...) ³https://github.com/facebookresearch/ParlAI/tree/master/projects/mastering_the_dungeon |
| Dataset Splits | No | The paper describes training on D_all^train and evaluating on D_all^test (which includes components such as D_cur^test and parts of other Turkers' data), and later on a combined held-out test set. It does not explicitly define a separate 'validation' or 'dev' split with specific percentages or counts for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models (e.g., NVIDIA A100, Tesla V100) or CPU models (e.g., Intel Xeon, AMD Ryzen) used for running the experiments. |
| Software Dependencies | No | The paper mentions general model architectures like Seq2Seq and GRU, and refers to the ParlAI framework, but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We employ 30 Turkers on each round, and consider two settings: (i) ask them to create 10 examples each round; or (ii) ask them to create at least 10 examples each round (but they can create more) with a maximum time of 40 minutes. The length of the action sequence is constrained to be at most 4. For all the results in this section, we train the agents for 10 runs and report the mean and standard deviation. (See the configuration sketch below the table.) |
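
Because the paper gives MTD only as prose and a diagram, the sketch below reconstructs one MTD round in Python from the descriptions quoted in this table (30 Turkers per round, per-Turker data creation, training on the shared pool D_all^train, and evaluation on a held-out D_all^test built partly from other Turkers' examples). Every name in it (`collect_examples`, `train_agent`, `evaluate`, `mtd_round`) and the toy memorizing "agent" are hypothetical stand-ins, not the authors' implementation; the real code is in the ParlAI project linked above.

```python
# Hedged reconstruction of one Mechanical Turker Descent (MTD) round,
# based only on the prose quoted in this table. All names and the toy
# memorizing "agent" are hypothetical stand-ins; the real implementation
# lives in the ParlAI project linked in the Open Source Code row.

def collect_examples(turker_id, n=10):
    """Stand-in for a Turker writing n (instruction, action-sequence) pairs."""
    return [(f"instruction-{turker_id}-{i}", f"actions-{turker_id}-{i}")
            for i in range(n)]

def train_agent(train_data):
    """Toy 'agent' that memorizes instruction -> action-sequence pairs.
    The paper trains Seq2Seq / AC-Seq2Seq models at this step instead."""
    return dict(train_data)

def evaluate(agent, test_data):
    """Fraction of test instructions mapped to the correct action sequence."""
    return sum(agent.get(x) == y for x, y in test_data) / max(len(test_data), 1)

def mtd_round(turker_ids, d_train_all, d_test_all):
    """One MTD round: each Turker adds data, an agent is trained on the
    shared pool plus that Turker's new data and scored on a held-out set
    containing other Turkers' examples; all new data is then pooled."""
    scores, contributions = {}, {}
    for tid in turker_ids:
        d_cur = collect_examples(tid)              # (i) data creation
        agent = train_agent(d_train_all + d_cur)   # (ii) per-Turker training
        scores[tid] = evaluate(agent, d_test_all)  # (iii) competitive eval
        contributions[tid] = d_cur
    # (iv) Bonuses for the best-scoring Turkers are omitted here; merge data.
    for d_cur in contributions.values():
        d_train_all = d_train_all + d_cur
    return scores, d_train_all

if __name__ == "__main__":
    turkers = [f"turker{i}" for i in range(30)]    # 30 Turkers per round
    held_out = collect_examples("held-out", n=50)  # stand-in for D_all^test
    scores, pool = mtd_round(turkers, [], held_out)
    print(f"round mean accuracy: {sum(scores.values()) / len(scores):.3f}")
```

The competitive structure is the point of the sketch: each Turker's score depends on how well an agent trained with their contribution generalizes to other Turkers' data, which is what incentivizes useful (rather than trivially easy or impossibly hard) examples.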
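The experiment-setup row fixes a handful of concrete knobs: 30 Turkers per round, 10 examples per round (or at least 10 within a 40-minute cap in setting (ii)), action sequences of length at most 4, and results reported as mean ± standard deviation over 10 runs. Below is a minimal sketch of how a reproduction might pin these down and aggregate per-run scores; the dataclass fields and the placeholder accuracies are our own, not values from the paper.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class MTDSetup:
    # Values from the paper's experiment-setup description; field names ours.
    n_turkers_per_round: int = 30
    min_examples_per_round: int = 10
    max_minutes_per_round: int = 40   # applies only in setting (ii)
    max_action_length: int = 4
    n_runs: int = 10

def report(per_run_scores):
    """Mean and standard deviation over runs, matching the paper's
    '10 runs, report mean and standard deviation' protocol."""
    return mean(per_run_scores), stdev(per_run_scores)

setup = MTDSetup()
# Placeholder per-run accuracies purely to make the sketch runnable;
# they are NOT results from the paper.
runs = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.74, 0.70, 0.71, 0.69]
assert len(runs) == setup.n_runs
m, s = report(runs)
print(f"accuracy over {setup.n_runs} runs: {m:.3f} ± {s:.3f}")
```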