MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations
Authors: Yao Dou, Maxwell Forbes, Ari Holtzman, Yejin Choi
AAAI 2021, pp. 12760-12767 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents experiments and results (dataset collection, model fine-tuning, and evaluation). |
| Researcher Affiliation | Collaboration | 1 University of Washington, 2 Allen Institute for AI |
| Pseudocode | No | The paper describes algorithms and models conceptually but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We collect and release [1] a large dataset of highly-branching written conversations. [1] https://uwnlp.github.io/multitalk/', but the link refers to the dataset release, not explicitly to open-source code for the methodology itself. |
| Open Datasets | Yes | We collect and release [1] a large dataset of highly-branching written conversations. The dataset contains 320,804 individual responses in a conversation tree. [...] [1] https://uwnlp.github.io/multitalk/ |
| Dataset Splits | No | The paper mentions a 'validation set' in the context of an 'oracle' baseline (Table 6) but does not provide the train/validation/test split details (percentages, sample counts, or split methodology) needed for reproduction. |
| Hardware Specification | No | The paper mentions 'available resources' for training but does not provide specific hardware details such as GPU/CPU models or memory amounts. |
| Software Dependencies | No | The paper mentions several software components like BERT-Large, GPT-2, SciPy, and GloVe, but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | At inference time, we sample from all models using top-p sampling with p = 0.9 (Holtzman et al. 2019). [...] To prevent biasing the language model to utterances higher in the dialog tree, we compute loss for the model only for tokens in the final utterance. [...] for the theory of mind task, and set γ = 0 to account only for the emotion of a response's immediate children. [...] We fine-tune GPT-2 M (345M params.). |
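
The quoted setup (GPT-2 M fine-tuning with the loss restricted to the final utterance, and top-p sampling at p = 0.9 at inference) can be sketched as follows. This is a hypothetical reconstruction, not the authors' released code: it assumes HuggingFace Transformers, the `gpt2-medium` checkpoint, and a newline as the utterance separator, none of which the paper specifies.

```python
# Hedged sketch of the described setup: loss masked to the final utterance,
# nucleus (top-p) sampling with p = 0.9 at inference. Assumes HuggingFace
# Transformers and gpt2-medium (345M params); the separator format is a guess.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

SEP = "\n"  # assumed utterance separator; the paper does not specify the exact format

def build_example(context_utterances, final_utterance):
    """Encode a dialog path; loss is computed only on the final utterance's tokens."""
    context_ids = tokenizer.encode(SEP.join(context_utterances) + SEP)
    final_ids = tokenizer.encode(final_utterance + tokenizer.eos_token)
    input_ids = torch.tensor(context_ids + final_ids)
    # -100 is the standard ignore index of the cross-entropy loss, so context
    # tokens higher in the dialog tree do not contribute to the objective.
    labels = torch.tensor([-100] * len(context_ids) + final_ids)
    return input_ids.unsqueeze(0), labels.unsqueeze(0)

input_ids, labels = build_example(
    ["I just moved to Seattle.", "Oh nice, how do you like it so far?"],
    "It rains a lot, but the coffee makes up for it.",
)
loss = model(input_ids=input_ids, labels=labels).loss  # backpropagated during fine-tuning

# Inference: top-p (nucleus) sampling with p = 0.9
prompt_ids = tokenizer.encode("I just moved to Seattle." + SEP, return_tensors="pt")
generated = model.generate(
    prompt_ids,
    do_sample=True,
    top_p=0.9,
    top_k=0,                 # disable top-k so only the nucleus filter applies
    max_new_tokens=40,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated[0][prompt_ids.shape[1]:], skip_special_tokens=True))
```

The example dialog strings and the `build_example` helper are illustrative only; they show how the quoted loss-masking and sampling choices translate into a training example and a decoding call, not how the MultiTalk data is actually formatted.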