Computational Language Acquisition with Theory of Mind
Authors: Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build language-learning agents equipped with ToM, and measure its effects on the learning process. We experiment with varying task difficulty, hypothesizing that models will acquire more complex language to adapt to stronger environmental pressures. We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting. We also find some evidence that increasing task difficulty in the training process results in more fluent and precise utterances in evaluation. ... In experiments, we find that (RQ1) speaker models including ToM components generally outperform those that do not in terms of fluency and final accuracy. We also find that (RQ2) training with more visually and semantically similar distractor referents causes the speaker model to develop longer, more fluent, and more precise utterances in evaluation. |
| Researcher Affiliation | Academia | Andy Liu, Harvey Mudd College, Claremont, CA, USA ({ajliu}@g.hmc.edu); Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA ({zhuhao, mengyan3, ybisk, gneubig}@cs.cmu.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides mathematical equations for model components and training objectives. |
| Open Source Code | Yes | Code and data can be found at https://github.com/neulab/ToM-Language-Acquisition. |
| Open Datasets | Yes | Images and captions are drawn from the MS COCO dataset introduced in Lin et al. (2014). |
| Dataset Splits | Yes | The train-val-test split given by the MS COCO 2017 dataset is extended to our experimental setup. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. It only refers to general model architectures like ResNet and LSTM. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., ResNet, LSTM, PPO, GPT-2 large, spaCy, CLIP, RoBERTa, sentence-transformers) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The speaker generates a sequence of tokens to either the maximum length, which is set to 20 in our experiments, or an end-of-sequence token. In our experiments, we use a vocabulary size of 200... We introduce two thresholds, θ1 and θ2. ... We train models with three different settings of w_l. We train models with w_l = 0... w_l = 1... Finally, we train models where w_l is the arbitrarily high constant 1000. ... σ is set to decay linearly over time... |
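
To make the ToM-weighted training signal quoted in the Research Type row concrete, the following is a minimal sketch of how a speaker reward might combine task success with an internal ToM-listener score, weighted by the listener weight w_l ∈ {0, 1, 1000} reported in the paper. The function name, tensor shapes, and the additive combination are assumptions for illustration, not the authors' implementation.

```python
import torch


def speaker_reward(task_reward: torch.Tensor,
                   tom_listener_logprob: torch.Tensor,
                   w_l: float) -> torch.Tensor:
    """Combine the referential-game reward with an internal ToM-listener term.

    task_reward:          reward from the real listener identifying the target image
    tom_listener_logprob: the speaker's internal ToM listener's log-probability of
                          the target given the generated utterance
    w_l:                  listener weight; the paper reports runs with 0, 1, and an
                          arbitrarily high constant (1000)
    """
    # Hypothetical combination: a weighted sum of the two signals.
    return task_reward + w_l * tom_listener_logprob


if __name__ == "__main__":
    # Illustrative (made-up) values for a batch of three games.
    task_reward = torch.tensor([1.0, 0.0, 1.0])      # correct / incorrect guesses
    tom_logprob = torch.tensor([-0.3, -2.1, -0.8])   # ToM listener log-probs
    for w_l in (0.0, 1.0, 1000.0):
        print(w_l, speaker_reward(task_reward, tom_logprob, w_l))
```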
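
The Experiment Setup row can likewise be read as a small training configuration. The sketch below collects the stated values (maximum utterance length 20, vocabulary size 200, the three w_l settings, and a linearly decaying σ); the field names and decay endpoints are assumptions, and θ1 and θ2 are left unset because their values are not given in the excerpt.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ReferentialGameConfig:
    """Hypothetical config mirroring the values quoted in the Experiment Setup row."""
    max_utterance_length: int = 20       # speaker stops at this length or at EOS
    vocab_size: int = 200                # reported vocabulary size
    listener_weights: Tuple[float, ...] = (0.0, 1.0, 1000.0)  # w_l settings
    theta_1: Optional[float] = None      # threshold value not given in the excerpt
    theta_2: Optional[float] = None      # threshold value not given in the excerpt
    sigma_start: float = 1.0             # assumed initial value of sigma
    sigma_end: float = 0.0               # assumed final value after linear decay

    def sigma(self, step: int, total_steps: int) -> float:
        """Linearly decay sigma over training, as described in the excerpt."""
        frac = min(step / max(total_steps, 1), 1.0)
        return self.sigma_start + frac * (self.sigma_end - self.sigma_start)
```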