Computational Language Acquisition with Theory of Mind
Authors: Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build language-learning agents equipped with ToM, and measure its effects on the learning process. We experiment with varying task difficulty, hypothesizing that models will acquire more complex language to adapt to stronger environmental pressures. We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting. We also find some evidence that increasing task difficulty in the training process results in more fluent and precise utterances in evaluation. ... In experiments, we find that (RQ1) speaker models including ToM components generally outperform those that do not in terms of fluency and final accuracy. We also find that (RQ2) training with more visually and semantically similar distractor referents causes the speaker model to develop longer, more fluent, and more precise utterances in evaluation. |
| Researcher Affiliation | Academia | Andy Liu, Harvey Mudd College, Claremont, CA, USA ({ajliu}@g.hmc.edu); Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA ({zhuhao, mengyan3, ybisk, gneubig}@cs.cmu.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides mathematical equations for model components and training objectives. |
| Open Source Code | Yes | Code and data can be found at https://github.com/neulab/ToM-Language-Acquisition. |
| Open Datasets | Yes | Images and captions are drawn from the MS COCO dataset introduced in Lin et al. (2014). |
| Dataset Splits | Yes | The train-val-test split given by the MS COCO 2017 dataset is extended to our experimental setup. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. It only refers to general model architectures like ResNet and LSTM. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., ResNet, LSTM, PPO, GPT-2 large, spaCy, CLIP, RoBERTa, sentence-transformers) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The speaker generates a sequence of tokens to either the maximum length, which is set to 20 in our experiments, or an end-of-sequence token. In our experiments, we use a vocabulary size of 200... We introduce two thresholds, θ1 and θ2. ... We train models with three different settings of w_l. We train models with w_l = 0... w_l = 1... Finally, we train models where w_l is the arbitrarily high constant 1000. ... σ is set to decay linearly over time... |
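
To make the ToM-weighted training signal quoted in the Research Type row concrete, the following is a minimal sketch of how a speaker reward might combine task success with an internal ToM-listener score, weighted by the listener weight w_l ∈ {0, 1, 1000} reported in the paper. The function name, tensor shapes, and the additive combination are assumptions for illustration, not the authors' implementation.

```python
import torch


def speaker_reward(task_reward: torch.Tensor,
                   tom_listener_logprob: torch.Tensor,
                   w_l: float) -> torch.Tensor:
    """Combine the referential-game reward with an internal ToM-listener term.

    task_reward:          reward from the real listener identifying the target image
    tom_listener_logprob: the speaker's internal ToM listener's log-probability of
                          the target given the generated utterance
    w_l:                  listener weight; the paper reports runs with 0, 1, and an
                          arbitrarily high constant (1000)
    """
    # Hypothetical combination: a weighted sum of the two signals.
    return task_reward + w_l * tom_listener_logprob


if __name__ == "__main__":
    # Illustrative (made-up) values for a batch of three games.
    task_reward = torch.tensor([1.0, 0.0, 1.0])      # correct / incorrect guesses
    tom_logprob = torch.tensor([-0.3, -2.1, -0.8])   # ToM listener log-probs
    for w_l in (0.0, 1.0, 1000.0):
        print(w_l, speaker_reward(task_reward, tom_logprob, w_l))
```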
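
The Experiment Setup row can likewise be read as a small training configuration. The sketch below collects the stated values (maximum utterance length 20, vocabulary size 200, the three w_l settings, and a linearly decaying σ); the field names and decay endpoints are assumptions, and θ1 and θ2 are left unset because their values are not given in the excerpt.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ReferentialGameConfig:
    """Hypothetical config mirroring the values quoted in the Experiment Setup row."""
    max_utterance_length: int = 20       # speaker stops at this length or at EOS
    vocab_size: int = 200                # reported vocabulary size
    listener_weights: Tuple[float, ...] = (0.0, 1.0, 1000.0)  # w_l settings
    theta_1: Optional[float] = None      # threshold value not given in the excerpt
    theta_2: Optional[float] = None      # threshold value not given in the excerpt
    sigma_start: float = 1.0             # assumed initial value of sigma
    sigma_end: float = 0.0               # assumed final value after linear decay

    def sigma(self, step: int, total_steps: int) -> float:
        """Linearly decay sigma over training, as described in the excerpt."""
        frac = min(step / max(total_steps, 1), 1.0)
        return self.sigma_start + frac * (self.sigma_end - self.sigma_start)
```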