Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Few-shot Language Coordination by Modeling Theory of Mind
Authors: Hao Zhu, Graham Neubig, Yonatan Bisk
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine our hypothesis that the instructions generated with ToM modeling yield better communication performance in both a referential game and a language navigation task. Positive results from our experiments hint at the importance of explicitly modeling communication as a socio-pragmatic progress. Code can be found at https://github.com/CLAW-Lab/ToM. |
| Researcher Affiliation | Academia | Hao Zhu 1 Graham Neubig 1 Yonatan Bisk 1 1Language Technologies Institute, Carnegie Mellon University. Correspondence to: Hao Zhu <EMAIL>. |
| Pseudocode | Yes | Procedure 1. General Theory-of-Mind (ToM) model training procedure. Algorithm 1 Evaluate ToM Model |
| Open Source Code | Yes | Code can be found at https://github.com/CLAW-Lab/ToM. |
| Open Datasets | Yes | Following Lazaridou et al. (2016); Lowe et al. (2019a), we use 30k image-caption pairs from MSCOCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | These listeners are randomly divided into training, validation, and testing listeners (80/20/20). These listeners are randomly divided into training, validation, and testing listeners (30/10/10). |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud instance specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions software components like LSTMs, ResNet, and Gumbel-softmax, but does not provide specific version numbers for any libraries, frameworks, or programming languages used in the implementation or experiments. |
| Experiment Setup | Yes | The MAML hyper-parameters are = 0.01, Ninner = 5, outer = 0.0001, Nouter = 500, and batch size is 2. Within one session, the maximum number of interactions between speaker and listener is K = 100, and maximum number of interactions in a game is 20. |