Mapping Language Models to Grounded Conceptual Spaces
Authors: Roma Patel, Ellie Pavlick
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we investigate the extent to which the rich conceptual structure that LMs learn indeed reflects the conceptual structure of the non-linguistic world, which is something that LMs have never observed. We do this by testing whether the LMs can learn to map an entire conceptual domain (e.g., direction or colour) onto a grounded world representation given only a small number of examples. For example, we show a model what the word left means using a textual depiction of a grid world, and assess how well it can generalise to related concepts, for example, the word right, in a similar grid world. We investigate a range of generative language models of varying sizes (including GPT-2 and GPT-3), and see that although the smaller models struggle to perform this mapping, the largest model can not only learn to ground the concepts that it is explicitly taught, but appears to generalise to several instances of unseen concepts as well. Our results suggest an alternative means of building grounded language models: rather than learning grounded representations from scratch, it is possible that large text-only models learn a sufficiently rich conceptual structure that could allow them to be grounded in a data-efficient way. |
| Researcher Affiliation | Academia | Roma Patel & Ellie Pavlick, Department of Computer Science, Brown University, {romapatel,ellie_pavlick}@brown.edu |
| Pseudocode | No | The paper describes its experimental procedures and model setups but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'GPT-3 model Brown et al. (2020) and four GPT-2 Radford et al. (2019) models from the Hugging Face Transformer Wolf et al. (2019) library,' i.e., third-party open-source software. However, there is no statement or link indicating that the authors released their own implementation code for the methodology described in the paper. |
| Open Datasets | Yes | We consider colour terms in a three-dimensional space, using a dataset of 367 RGB colours (Abdou et al., 2021) that contains colour names (e.g., red, cyan, forest green) each associated with an RGB code (e.g., (255, 0, 0)). |
| Dataset Splits | No | The paper describes using in-context learning where 'a prompt includes n task examples' for the model to learn, and it mentions creating 'training and testing splits' for the generalisation evaluations. However, it does not specify explicit train/validation/test splits (e.g., percentages or example counts per partition), and it never mentions a validation set. |
| Hardware Specification | No | The paper reports the sizes of the language models used (e.g., 'Our smallest model contains 124M parameters... 175B parameters'), but these are model sizes, not hardware details. It does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the models or conduct the experiments. |
| Software Dependencies | No | The paper states: 'We use a GPT-3 model Brown et al. (2020) and four GPT-2 Radford et al. (2019) models from the Hugging Face Transformer Wolf et al. (2019) library.' While it names the Hugging Face Transformers library and the specific models, it does not give version numbers for these components, nor does it list any other libraries or programming languages used in the experiments. |
| Experiment Setup | Yes | We generate up to 5 tokens per prompt and, to improve the robustness of our analyses, generate 3 samples per prompt. We use a temperature of 1 during generation and sample from the softmax probabilities produced at each time step using nucleus sampling (Holtzman et al., 2019) with p=0.85. |
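
To make the grid-world mapping setup quoted in the Research Type row concrete, below is a minimal sketch of how a few-shot grounding prompt could be assembled. The grid rendering, the `World:`/`Direction:` template, and the `render_grid` helper are illustrative assumptions; the paper's exact prompt format is not reproduced in this table.

```python
# Minimal sketch of an in-context "concept mapping" prompt, assuming a
# simple textual grid rendering; the paper's exact templates may differ.

def render_grid(rows: int, cols: int, mark: tuple) -> str:
    """Render a grid world as text, with '1' at the marked cell and '0' elsewhere."""
    return "\n".join(
        " ".join("1" if (r, c) == mark else "0" for c in range(cols))
        for r in range(rows)
    )

# Few-shot examples ground the word "left"; the query then probes
# generalisation to the related, never-demonstrated concept "right".
examples = [
    (render_grid(3, 3, (0, 0)), "left"),
    (render_grid(3, 3, (2, 0)), "left"),
]
query = render_grid(3, 3, (1, 2))  # marked cell sits on the right

prompt = ""
for grid, label in examples:
    prompt += f"World:\n{grid}\nDirection: {label}\n\n"
prompt += f"World:\n{query}\nDirection:"
print(prompt)
```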
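The Open Datasets row cites a dataset of 367 RGB colours (Abdou et al., 2021) pairing colour names with RGB codes. A minimal sketch of that structure follows, with an assumed split into in-context examples and held-out queries; the record layout and split logic are illustrative, not the paper's.

```python
# Illustrative representation of the colour dataset described above:
# colour names paired with RGB codes (Abdou et al., 2021). The split
# into few-shot examples vs. held-out queries is an assumption.
import random

colours = [
    ("red", (255, 0, 0)),
    ("cyan", (0, 255, 255)),
    ("forest green", (34, 139, 34)),
    # ... the full dataset contains 367 such entries
]

random.seed(0)
random.shuffle(colours)
in_context, held_out = colours[:2], colours[2:]
print("in-context:", in_context)
print("held-out:", held_out)
```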
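The Experiment Setup row fully specifies the decoding configuration, which can be reproduced directly with the Hugging Face Transformers library the paper says it used. The checkpoint name (`gpt2`, the 124M-parameter model) and the colour-naming prompt are assumptions for illustration; the sampling parameters match the quoted text.

```python
# Decoding setup from the Experiment Setup row: up to 5 generated tokens,
# 3 samples per prompt, temperature 1, nucleus sampling with p = 0.85.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed 124M checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "RGB code: (255, 0, 0). Colour name:"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,            # sample from the softmax at each step
    top_p=0.85,                # nucleus sampling (Holtzman et al., 2019)
    temperature=1.0,
    max_new_tokens=5,          # "up to 5 tokens per prompt"
    num_return_sequences=3,    # "3 samples per prompt"
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    completion = tokenizer.decode(seq[inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    print(repr(completion))
```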