Measuring Compositionality in Representation Learning

Authors: Jacob Andreas

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments and analyses aimed at answering four questions about the relationship between compositionality and learning: (1) How does compositionality of representations evolve in relation to other measurable model properties over the course of the learning process? (Section 4) (2) How well does compositionality of representations track human judgments about the compositionality of model inputs? (Section 5) (3) How does compositionality constrain distances between representations, and how does TRE relate to other methods that analyze representations based on similarity? (Section 6) (4) Are compositional representations necessary for generalization to out-of-distribution inputs? (Section 7)
Researcher Affiliation | Academia | Jacob Andreas, Computer Science Division, University of California, Berkeley (jda@cs.berkeley.edu)
Pseudocode | No | The paper provides mathematical formulations and descriptions of procedures, such as the Tree Reconstruction Error (TRE) calculation, but it does not include a distinct block or figure explicitly labeled 'Pseudocode' or 'Algorithm'. (An illustrative TRE-style sketch is given after the table.)
Open Source Code | Yes | Code and data for all experiments in this paper are provided at https://github.com/jacobandreas/tre.
Open Datasets | Yes | Because our analysis focuses on compositional hypothesis classes, we use visual concepts from the Color MNIST dataset of Seo et al. (2017) (Figure 2). [...] We train embeddings for words and bigrams using the CBOW objective of Mikolov et al. (2013) using the implementation provided in fastText (Bojanowski et al., 2017) [...] Vectors are estimated from a 250M-word subset of the Gigaword dataset (Parker et al., 2011). (A hedged fastText training sketch is given after the table.)
Dataset Splits | Yes | The training dataset consists of 9000 image triplets, evenly balanced between positive and negative classes, with a validation set of 500 examples.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as CPU/GPU models, memory, or cloud instance types; it only mentions that the models are CNNs and RNNs.
Software Dependencies | No | The paper mentions software tools like fastText and optimization algorithms like ADAM, but it does not provide specific version numbers for these or any other software libraries or environments, which are necessary for full reproducibility.
Experiment Setup | Yes | The model is trained using ADAM (Kingma & Ba, 2014) with a learning rate of 0.001 and a batch size of 128. Training is ended when the model stops improving on a held-out set. [...] We train embeddings for words and bigrams using the CBOW objective of Mikolov et al. (2013) using the implementation provided in fastText (Bojanowski et al., 2017) with 100-dimensional vectors and a context size of 5. [...] The encoder and decoder RNNs both use gated recurrent units (Cho et al., 2014) with embeddings and hidden states of size 256. The size of the discrete vocabulary is set to 16 and the maximum message length to 4. Training uses a policy gradient objective with a scalar baseline set to the running average reward; this is optimized using ADAM (Kingma & Ba, 2014) with a learning rate of 0.001 and a batch size of 256. Each model is trained for 500 steps.
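
On the Pseudocode row: the paper specifies TRE mathematically rather than as an algorithm block. The sketch below is a minimal, illustrative TRE-style computation, not the paper's released code; it assumes additive composition of primitive embeddings and an L2 distance, and the names (tre_scores, eta, derivations) are ours. Roughly following the paper's high-level description, the primitive embeddings are fit by gradient descent to minimize the total reconstruction error, and the per-example error serves as the compositionality score (lower means more nearly compositional).

    import torch

    def tre_scores(reps, derivations, n_primitives, steps=1000, lr=0.1):
        """Illustrative TRE-style scores.
        reps: (N, d) tensor of learned representations f(x_i).
        derivations: list of tuples of primitive ids composing each input.
        Returns the per-example reconstruction error after fitting primitive
        embeddings eta under additive composition (an assumption of this sketch)."""
        d = reps.shape[1]
        eta = torch.zeros(n_primitives, d, requires_grad=True)
        opt = torch.optim.Adam([eta], lr=lr)

        def reconstruct():
            # Compose each input's primitive embeddings by summation.
            return torch.stack([eta[list(p)].sum(dim=0) for p in derivations])

        for _ in range(steps):
            opt.zero_grad()
            err = ((reps - reconstruct()) ** 2).sum(dim=1)  # squared L2 per example
            err.sum().backward()
            opt.step()
        with torch.no_grad():
            return ((reps - reconstruct()) ** 2).sum(dim=1).sqrt()

For example, with derivations = [(0, 1), (0, 2)] describing two inputs built from three primitive concepts, smaller returned values indicate representations that are better approximated by summing primitive embeddings.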
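
On the Open Datasets and Experiment Setup rows: the 100-dimensional CBOW embeddings with a context size of 5 can be trained with fastText. The snippet below is a sketch using the fastText Python bindings; the input path is a placeholder for a preprocessed plain-text file containing the 250M-word Gigaword subset, which the paper does not distribute directly.

    import fasttext

    # Train CBOW embeddings with the hyperparameters quoted above:
    # 100-dimensional vectors and a context window of 5.
    # "gigaword_subset.txt" is a placeholder path (assumption of this sketch).
    model = fasttext.train_unsupervised(
        "gigaword_subset.txt",
        model="cbow",
        dim=100,
        ws=5,
    )
    model.save_model("cbow_vectors.bin")
    print(model.get_word_vector("red"))  # inspect a single word vector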
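
On the Experiment Setup row: the quoted communication-game configuration (GRU encoder and decoder, embeddings and hidden states of size 256, vocabulary of 16, maximum message length 4, Adam with learning rate 0.001 and batch size 256) could be instantiated as sketched below. This is a structural sketch under our own assumptions about module boundaries and input dimensionality, not the author's implementation; see the linked repository for the actual code.

    import torch
    import torch.nn as nn

    VOCAB_SIZE = 16   # discrete message vocabulary
    MAX_LEN = 4       # maximum message length
    HIDDEN = 256      # embedding and hidden-state size
    LR = 1e-3
    BATCH_SIZE = 256
    INPUT_DIM = 512   # speaker input size; an assumption, not stated in the paper

    class Speaker(nn.Module):
        """GRU decoder that emits up to MAX_LEN discrete symbols (structure only)."""
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(INPUT_DIM, HIDDEN)
            self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
            self.cell = nn.GRUCell(HIDDEN, HIDDEN)
            self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    class Listener(nn.Module):
        """GRU encoder that reads the discrete message (structure only)."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
            self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    speaker, listener = Speaker(), Listener()
    optimizer = torch.optim.Adam(
        list(speaker.parameters()) + list(listener.parameters()), lr=LR
    )
    # The training loop (omitted) would use a policy-gradient objective with a
    # scalar baseline set to the running-average reward, as quoted above.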