RNNs implicitly implement tensor-product representations

Authors: R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally-occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures.
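To make the core concept concrete, the sketch below illustrates a tensor-product representation (TPR) in PyTorch: each filler (a digit) is bound to a role (its position) by an outer product of their embeddings, and the bindings are summed. This is an illustrative example, not the authors' code; the embedding sizes, the positional role scheme, and the example sequence are all assumptions.

```python
import torch

# Illustrative tensor-product representation (TPR) of the digit sequence [3, 1, 4].
# Filler = the digit, role = its position; both are looked up in (assumed) embedding tables.
n_digits, n_roles = 10, 6          # digits 0-9, positions 0-5 (assumed)
d_filler, d_role = 10, 6           # embedding sizes chosen for illustration

filler_emb = torch.nn.Embedding(n_digits, d_filler)
role_emb = torch.nn.Embedding(n_roles, d_role)

sequence = torch.tensor([3, 1, 4])
positions = torch.arange(len(sequence))

fillers = filler_emb(sequence)     # (3, d_filler)
roles = role_emb(positions)        # (3, d_role)

# Bind each filler to its role with an outer product, then sum the bindings.
tpr = torch.einsum('nf,nr->fr', fillers, roles)   # (d_filler, d_role)
print(tpr.shape)                                  # torch.Size([10, 6])
```

The resulting matrix (or its flattened vector) is the compositional encoding that a TPDN tries to match against an RNN's hidden representation.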
Researcher Affiliation | Collaboration | R. Thomas McCoy,1 Tal Linzen,1 Ewan Dunbar,2 & Paul Smolensky3,1. 1Department of Cognitive Science, Johns Hopkins University; 2Laboratoire de Linguistique Formelle, CNRS, Université Paris Diderot, Sorbonne Paris Cité; 3Microsoft Research AI, Redmond, WA, USA
Pseudocode | No | The paper describes the TPDN architecture using text and diagrams (Figure 2c), but does not provide structured pseudocode or algorithm blocks.
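Since the paper gives the TPDN only in prose and diagrams, here is a minimal, hedged sketch of the idea: embed fillers and roles, bind them with an outer product, sum the bindings, flatten, and apply a linear map into the target encoding space. The class name, dimensions, and the final linear layer are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class TPDNSketch(nn.Module):
    """Illustrative tensor-product decomposition network (details assumed)."""
    def __init__(self, n_fillers, n_roles, d_filler, d_role, d_encoding):
        super().__init__()
        self.filler_emb = nn.Embedding(n_fillers, d_filler)
        self.role_emb = nn.Embedding(n_roles, d_role)
        # Linear map from the flattened tensor product to the RNN encoding space.
        self.out = nn.Linear(d_filler * d_role, d_encoding)

    def forward(self, fillers, roles):
        f = self.filler_emb(fillers)                     # (batch, seq, d_filler)
        r = self.role_emb(roles)                         # (batch, seq, d_role)
        bindings = torch.einsum('bsf,bsr->bfr', f, r)    # sum of outer products over the sequence
        return self.out(bindings.flatten(1))             # (batch, d_encoding)
```

A TPDN like this would then be fit (for example with a mean squared error objective) to reproduce the encodings of a frozen, pre-trained RNN.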
Open Source Code | Yes | PyTorch code for the TPDN model is available on GitHub,1 along with an interactive demo.2 1https://github.com/tommccoy1/tpdn
Open Datasets | Yes | Digit sequences: The sequences consisted of the digits from 0 to 9. We randomly generated 50,000 unique sequences with lengths ranging from 1 to 6 inclusive and averaging 5.2; these sequences were divided into 40,000 training sequences, 5,000 development sequences, and 5,000 test sequences. ... InferSent (Conneau et al., 2017), a BiLSTM trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015)...
Dataset Splits | Yes | Digit sequences: ...these sequences were divided into 40,000 training sequences, 5,000 development sequences, and 5,000 test sequences. ... Training proceeded with a batch size of 32, with loss on the held-out development set computed after every 1,000 training examples.
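As a hedged sketch of how the digit-sequence data and its 40,000/5,000/5,000 split could be regenerated: the random seed, the uniqueness check, and the length distribution below are assumptions, not taken from the paper (which reports an average length of 5.2 rather than uniform lengths).

```python
import random

random.seed(0)  # assumed; the paper does not report a seed

# Generate 50,000 unique digit sequences with lengths from 1 to 6.
# Uniform length sampling is an approximation; the paper's sequences average length 5.2.
sequences = set()
while len(sequences) < 50_000:
    length = random.randint(1, 6)
    sequences.add(tuple(random.randint(0, 9) for _ in range(length)))

sequences = list(sequences)
random.shuffle(sequences)

train, dev, test = sequences[:40_000], sequences[40_000:45_000], sequences[45_000:]
print(len(train), len(dev), len(test))  # 40000 5000 5000
```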
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as CPU models, GPU models, or memory specifications.
Software Dependencies | No | The paper mentions software such as 'PyTorch', the 'Adam optimizer (Kingma & Ba, 2015)', and 'SentEval (Conneau & Kiela, 2018)', but does not specify version numbers for these components, which are necessary for reproducibility.
Experiment Setup | Yes | For all architectures, we used a digit embedding dimensionality of 10 (chosen arbitrarily) and a hidden layer size of 60... The networks were trained using the Adam optimizer (Kingma & Ba, 2015) with the standard initial learning rate of 0.001. We used negative log likelihood... Training proceeded with a batch size of 32...
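The reported hyperparameters translate into a training configuration like the hedged sketch below. Only the quoted numbers (embedding size 10, hidden size 60, Adam with learning rate 0.001, negative log likelihood loss, batch size 32, development-set evaluation every 1,000 examples) come from the paper; the autoencoder architecture and class names here are placeholders assumed for illustration.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
EMBED_DIM, HIDDEN_DIM = 10, 60
BATCH_SIZE, LEARNING_RATE = 32, 1e-3
DEV_EVAL_EVERY = 1_000  # training examples between held-out development evaluations

class DigitAutoencoder(nn.Module):
    """Placeholder seq2seq autoencoder over the 10 digits (architecture details assumed)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, EMBED_DIM)
        self.encoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.decoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, 10)

    def forward(self, x):
        emb = self.embed(x)
        _, h = self.encoder(emb)
        dec, _ = self.decoder(emb, h)          # teacher forcing, for illustration only
        return self.out(dec).log_softmax(-1)   # log-probabilities over the 10 digits

model = DigitAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.NLLLoss()  # negative log likelihood, as reported
```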