RNNs implicitly implement tensor-product representations
Authors: R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally-occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures. |
| Researcher Affiliation | Collaboration | R. Thomas McCoy,¹ Tal Linzen,¹ Ewan Dunbar,² & Paul Smolensky³,¹ (¹Department of Cognitive Science, Johns Hopkins University; ²Laboratoire de Linguistique Formelle, CNRS, Université Paris Diderot, Sorbonne Paris Cité; ³Microsoft Research AI, Redmond, WA, USA) |
| Pseudocode | No | The paper describes the TPDN architecture using text and diagrams (Figure 2c), but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | PyTorch code for the TPDN model is available on GitHub (https://github.com/tommccoy1/tpdn), along with an interactive demo. |
| Open Datasets | Yes | Digit sequences: The sequences consisted of the digits from 0 to 9. We randomly generated 50,000 unique sequences with lengths ranging from 1 to 6 inclusive and averaging 5.2; these sequences were divided into 40,000 training sequences, 5,000 development sequences, and 5,000 test sequences. ... InferSent (Conneau et al., 2017), a BiLSTM trained on the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015)... |
| Dataset Splits | Yes | Digit sequences: ...these sequences were divided into 40,000 training sequences, 5,000 development sequences, and 5,000 test sequences. ... Training proceeded with a batch size of 32, with loss on the held out development set computed after every 1,000 training examples. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as CPU models, GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the Adam optimizer (Kingma & Ba, 2015), and SentEval (Conneau & Kiela, 2018), but does not specify version numbers for these components, which is necessary for reproducibility. |
| Experiment Setup | Yes | For all architectures, we used a digit embedding dimensionality of 10 (chosen arbitrarily) and a hidden layer size of 60... The networks were trained using the Adam optimizer (Kingma & Ba, 2015) with the standard initial learning rate of 0.001. We used negative log likelihood... Training proceeded with a batch size of 32... |
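
The Research Type row above centers on the paper's TPDN (Tensor Product Decomposition Network), which approximates an RNN's sequence encoding as a sum of filler-role tensor products followed by a linear map. The following is a minimal PyTorch sketch of that idea, not the authors' released implementation: the filler and role embedding sizes, the left-to-right role scheme, and the mean-squared-error fitting objective are assumptions chosen to mirror the digit-sequence setting quoted in the table.

```python
import torch
import torch.nn as nn

class TPDN(nn.Module):
    """Sketch of a Tensor Product Decomposition Network.

    Encodes a sequence as the sum over positions of the outer product of a
    filler (symbol) embedding and a role (position) embedding, then linearly
    maps the flattened tensor to the dimensionality of the RNN encoding it
    is meant to approximate.
    """
    def __init__(self, n_fillers, n_roles, d_filler, d_role, d_target):
        super().__init__()
        self.filler_emb = nn.Embedding(n_fillers, d_filler)
        self.role_emb = nn.Embedding(n_roles, d_role)
        self.out = nn.Linear(d_filler * d_role, d_target)

    def forward(self, fillers, roles):
        # fillers, roles: LongTensors of shape (batch, seq_len)
        f = self.filler_emb(fillers)                  # (batch, seq_len, d_filler)
        r = self.role_emb(roles)                      # (batch, seq_len, d_role)
        # Outer product of each filler with its role, summed over positions.
        bound = torch.einsum('bsf,bsr->bfr', f, r)    # (batch, d_filler, d_role)
        return self.out(bound.flatten(start_dim=1))   # (batch, d_target)

# Fitting the TPDN to frozen RNN encodings (placeholder tensors; the embedding
# sizes below are illustrative assumptions, not values from the paper).
tpdn = TPDN(n_fillers=10, n_roles=6, d_filler=10, d_role=6, d_target=60)
optimizer = torch.optim.Adam(tpdn.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

fillers = torch.randint(0, 10, (32, 6))   # digit identities
roles = torch.arange(6).expand(32, 6)     # assumed left-to-right role scheme
targets = torch.randn(32, 60)             # stand-in for the RNN's encodings

optimizer.zero_grad()
loss = loss_fn(tpdn(fillers, roles), targets)
loss.backward()
optimizer.step()
```

For the paper's actual role schemes (e.g. tree positions or bidirectional roles) only the `roles` tensor would change; the binding-and-sum machinery stays the same.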
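The Experiment Setup row quotes the digit-autoencoder hyperparameters (embedding dimensionality 10, hidden size 60, Adam with learning rate 0.001, negative log likelihood loss, batch size 32). Below is a hedged sketch of a seq2seq autoencoder wired to those numbers; the choice of GRU units, teacher forcing, and a unidirectional encoder are assumptions not fixed by the quoted text, which covers several architectures.

```python
import torch
import torch.nn as nn

VOCAB = 10    # digits 0-9
EMB_DIM = 10  # embedding dimensionality from the quoted setup
HIDDEN = 60   # hidden layer size from the quoted setup

class DigitAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB_DIM)
        self.encoder = nn.GRU(EMB_DIM, HIDDEN, batch_first=True)
        self.decoder = nn.GRU(EMB_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, seq):
        # Encode the digit sequence into a single hidden vector.
        _, h = self.encoder(self.embed(seq))
        # Decode with teacher forcing (an assumption) from that vector.
        dec_out, _ = self.decoder(self.embed(seq), h)
        return self.out(dec_out).log_softmax(dim=-1)

model = DigitAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # standard initial lr
loss_fn = nn.NLLLoss()                                      # negative log likelihood

batch = torch.randint(0, VOCAB, (32, 6))  # a batch of 32 length-6 sequences
optimizer.zero_grad()
log_probs = model(batch)                  # (32, 6, VOCAB)
loss = loss_fn(log_probs.reshape(-1, VOCAB), batch.reshape(-1))
loss.backward()
optimizer.step()
```

In the paper's workflow the encoder's final hidden state from a model like this is the target that the TPDN sketch above is fit to approximate.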