Permutation Equivariant Models for Compositional Generalization in Language
Authors: Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now evaluate the empirical performance of our equivariant seq2seq model (described in Section 4) on the four SCAN tasks (described in Section 2). We compare our equivariant seq2seq to regular seq2seq models (Lake & Baroni, 2018), convolutional models (Dessì & Baroni, 2019), the syntactic attention model of Russin et al. (2019), and the meta-learning approach of Lake (2019). Table 1 summarizes the results of our experiments. |
| Researcher Affiliation | Collaboration | Jonathan Gordon University of Cambridge jg801@cam.ac.uk David Lopez-Paz, Marco Baroni, Diane Bouchacourt Facebook AI Research {dlp, mbaroni, dianeb}@fb.com |
| Pseudocode | No | The paper provides mathematical equations for the G-LSTM cell in Appendix B, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code available at https://github.com/facebookresearch/Permutation-Equivariant-Seq2Seq |
| Open Datasets | Yes | Lake & Baroni (2018) proposed the Simplified version of the CommAI Navigation (SCAN), a dataset to benchmark the compositional generalization capabilities of state-of-the-art sequence-to-sequence (seq2seq) translation models (Sutskever et al., 2014; Bahdanau et al., 2015). |
| Dataset Splits | Yes | We use teacher-forcing (Williams & Zipser, 1989) with a ratio of 0.5, and early-stopping based on a validation set consisting of 10% of the training examples. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU model, CPU type, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch nn.Embedding' as an example for implementation, but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train models for 200k iterations, where each iteration consists of a minibatch of size 1, using the Adam optimizer to perform parameter updates with default parameters (Kingma & Ba, 2015) with a learning rate of 1e-4. We use teacher-forcing (Williams & Zipser, 1989) with a ratio of 0.5, and early-stopping based on a validation set consisting of 10% of the training examples. (A hedged sketch of this configuration follows the table.) |
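
The training configuration reported above translates into a short PyTorch sketch. This is a minimal illustration, not the authors' released implementation (see the repository linked under Open Source Code): the model's call signature, the SCAN data pairs, and the early-stopping cadence are assumptions made for readability; only the hyperparameters (Adam with learning rate 1e-4, minibatch size 1, 200k iterations, teacher-forcing ratio 0.5, 10% validation split) come from the paper.

```python
import random

import torch
from torch import nn, optim


def train_seq2seq(model, pairs, num_iterations=200_000,
                  lr=1e-4, teacher_forcing_ratio=0.5):
    """Training-loop sketch following the setup reported in the paper.

    `model` and `pairs` (a list of (source, target) tensor pairs, e.g. from
    SCAN) are assumed to come from elsewhere, such as the released code.
    The model is assumed to return per-step log-probabilities over output
    tokens and to accept a `teacher_forcing` flag; both are assumptions.
    """
    # Hold out 10% of the training examples as a validation set for
    # early stopping.
    pairs = list(pairs)
    random.shuffle(pairs)
    n_val = max(1, len(pairs) // 10)
    val_pairs, train_pairs = pairs[:n_val], pairs[n_val:]

    # Adam with default betas/eps and the reported learning rate of 1e-4.
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.NLLLoss()

    best_val = float("inf")
    for it in range(num_iterations):
        src, tgt = train_pairs[it % len(train_pairs)]  # minibatch of size 1
        use_tf = random.random() < teacher_forcing_ratio

        optimizer.zero_grad()
        log_probs = model(src, tgt, teacher_forcing=use_tf)
        loss = criterion(log_probs.view(-1, log_probs.size(-1)), tgt.view(-1))
        loss.backward()
        optimizer.step()

        # Simplified early stopping: every 1000 iterations, stop as soon as
        # validation loss fails to improve (checkpointing omitted).
        if (it + 1) % 1000 == 0:
            with torch.no_grad():
                val_loss = 0.0
                for s, t in val_pairs:
                    out = model(s, t, teacher_forcing=False)
                    val_loss += criterion(out.view(-1, out.size(-1)),
                                          t.view(-1)).item()
                val_loss /= len(val_pairs)
            if val_loss >= best_val:
                break
            best_val = val_loss
    return model
```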