Permutation Equivariant Models for Compositional Generalization in Language

Authors: Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt

ICLR 2020

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.

Research Type: Experimental
LLM response: We now evaluate the empirical performance of our equivariant seq2seq model (described in Section 4) on the four SCAN tasks (described in Section 2). We compare our equivariant seq2seq to regular seq2seq models (Lake & Baroni, 2018), convolutional models (Dessì & Baroni, 2019), the syntactic attention model of Russin et al. (2019), and the meta-learning approach of Lake (2019). Table 1 summarizes the results of our experiments.

Researcher Affiliation: Collaboration
LLM response: Jonathan Gordon, University of Cambridge (jg801@cam.ac.uk); David Lopez-Paz, Marco Baroni, and Diane Bouchacourt, Facebook AI Research ({dlp, mbaroni, dianeb}@fb.com).

Pseudocode: No
LLM response: The paper provides mathematical equations for the G-LSTM cell in Appendix B, but it does not include a clearly labeled pseudocode or algorithm block.

Open Source Code: Yes
LLM response: Code is available at https://github.com/facebookresearch/Permutation-Equivariant-Seq2Seq.

Open Datasets: Yes
LLM response: Lake & Baroni (2018) proposed the Simplified version of the CommAI Navigation tasks (SCAN), a dataset to benchmark the compositional generalization capabilities of state-of-the-art sequence-to-sequence (seq2seq) translation models (Sutskever et al., 2014; Bahdanau et al., 2015). (Example command-action pairs are sketched below.)

Dataset Splits: Yes
LLM response: We use teacher-forcing (Williams & Zipser, 1989) with a ratio of 0.5, and early-stopping based on a validation set consisting of 10% of the training examples.

Hardware Specification: No
LLM response: The paper does not explicitly describe the specific hardware (e.g., GPU model, CPU type, memory) used to run the experiments.

Software Dependencies: No
LLM response: The paper mentions using PyTorch's nn.Embedding as an implementation example, but does not provide specific version numbers for PyTorch or any other software dependencies.

Experiment Setup: Yes
LLM response: We train models for 200k iterations, where each iteration consists of a minibatch of size 1, using the Adam optimizer to perform parameter updates with default parameters (Kingma & Ba, 2015) and a learning rate of 1e-4. We use teacher-forcing (Williams & Zipser, 1989) with a ratio of 0.5, and early-stopping based on a validation set consisting of 10% of the training examples. (A hedged configuration sketch follows below.)
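
To give a sense of what the SCAN data referenced in the Open Datasets entry looks like, the snippet below lists a few command-to-action pairs written in the notation used in the SCAN paper (the released data files spell the actions slightly differently, e.g. I_JUMP). This is purely illustrative and is not sampled from the dataset files.

```python
# Illustrative SCAN-style command -> action-sequence pairs, in the paper's notation.
scan_examples = [
    ("jump", "JUMP"),
    ("jump twice", "JUMP JUMP"),
    ("jump left", "LTURN JUMP"),
    ("walk and run", "WALK RUN"),
]

for command, actions in scan_examples:
    print(f"{command:15s} -> {actions}")
```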
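To make the Experiment Setup entry concrete, here is a minimal PyTorch sketch of the reported training configuration: Adam with learning rate 1e-4, minibatches of size 1 for 200k iterations, a teacher-forcing ratio of 0.5, and early stopping against a held-out 10% validation split. The toy LSTM seq2seq model, the evaluation interval, and the early-stopping patience are placeholder assumptions, not details from the paper; the authors' actual permutation-equivariant model is in the repository linked above.

```python
import random
import torch
import torch.nn as nn

# Reported hyperparameters (Experiment Setup); everything else below is a placeholder.
LEARNING_RATE = 1e-4
TEACHER_FORCING_RATIO = 0.5
NUM_ITERATIONS = 200_000      # each iteration is a minibatch of size 1
VALIDATION_FRACTION = 0.10    # 10% of training examples held out for early stopping

SRC_VOCAB, TGT_VOCAB, HIDDEN, SOS = 16, 16, 64, 0


class ToySeq2Seq(nn.Module):
    """Stand-in LSTM encoder-decoder; the paper's equivariant G-LSTM is not reproduced here."""

    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, HIDDEN)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, HIDDEN)
        self.encoder = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.decoder = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, src, tgt, teacher_forcing_ratio):
        _, state = self.encoder(self.src_emb(src))
        token = torch.full((src.size(0), 1), SOS, dtype=torch.long)
        logits = []
        for t in range(tgt.size(1)):
            step, state = self.decoder(self.tgt_emb(token), state)
            step_logits = self.out(step)               # (batch, 1, TGT_VOCAB)
            logits.append(step_logits)
            # Teacher forcing with ratio 0.5: half the time feed the gold token,
            # otherwise feed the model's own prediction.
            if random.random() < teacher_forcing_ratio:
                token = tgt[:, t : t + 1]
            else:
                token = step_logits.argmax(dim=-1)
        return torch.cat(logits, dim=1)


def train(pairs):
    """pairs: list of (src, tgt) LongTensors, each of shape (1, length)."""
    # Hold out 10% of the training examples as a validation set for early stopping.
    random.shuffle(pairs)
    n_val = max(1, int(VALIDATION_FRACTION * len(pairs)))
    val_pairs, train_pairs = pairs[:n_val], pairs[n_val:]

    model = ToySeq2Seq()
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # default betas/eps
    criterion = nn.CrossEntropyLoss()

    best_val, bad_evals, patience = float("inf"), 0, 5  # patience value is an assumption
    for it in range(NUM_ITERATIONS):
        src, tgt = random.choice(train_pairs)            # minibatch of size 1
        logits = model(src, tgt, TEACHER_FORCING_RATIO)
        loss = criterion(logits.flatten(0, 1), tgt.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (it + 1) % 1000 == 0:                         # evaluation interval is an assumption
            with torch.no_grad():
                val_loss = sum(
                    criterion(model(s, t, 0.0).flatten(0, 1), t.flatten()).item()
                    for s, t in val_pairs
                ) / len(val_pairs)
            if val_loss < best_val:
                best_val, bad_evals = val_loss, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:                # early stopping
                    break
    return model
```

With real SCAN pairs encoded as token-id tensors, train(pairs) would run this configuration end to end; swapping ToySeq2Seq for the authors' equivariant model would recover the setup described in the paper.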