Systematic Generalization with Edge Transformers
Authors: Leon Bergen, Timothy O'Donnell, Dzmitry Bahdanau
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal, and classical Transformer baselines. In our experiments we compare the systematic generalization ability of Edge Transformers to that of Transformers (Vaswani et al., 2017), Universal Transformers (Dehghani et al., 2019), Relation-aware Transformers (Shaw et al., 2018), Graph Attention Networks (Veličković et al., 2018) and other baselines. |
| Researcher Affiliation | Collaboration | Leon Bergen, University of California, San Diego (lbergen@ucsd.edu); Timothy J. O'Donnell, McGill University, Quebec Artificial Intelligence Institute (Mila), Canada CIFAR AI Chair; Dzmitry Bahdanau, Element AI (a ServiceNow company), McGill University, Quebec Artificial Intelligence Institute (Mila), Canada CIFAR AI Chair |
| Pseudocode | No | The paper provides mathematical equations and descriptions of the model's computations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for our experiments can be found at: github.com/bergen/EdgeTransformer |
| Open Datasets | Yes | We focus on three synthetic benchmarks with carefully controlled train-test splits: Compositional Language Understanding and Text-based Relational Reasoning (CLUTRR), proposed by Sinha et al. (2019); Compositional Freebase Questions (CFQ), proposed by Keysers et al. (2020); and the Compositional Generalization Challenge based on Semantic Interpretation (COGS) by Kim and Linzen (2020). |
| Dataset Splits | No | The paper discusses 'train-test splits' and tuning hyperparameters on a 'random split', implying the use of a validation set. However, it does not provide specific percentages or sample counts for training, validation, and test splits needed to reproduce the data partitioning. For example, it mentions 'the original CLUTRR training set' and 'a larger training set', but no explicit validation set size. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan V GPUs used for this research. Even after this filtering, training an Edge Transformer model on CFQ semantic parsing requires 1-2 days using 4 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Stanza framework for dependency parsing' and notes that the model is implemented with the 'Einstein summation operation which is readily available in modern deep learning frameworks' (a minimal einsum sketch of this operation follows the table). However, it does not provide specific version numbers for these or for any other software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For the chosen hyperparameter settings see Table 1. Table 1: Hyperparameter settings for the Edge Transformer and for the baselines. L is the number of layers, d is the dimensionality, h is the number of heads, B is the batch size, ρ is the learning rate, T is training duration. |
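The "Einstein summation" remark in the Software Dependencies row refers to the tensor contraction at the heart of the Edge Transformer's triangular attention, in which each edge (i, j) attends over intermediate nodes l by combining the two legs (i, l) and (l, j). The sketch below is a minimal single-head PyTorch illustration of that contraction, not the authors' released implementation; the exact projection scheme (which edge supplies queries, keys, and the two value terms) is an assumption based on the paper's description.

```python
import torch
import torch.nn.functional as F

def triangular_attention(x, w_q, w_k, w_v1, w_v2):
    """Single-head triangular attention update over edge states (sketch).

    x: edge representations of shape (n, n, d), where x[i, j] is the
    state of the edge from node i to node j. Edge (i, j) attends over
    intermediate nodes l, combining the two legs (i, l) and (l, j).
    All weight matrices are assumed to have shape (d, d).
    """
    n, _, d = x.shape
    q = x @ w_q    # queries taken from the edge being updated, (i, j)
    k = x @ w_k    # keys taken from the first leg, (i, l)
    v1 = x @ w_v1  # values from the first leg, (i, l)
    v2 = x @ w_v2  # values from the second leg, (l, j)

    # Attention logits over intermediate nodes l: shape (n, n, n), indexed [i, l, j].
    logits = torch.einsum('ijd,ild->ilj', q, k) / d ** 0.5
    alpha = F.softmax(logits, dim=1)  # normalize over the l dimension

    # Combine the two legs elementwise, then aggregate with the attention weights.
    return torch.einsum('ilj,ild,ljd->ijd', alpha, v1, v2)

# Example: random edge states for a 5-node graph with 64-dimensional edges.
x = torch.randn(5, 5, 64)
w = [torch.randn(64, 64) for _ in range(4)]
y = triangular_attention(x, *w)  # shape (5, 5, 64)
```

The (n, n, n) logit tensor makes the cubic cost of the triangular update explicit, which is consistent with the Hardware Specification row: even after filtering, training on CFQ takes 1-2 days on 4 V100 GPUs.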