Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time
Authors: Karlis Freivalds, Emīls Ozoliņš, Agris Šostaks
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our architecture on the challenging LAMBADA question answering dataset and compare it with the state-of-the-art models which use attention. We empirically validate our model on algorithmic tasks and the LAMBADA question answering task (Paperno et al., 2016). Our model achieves competitive accuracy and scales to sequences with more than a hundred thousand elements. |
| Researcher Affiliation | Academia | Karlis Freivalds, Emils Ozolins, Agris Sostaks; Institute of Mathematics and Computer Science, University of Latvia; Raina bulvaris 29, Riga, LV-1459, Latvia; {Karlis.Freivalds, Emils.Ozolins, Agris.Sostaks}@lumii.lv |
| Pseudocode | No | The paper describes the mathematical operations of the Switch Unit using equations in Section 4, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The code is available at https://github.com/LUMII-Syslab/shuffle-exchange. |
| Open Datasets | Yes | We evaluate our architecture on the challenging LAMBADA question answering dataset... The sentences in the LAMBADA dataset (Paperno et al., 2016) are specially selected such that giving the right answer requires examining the whole passage. |
| Dataset Splits | No | The paper mentions training and testing on different sequence lengths ('We train them on inputs of length 64 and test on 8x longer instances.') and the use of a 'test set', but it does not describe a dedicated validation split (e.g., percentages or counts). |
| Hardware Specification | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11GB) GPU with Adam optimizer (Kingma and Ba, 2014). |
| Software Dependencies | No | The paper states: 'We have implemented the proposed architecture in TensorFlow.' While TensorFlow is mentioned, no specific version number for it or any other software dependencies is provided. |
| Experiment Setup | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11GB) GPU with Adam optimizer (Kingma and Ba, 2014). For training we instantiate several models for different sequence lengths (powers of 2) sharing the same weights and train each example on the smallest instance it fits. A small model comprising one Beneš block and 192 feature maps suffices for these tasks. We use a pretrained fastText 1M English word embedding (Mikolov et al., 2018) for the input words. The embedding layer is followed by 2 Beneš blocks with 384 feature maps. |
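
The Experiment Setup row notes that several weight-sharing model instances are built for different sequence lengths (powers of 2) and that each training example is routed to the smallest instance it fits. The sketch below illustrates that bucketing step in plain Python; the function names, the fixed set of instance lengths, and the padding symbol are illustrative assumptions rather than the authors' code (their implementation is in the linked repository).

```python
def bucket_examples(examples, model_lengths=(64, 128, 256, 512), pad_symbol=0):
    """Route each example to the smallest model instance it fits into.

    All instances are assumed to share the same weights; `model_lengths`
    are the power-of-2 input sizes the instances are built for (the values
    here are illustrative, not taken from the paper).
    """
    buckets = {length: [] for length in model_lengths}
    for seq in examples:
        # smallest configured length that can hold this sequence
        target = next((length for length in model_lengths if len(seq) <= length), None)
        if target is None:
            continue  # longer than the largest instance; skipped in this sketch
        buckets[target].append(list(seq) + [pad_symbol] * (target - len(seq)))
    return buckets

# Example usage: sequences of length 5, 64 and 100 are padded to the
# 64-, 64- and 128-length instances respectively.
buckets = bucket_examples([[1] * 5, [2] * 64, [3] * 100])
print({length: len(items) for length, items in buckets.items()})
# {64: 2, 128: 1, 256: 0, 512: 0}
```

Padding to the smallest fitting power of 2 keeps the O(n log n) Shuffle-Exchange structure intact per instance while letting all instances reuse one set of weights.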