Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Authors: Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). It surpasses the original Shuffle-Exchange network by 2.1% on the LAMBADA language modelling task and achieves a state-of-the-art 78.02% average precision score on MusicNet. |
| Researcher Affiliation | Academia | Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds; Institute of Mathematics and Computer Science, University of Latvia; {andis.draguns, emils.ozolins, agris.sostaks, matiss.apinis, karlis.freivalds}@lumii.lv |
| Pseudocode | No | The paper provides mathematical equations for the Switch Units but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. (An illustrative sketch of such a unit follows this table.) |
| Open Source Code | Yes | We have implemented the proposed architecture in TensorFlow. The code is at https://github.com/LUMII-Syslab/RSE. |
| Open Datasets | Yes | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). The LAMBADA dataset (Paperno et al. 2016). MusicNet dataset (Thickstun, Harchaoui, and Kakade 2017). We use a pretrained fastText 1M English word embedding (Mikolov et al. 2018). |
| Dataset Splits | No | The paper discusses training and testing, but does not explicitly provide percentages or counts for training, validation, and test splits needed to reproduce data partitioning. |
| Hardware Specification | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11GB) GPU |
| Software Dependencies | No | The paper mentions 'TensorFlow' and the 'scikit-learn machine learning library' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use the RSE model having one Beneš block for addition and sorting tasks, two blocks for the multiplication task and m = 192 feature maps. We use dataset generators and curriculum learning from the article introducing neural Shuffle-Exchange networks (Freivalds, Ozoliņš, and Šostaks 2019). We instantiate the model for input length 256 (all test and train examples fit into this length) and pad the input sequence to that length by placing the sequence at a random position and adding zeros on both ends. Randomized padding improves test accuracy. We use an RSE model with two Beneš blocks with 192 feature maps. We use a batch size of one example in this test to see the maximum sequence length at which our model can be trained and tested on a single GPU. (Illustrative sketches of the shuffle layer and the randomized padding follow this table.) |
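
Although the paper has no labeled pseudocode (see the Pseudocode row), its Switch Unit equations translate into a few lines of code. The NumPy sketch below shows a residual switch unit of the general form the paper describes: two adjacent sequence elements are concatenated, passed through a linear map, layer normalization, and GELU, projected back, and combined with the input through a gated, scaled residual connection. The parameter names (`Z`, `W`, `s`), the exact operation order, and the residual scale `h` are assumptions of this sketch, not the paper's equations.

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def residual_switch_unit(i1, i2, Z, W, s, h=0.25):
    """Illustrative residual switch unit acting on two adjacent elements.

    i1, i2 : feature vectors of size m (the paper uses m = 192)
    Z, W   : (2m, 2m) projection matrices -- hypothetical names
    s      : (2m,) gate logits for the residual path -- hypothetical
    h      : residual scale; 0.25 is an assumed placeholder value
    """
    i = np.concatenate([i1, i2])        # join the element pair into one vector
    g = gelu(layer_norm(Z @ i))         # nonlinear hidden representation
    c = W @ g                           # project back to size 2m
    gate = 1.0 / (1.0 + np.exp(-s))     # sigmoid gate on the residual path
    o = gate * i + h * c                # gated residual combination
    m = len(i1)
    return o[:m], o[m:]
```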
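The Beneš blocks mentioned in the Experiment Setup row interleave such switch layers with shuffle layers. In a Shuffle-Exchange network over n = 2^k elements, the perfect shuffle permutes indices by a cyclic rotation of their k-bit representation. A minimal sketch follows; whether the network's shuffle layer rotates left or right (shuffle vs. inverse shuffle) is a convention, and this sketch rotates left.

```python
def perfect_shuffle(seq):
    """Apply the perfect-shuffle permutation to a power-of-two-length
    sequence: each element's new index is a cyclic left rotation of its
    k-bit index, where n = 2**k."""
    n = len(seq)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = n.bit_length() - 1
    out = [None] * n
    for i in range(n):
        j = ((i << 1) | (i >> (k - 1))) & (n - 1)  # rotate index bits left
        out[j] = seq[i]
    return out

# Example: for n = 8 the halves are interleaved, i.e. index 4 (binary 100)
# moves to index 1 (binary 001).
print(perfect_shuffle(list(range(8))))  # [0, 4, 1, 5, 2, 6, 3, 7]
```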
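The randomized padding described in the Experiment Setup row is simple to reproduce. Below is a minimal NumPy sketch, assuming a 1-D input; the function name and interface are illustrative, not the repository's.

```python
import numpy as np

def pad_randomly(seq, target_len=256, rng=None):
    """Pad `seq` to `target_len` by placing it at a random offset and
    zero-filling both ends, mirroring the setup quoted above
    (input length 256, sequence at a random position)."""
    rng = rng or np.random.default_rng()
    seq = np.asarray(seq)
    assert len(seq) <= target_len, "sequence must fit into the padded length"
    offset = int(rng.integers(0, target_len - len(seq) + 1))
    out = np.zeros(target_len, dtype=seq.dtype)
    out[offset:offset + len(seq)] = seq
    return out

# Example: a length-10 sequence lands at a random position inside 256 zeros.
padded = pad_randomly(np.arange(1, 11))
```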