Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Authors: Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds

AAAI 2021, pp. 7245–7253 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). It surpasses the original Shuffle-Exchange network by 2.1% on the LAMBADA language modelling task and achieves state-of-the-art 78.02% average precision score on MusicNet.
Researcher Affiliation | Academia | Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds, Institute of Mathematics and Computer Science, University of Latvia. {andis.draguns, emils.ozolins, agris.sostaks, matiss.apinis, karlis.freivalds}@lumii.lv
Pseudocode | No | The paper provides mathematical equations for the Switch Units but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We have implemented the proposed architecture in TensorFlow. The code is at https://github.com/LUMII-Syslab/RSE.
Open Datasets | Yes | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). The LAMBADA dataset (Paperno et al. 2016). MusicNet dataset (Thickstun, Harchaoui, and Kakade 2017). We use a pretrained fastText 1M English word embedding (Mikolov et al. 2018).
Dataset Splits | No | The paper discusses training and testing but does not explicitly provide the percentages or counts for training, validation, and test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11 GB) GPU.
Software Dependencies | No | The paper mentions 'TensorFlow' and the 'scikit-learn machine learning library' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use the RSE model having one Beneš block for addition and sorting tasks, two blocks for the multiplication task and m = 192 feature maps. We use dataset generators and curriculum learning from the article introducing neural Shuffle-Exchange networks (Freivalds, Ozoliņš, and Šostaks 2019). We instantiate the model for input length 256 (all test and train examples fit into this length) and pad the input sequence to that length by placing the sequence at a random position and adding zeros on both ends. Randomized padding improves test accuracy. We use an RSE model with two Beneš blocks with 192 feature maps. We use the batch size of one example in this test to see the sequence length limit our model can be trained and tested on a single GPU.
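
The 'Pseudocode' row notes that the Switch Units are specified with equations rather than pseudocode. As a rough, non-authoritative illustration of what a residual switch-style unit can look like (the exact equations, weight shapes, the scaling constant h, and all names below are assumptions, not the paper's definition), here is a minimal NumPy sketch:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def layer_norm(x, eps=1e-6):
    # Normalise over the feature dimension.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def residual_switch_unit(i1, i2, Z, W, S, h=0.9):
    """Switch-style unit acting on a pair of feature vectors (illustrative only).

    i1, i2 : feature vectors of size m
    Z, W   : weight matrices of the two linear maps
    S      : per-feature residual gate logits (size 2m)
    h      : scaling constant for the transformed branch (assumed value)
    """
    i = np.concatenate([i1, i2])                      # join the pair into one 2m vector
    g = gelu(layer_norm(Z @ i))                       # linear map, normalisation, GELU
    o = 1.0 / (1.0 + np.exp(-S)) * i + h * (W @ g)    # gated residual + transformed branch
    return np.split(o, 2)                             # back to a pair of m-sized vectors

# Toy usage with random weights.
m = 8
rng = np.random.default_rng(0)
o1, o2 = residual_switch_unit(
    rng.normal(size=m), rng.normal(size=m),
    Z=rng.normal(size=(2 * m, 2 * m)) * 0.1,
    W=rng.normal(size=(2 * m, 2 * m)) * 0.1,
    S=np.zeros(2 * m),
)
```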
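
The 'Open Datasets' row mentions a pretrained fastText 1M English word embedding (Mikolov et al. 2018). A minimal sketch of loading such embeddings from the plain-text .vec format follows; the commented file name is an assumption based on the publicly distributed fastText vectors, not something stated on this page.

```python
import numpy as np

def load_fasttext_vectors(path, limit=None):
    """Read a fastText .vec file into a {word: vector} dict.

    The .vec format is plain text: a header line with the vocabulary size and
    dimensionality, then one line per word followed by its vector components.
    """
    vectors = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        n_words, dim = map(int, f.readline().split())
        for count, line in enumerate(f):
            if limit is not None and count >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Hypothetical file name for the 1M-word English vectors:
# embeddings = load_fasttext_vectors("wiki-news-300d-1M.vec", limit=50_000)
```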
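
The MusicNet result is reported as an average precision score, and the 'Software Dependencies' row mentions the scikit-learn library. A minimal sketch of computing a micro-averaged average precision score for multi-label note predictions with scikit-learn (array shapes and names are illustrative assumptions, not the paper's evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Illustrative shapes: 1000 evaluation frames, 128 possible note labels.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 128))   # binary note presence per frame
y_score = rng.random(size=(1000, 128))          # model scores per frame and note

# Micro-averaged average precision over all (frame, note) decisions.
aps = average_precision_score(y_true, y_score, average="micro")
print(f"average precision score: {aps:.4f}")
```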
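
The 'Experiment Setup' row describes instantiating the model for input length 256 and padding each sequence by placing it at a random position and adding zeros on both ends. A minimal NumPy sketch of that randomized padding (function and variable names are assumptions, not taken from the authors' code):

```python
import numpy as np

def randomized_pad(sequence, target_len=256, rng=None):
    """Place `sequence` at a random offset inside a zero vector of `target_len`."""
    rng = rng or np.random.default_rng()
    sequence = np.asarray(sequence)
    if len(sequence) > target_len:
        raise ValueError("sequence longer than target length")
    padded = np.zeros(target_len, dtype=sequence.dtype)
    offset = rng.integers(0, target_len - len(sequence) + 1)  # random start position
    padded[offset:offset + len(sequence)] = sequence
    return padded

# Example: a length-10 token sequence padded to the model's fixed input length.
print(randomized_pad(np.arange(1, 11), target_len=256))
```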