Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Authors: Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds

AAAI 2021, pp. 7245–7253 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). It surpasses the original Shuffle-Exchange network by 2.1% on the LAMBADA language modelling task and achieves state-of-the-art 78.02% average precision score on MusicNet.
Researcher Affiliation | Academia | Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds, Institute of Mathematics and Computer Science, University of Latvia. {andis.draguns, emils.ozolins, agris.sostaks, matiss.apinis, karlis.freivalds}@lumii.lv
Pseudocode | No | The paper provides mathematical equations for the Switch Units but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We have implemented the proposed architecture in TensorFlow. The code is at https://github.com/LUMII-Syslab/RSE.
Open Datasets | Yes | We empirically validate our improved model on algorithmic tasks, LAMBADA question answering and multi-instrument musical note recognition (MusicNet dataset). The LAMBADA dataset (Paperno et al. 2016). MusicNet dataset (Thickstun, Harchaoui, and Kakade 2017). We use a pretrained fastText 1M English word embedding (Mikolov et al. 2018).
Dataset Splits | No | The paper discusses training and testing but does not explicitly provide the percentages or counts for training, validation, and test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11 GB) GPU.
Software Dependencies | No | The paper mentions 'TensorFlow' and the 'scikit-learn machine learning library' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use the RSE model having one Beneš block for addition and sorting tasks, two blocks for the multiplication task and m = 192 feature maps. We use dataset generators and curriculum learning from the article introducing neural Shuffle-Exchange networks (Freivalds, Ozoliņš, and Šostaks 2019). We instantiate the model for input length 256 (all test and train examples fit into this length) and pad the input sequence to that length by placing the sequence at a random position and adding zeros on both ends. Randomized padding improves test accuracy. We use an RSE model with two Beneš blocks with 192 feature maps. We use the batch size of one example in this test to see the sequence length limit our model can be trained and tested on a single GPU.
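
The 'Pseudocode' row notes that the Switch Units are specified with equations rather than pseudocode. As a rough, non-authoritative illustration of what a residual switch-style unit can look like (the exact equations, weight shapes, the scaling constant h, and all names below are assumptions, not the paper's definition), here is a minimal NumPy sketch:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def layer_norm(x, eps=1e-6):
    # Normalise over the feature dimension.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def residual_switch_unit(i1, i2, Z, W, S, h=0.9):
    """Switch-style unit acting on a pair of feature vectors (illustrative only).

    i1, i2 : feature vectors of size m
    Z, W   : weight matrices of the two linear maps
    S      : per-feature residual gate logits (size 2m)
    h      : scaling constant for the transformed branch (assumed value)
    """
    i = np.concatenate([i1, i2])                      # join the pair into one 2m vector
    g = gelu(layer_norm(Z @ i))                       # linear map, normalisation, GELU
    o = 1.0 / (1.0 + np.exp(-S)) * i + h * (W @ g)    # gated residual + transformed branch
    return np.split(o, 2)                             # back to a pair of m-sized vectors

# Toy usage with random weights.
m = 8
rng = np.random.default_rng(0)
o1, o2 = residual_switch_unit(
    rng.normal(size=m), rng.normal(size=m),
    Z=rng.normal(size=(2 * m, 2 * m)) * 0.1,
    W=rng.normal(size=(2 * m, 2 * m)) * 0.1,
    S=np.zeros(2 * m),
)
```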
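
The 'Open Datasets' row mentions a pretrained fastText 1M English word embedding (Mikolov et al. 2018). A minimal sketch of loading such embeddings from the plain-text .vec format follows; the commented file name is an assumption based on the publicly distributed fastText vectors, not something stated on this page.

```python
import numpy as np

def load_fasttext_vectors(path, limit=None):
    """Read a fastText .vec file into a {word: vector} dict.

    The .vec format is plain text: a header line with the vocabulary size and
    dimensionality, then one line per word followed by its vector components.
    """
    vectors = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        n_words, dim = map(int, f.readline().split())
        for count, line in enumerate(f):
            if limit is not None and count >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Hypothetical file name for the 1M-word English vectors:
# embeddings = load_fasttext_vectors("wiki-news-300d-1M.vec", limit=50_000)
```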
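
The MusicNet result is reported as an average precision score, and the 'Software Dependencies' row mentions the scikit-learn library. A minimal sketch of computing a micro-averaged average precision score for multi-label note predictions with scikit-learn (array shapes and names are illustrative assumptions, not the paper's evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Illustrative shapes: 1000 evaluation frames, 128 possible note labels.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 128))   # binary note presence per frame
y_score = rng.random(size=(1000, 128))          # model scores per frame and note

# Micro-averaged average precision over all (frame, note) decisions.
aps = average_precision_score(y_true, y_score, average="micro")
print(f"average precision score: {aps:.4f}")
```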
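
The 'Experiment Setup' row describes instantiating the model for input length 256 and padding each sequence by placing it at a random position and adding zeros on both ends. A minimal NumPy sketch of that randomized padding (function and variable names are assumptions, not taken from the authors' code):

```python
import numpy as np

def randomized_pad(sequence, target_len=256, rng=None):
    """Place `sequence` at a random offset inside a zero vector of `target_len`."""
    rng = rng or np.random.default_rng()
    sequence = np.asarray(sequence)
    if len(sequence) > target_len:
        raise ValueError("sequence longer than target length")
    padded = np.zeros(target_len, dtype=sequence.dtype)
    offset = rng.integers(0, target_len - len(sequence) + 1)  # random start position
    padded[offset:offset + len(sequence)] = sequence
    return padded

# Example: a length-10 token sequence padded to the model's fixed input length.
print(randomized_pad(np.arange(1, 11), target_len=256))
```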