Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time
Authors: Karlis Freivalds, Emīls Ozoliņš, Agris Šostaks
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our architecture on the challenging LAMBADA question answering dataset and compare it with the state-of-the-art models which use attention. We empirically validate our model on algorithmic tasks and the LAMBADA question answering task (Paperno et al., 2016). Our model achieves competitive accuracy and scales to sequences with more than a hundred thousand elements. |
| Researcher Affiliation | Academia | Karlis Freivalds, Emils Ozolins, Agris Sostaks; Institute of Mathematics and Computer Science, University of Latvia; Raina bulvaris 29, Riga, LV-1459, Latvia; {Karlis.Freivalds, Emils.Ozolins, Agris.Sostaks}@lumii.lv |
| Pseudocode | No | The paper describes the mathematical operations of the Switch Unit using equations in Section 4, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The code is available at https://github.com/LUMII-Syslab/shuffle-exchange. |
| Open Datasets | Yes | We evaluate our architecture on the challenging LAMBADA question answering dataset... The sentences in the LAMBADA dataset (Paperno et al., 2016) are specially selected such that giving the right answer requires examining the whole passage. |
| Dataset Splits | No | The paper mentions training and testing on different sequence lengths ('We train them on inputs of length 64 and test on 8x longer instances.') and the use of a 'test set', but it does not describe a dedicated validation split (e.g., percentages or counts). |
| Hardware Specification | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11GB) GPU with Adam optimizer (Kingma and Ba, 2014). |
| Software Dependencies | No | The paper states: 'We have implemented the proposed architecture in TensorFlow.' While TensorFlow is mentioned, no specific version number for it or any other software dependencies is provided. |
| Experiment Setup | Yes | All models are trained on a single Nvidia RTX 2080 Ti (11GB) GPU with Adam optimizer (Kingma and Ba, 2014). For training we instantiate several models for different sequence lengths (powers of 2) sharing the same weights and train each example on the smallest instance it fits. A small model comprising one Beneš block and 192 feature maps suffices for these tasks. We use a pretrained fastText 1M English word embedding (Mikolov et al., 2018) for the input words. The embedding layer is followed by 2 Beneš blocks with 384 feature maps. |
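
The Experiment Setup row notes that several weight-sharing model instances are built for different sequence lengths (powers of 2) and that each training example is routed to the smallest instance it fits. The sketch below illustrates that bucketing step in plain Python; the function names, the fixed set of instance lengths, and the padding symbol are illustrative assumptions rather than the authors' code (their implementation is in the linked repository).

```python
def bucket_examples(examples, model_lengths=(64, 128, 256, 512), pad_symbol=0):
    """Route each example to the smallest model instance it fits into.

    All instances are assumed to share the same weights; `model_lengths`
    are the power-of-2 input sizes the instances are built for (the values
    here are illustrative, not taken from the paper).
    """
    buckets = {length: [] for length in model_lengths}
    for seq in examples:
        # smallest configured length that can hold this sequence
        target = next((length for length in model_lengths if len(seq) <= length), None)
        if target is None:
            continue  # longer than the largest instance; skipped in this sketch
        buckets[target].append(list(seq) + [pad_symbol] * (target - len(seq)))
    return buckets

# Example usage: sequences of length 5, 64 and 100 are padded to the
# 64-, 64- and 128-length instances respectively.
buckets = bucket_examples([[1] * 5, [2] * 64, [3] * 100])
print({length: len(items) for length, items in buckets.items()})
# {64: 2, 128: 1, 256: 0, 512: 0}
```

Padding to the smallest fitting power of 2 keeps the O(n log n) Shuffle-Exchange structure intact per instance while letting all instances reuse one set of weights.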