SkipW: Resource Adaptable RNN with Strict Upper Computational Limit

Authors: Tsiry Mayet, Anne Lambert, Pascal Leguyadec, Françoise Le Bolzer, François Schnitzler

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate this approach on four datasets: a human activity recognition task, sequential MNIST, IMDB and the adding task. Our results show that Skip-Window is often able to exceed the accuracy of existing approaches for a lower computational cost while strictly limiting said cost.
Researcher Affiliation | Industry | InterDigital Inc., Cesson-Sévigné, France, {firstname.lastname}@interdigital.com
Pseudocode | No | The paper contains equations and architectural descriptions but no clearly labeled pseudocode or algorithm blocks. (An illustrative sketch of a window-based skipping step is given after the table.)
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | We evaluate our approach on four data sets: Human Activity Recognition (HAR) (Ofli et al., 2013)... Sequential MNIST (LeCun et al., 1998)... the adding task (Hochreiter & Schmidhuber, 1997)... IMDB (Maas et al., 2011)... (A generator for the adding task is sketched after the table.)
Dataset Splits | Yes | HAR: the dataset is split into 2 independent partitions: 22,625 sequences for training and 5,751 for validation. Sequential MNIST: we follow the standard data split and set aside 5,000 training samples for validation purposes. IMDB: we set aside about 15% of training data for validation purposes. (A split example follows the table.)
Hardware Specification | Yes | We implement the full service, from images to activity recognition, on an Nvidia Jetson Nano platform... We evaluate the performance of Skip-W on small hardware... Results on Jetson TX2 and Raspberry Pi 4 lead to similar conclusions (Appendix G). The hardware specification of these different devices is provided in Table 3.
Software Dependencies | No | The paper mentions software like OpenPose and PoseNet (MobileNetV1 architecture with a 0.75 multiplier) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The model is trained with batches of 512 sequences using a decaying learning rate for 600 epochs. The model architecture consists of a two-layer stacked RNN of 60 GRU cells each, followed by a fully connected layer with a ReLU activation function. The following parameters were included in the search: batch size: 4096 and 512; λ ∈ {1e-4, 1e-3, 1e-2}; cell type: LSTM or GRU; number of cells ∈ {30, 40, 50, 60} per layer (identical number of cells in each layer); window size L ∈ {4, 8, 16} (Skip-W only). (A sketch of this backbone follows the table.)
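
Since the Pseudocode row notes that the paper provides no algorithm block, the following is a minimal illustrative sketch of a generic window-based skipping recurrence: within every window of L timesteps, at most k receive a full RNN update, which gives a strict per-window computational limit of the kind the title refers to. The score-based top-k selection and every function name here are assumptions made for illustration, not the authors' actual Skip-W mechanism.

# Illustrative sketch only: a generic window-based skipping recurrence with a
# strict per-window budget. The score-based top-k selection and the function
# names are assumptions, not the mechanism described in the paper.
def skip_window_forward(inputs, h0, rnn_cell, score, L=8, k=2):
    """Run rnn_cell on at most k of every L consecutive timesteps;
    on skipped timesteps the hidden state is carried over unchanged."""
    h, outputs = h0, []
    for start in range(0, len(inputs), L):
        window = inputs[start:start + L]
        # Rank the timesteps of this window by a relevance score and keep
        # only the k highest-scoring ones for a full RNN update.
        ranked = sorted(range(len(window)), key=lambda t: score(window[t]), reverse=True)
        selected = set(ranked[:k])
        for t, x_t in enumerate(window):
            if t in selected:
                h = rnn_cell(x_t, h)   # full update (counts toward the budget)
            outputs.append(h)          # skipped steps simply copy the state
    return outputs, h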
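
The adding task cited in the Open Datasets row (Hochreiter & Schmidhuber, 1997) is a synthetic benchmark that is easy to regenerate. A minimal NumPy sketch of its usual formulation follows; the sample count, sequence length, and seed are placeholders rather than the paper's settings.

import numpy as np

def make_adding_task(n_samples=1000, seq_len=100, seed=0):
    """Standard adding task: every timestep carries a random value and a
    marker bit; exactly two timesteps are marked, and the regression target
    is the sum of their values. Sizes here are placeholders."""
    rng = np.random.default_rng(seed)
    values = rng.uniform(0.0, 1.0, size=(n_samples, seq_len))
    markers = np.zeros((n_samples, seq_len))
    targets = np.empty(n_samples)
    for i in range(n_samples):
        a, b = rng.choice(seq_len, size=2, replace=False)  # the two marked positions
        markers[i, [a, b]] = 1.0
        targets[i] = values[i, a] + values[i, b]
    x = np.stack([values, markers], axis=-1)  # shape (n_samples, seq_len, 2)
    return x, targets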
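
The splits quoted in the Dataset Splits row are simple to reproduce for the public datasets. The sketch below assumes the Keras dataset loaders and scikit-learn, neither of which is confirmed by the paper; the random seed and IMDB vocabulary size are placeholders.

from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import imdb, mnist

# Sequential MNIST: keep 5,000 of the 60,000 training images for validation.
(x_tr, y_tr), _ = mnist.load_data()
x_train, x_val, y_train, y_val = train_test_split(
    x_tr.reshape(len(x_tr), -1), y_tr, test_size=5000, random_state=0)

# IMDB: set aside about 15% of the training reviews for validation.
(reviews, labels), _ = imdb.load_data(num_words=10000)  # vocabulary size is a placeholder
rev_train, rev_val, lab_train, lab_val = train_test_split(
    reviews, labels, test_size=0.15, random_state=0)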
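
Finally, the backbone quoted in the Experiment Setup row, two stacked layers of 60 GRU cells followed by a fully connected layer with a ReLU activation, can be sketched directly. The PyTorch version below is an assumption about framework and shapes: the input feature size, the number of output classes, and reading the prediction from the last timestep are placeholders, not details taken from the paper.

import torch
import torch.nn as nn

class TwoLayerGRUBaseline(nn.Module):
    """Sketch of the described backbone: two stacked GRU layers of 60 cells
    each, followed by a fully connected layer with a ReLU activation.
    Input and output sizes are placeholders, not values from the paper."""

    def __init__(self, input_size=3, hidden_size=60, num_classes=11):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_size, num_classes), nn.ReLU())

    def forward(self, x):              # x: (batch, time, features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])   # read out from the last timestep

model = TwoLayerGRUBaseline()
dummy = torch.randn(4, 50, 3)          # batch of 4 sequences, 50 timesteps, 3 features
print(model(dummy).shape)              # torch.Size([4, 11])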