SkipW: Resource Adaptable RNN with Strict Upper Computational Limit
Authors: Tsiry Mayet, Anne Lambert, Pascal Leguyadec, Françoise Le Bolzer, François Schnitzler
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this approach on four datasets: a human activity recognition task, sequential MNIST, IMDB and adding task. Our results show that Skip-Window is often able to exceed the accuracy of existing approaches for a lower computational cost while strictly limiting said cost. |
| Researcher Affiliation | Industry | InterDigital Inc., Cesson-Sévigné, France. {firstname.lastname}@interdigital.com |
| Pseudocode | No | The paper contains equations and architectural descriptions but no clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach on four data sets. Human Activity Recognition (HAR) (Ofli et al., 2013)... Sequential MNIST (LeCun et al., 1998)... Adding Task (Hochreiter & Schmidhuber, 1997)... IMDB (Maas et al., 2011)... |
| Dataset Splits | Yes | HAR: The dataset is split into two independent partitions: 22,625 sequences for training and 5,751 for validation. Sequential MNIST: We follow the standard data split and set aside 5,000 training samples for validation purposes. IMDB: We set aside about 15% of training data for validation purposes. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | We implement the full service, from images to activity recognition, on an Nvidia Jetson Nano platform... We evaluate the performance of SkipW on small hardware... Results on Jetson TX2 and Raspberry Pi 4 lead to similar conclusions (Appendix G). The hardware specification of these different devices is provided in Table 3. |
| Software Dependencies | No | The paper mentions software like OpenPose and PoseNet (MobileNetV1 architecture with a 0.75 multiplier) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The model is trained with batches of 512 sequences using a decaying learning rate for 600 epochs. The model architecture consists of a two-stacked RNN of 60 GRU cells each, followed by a fully connected layer with a ReLU activation function. The following parameters were included in the search: batch size ∈ {512, 4096}; λ ∈ {1e-4, 1e-3, 1e-2}; cell type ∈ {LSTM, GRU}; number of cells per layer ∈ {30, 40, 50, 60} (identical number of cells in each layer); window size L ∈ {4, 8, 16} (SkipW only). (A minimal model and search-grid sketch follows the table.) |
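For illustration, the splits quoted in the Dataset Splits row can be reproduced roughly as follows. This is a minimal sketch: the choice of PyTorch/torchvision, the `random_split` call, and the absence of a fixed seed are assumptions, since the paper does not state which framework or split procedure was used.

```python
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Sequential MNIST: keep the standard train/test split and set aside
# 5,000 of the 60,000 training samples for validation (as quoted above).
mnist_full = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
mnist_train, mnist_val = random_split(mnist_full, [55_000, 5_000])

# IMDB: hold out roughly 15% of the 25,000 training reviews for validation.
n_train_reviews = 25_000
n_val_reviews = int(0.15 * n_train_reviews)   # about 3,750 reviews
print(f"IMDB: {n_train_reviews - n_val_reviews} train / {n_val_reviews} val")

# HAR: 22,625 training sequences and 5,751 validation sequences
# (fixed partitions, per the paper).
```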
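The Experiment Setup row translates into roughly the following model and search grid. This is a minimal sketch assuming PyTorch; the class name `TwoStackedRNN`, the grid-enumeration loop, and the classification head built from the FC+ReLU layer are assumptions, while the layer sizes, batch sizes, λ values, cell types, cell counts, and window sizes are taken from the row above.

```python
import itertools
import torch.nn as nn

class TwoStackedRNN(nn.Module):
    """Two stacked recurrent layers followed by a fully connected layer
    with a ReLU activation, as described in the Experiment Setup row."""
    def __init__(self, input_size, num_classes, hidden_size=60, cell="GRU"):
        super().__init__()
        rnn_cls = nn.GRU if cell == "GRU" else nn.LSTM
        self.rnn = rnn_cls(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(hidden_size, num_classes), nn.ReLU())

    def forward(self, x):
        out, _ = self.rnn(x)           # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1])     # predict from the last time step

# Search space quoted in the table; the exhaustive grid loop below is an
# assumed way of enumerating it, not necessarily the authors' procedure.
search_space = {
    "batch_size": [512, 4096],
    "lam": [1e-4, 1e-3, 1e-2],
    "cell": ["LSTM", "GRU"],
    "cells_per_layer": [30, 40, 50, 60],
    "window_size_L": [4, 8, 16],       # SkipW only
}
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space, values))
    model = TwoStackedRNN(input_size=1, num_classes=10,   # placeholder sizes
                          hidden_size=config["cells_per_layer"],
                          cell=config["cell"])
    # ...train for 600 epochs with a decaying learning rate and
    #    batches of config["batch_size"] sequences...
```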