Neural Speed Reading via Skim-RNN

Authors: Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that Skim-RNN can achieve significantly reduced computational cost without losing accuracy compared to standard RNNs across five different natural language tasks.
Researcher Affiliation | Collaboration | Clova AI Research, NAVER; University of Washington; Seoul National University; Allen Institute for Artificial Intelligence; XNOR.AI
Pseudocode | No | The paper describes the Skim-RNN architecture and its inference and training processes using mathematical equations and textual explanations, but it does not include a formally labeled pseudocode or algorithm block (an illustrative sketch of the skim update appears after this table).
Open Source Code | No | The paper does not include any explicit statements about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Table 1 lists common datasets used (SST, Rotten Tomatoes, IMDb, AGNews, CBT-NE, CBT-CN, SQuAD) and specifies the 'Number of examples' for training, validation, and test sets for several of these, indicating the use of established public datasets and their splits.
Dataset Splits | Yes | Table 1 explicitly lists 'Number of examples' for training, validation, and test sets for datasets like SST, Rotten Tomatoes, IMDb, AGNews, CBT-NE, and CBT-CN. For SQuAD, it provides train and dev (validation) sizes. This clearly indicates specified dataset splits.
Hardware Specification | Yes | Comparing between NumPy with CPU and TensorFlow with GPU (Titan X), we observe that the former has 1.5 times lower latency (75 µs vs 110 µs per token) for LSTM of d = 100.
Software Dependencies | No | The paper mentions 'Python (NumPy)', 'TensorFlow', and 'PyTorch' in the context of benchmarking speed. However, it does not provide specific version numbers for these software components, which is necessary for reproducible software dependencies.
Experiment Setup | Yes | We use Adam (Kingma & Ba, 2015) for optimization, with initial learning rate of 0.0001. For Skim-LSTM, τ = max(0.5, exp(−rn)) where r = 1e-4 and n is the global training step, following Jang et al. (2017). We experiment on different sizes of big LSTM (d ∈ {100, 200}) and small LSTM (d′ ∈ {5, 10, 20}) and the ratio between the model loss and the skim loss (γ ∈ {0.01, 0.02}) for Skim-LSTM. We use batch size of 32 for SST and Rotten Tomatoes, and 128 for others. For all models, we stop early when the validation accuracy does not increase for 3000 global steps.
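
Since the paper provides no labeled pseudocode (see the Pseudocode row), the following is a minimal PyTorch-style sketch of one Skim-RNN inference step as the paper's equations describe it: a two-way decision p_t = softmax(α(x_t, h_{t−1})) selects either the big LSTM, which updates the full d-dimensional state, or the small LSTM, which updates only the first d′ dimensions and copies the rest. The module names, the use of nn.LSTMCell, and the single linear decision layer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SkimRNNCell(nn.Module):
    """Illustrative sketch of one Skim-RNN step with a hard (inference-time) decision.

    d   : hidden size of the big LSTM
    d_s : hidden size of the small ("skim") LSTM, d_s << d
    """
    def __init__(self, input_size, d, d_s):
        super().__init__()
        self.d, self.d_s = d, d_s
        self.big = nn.LSTMCell(input_size, d)        # full update of the d-dim state
        self.small = nn.LSTMCell(input_size, d_s)    # cheap update of the first d_s dims
        self.decide = nn.Linear(input_size + d, 2)   # alpha(x_t, h_{t-1}) -> read/skim logits

    def forward(self, x, state):
        h, c = state                                  # each of shape (batch, d)
        p = torch.softmax(self.decide(torch.cat([x, h], dim=-1)), dim=-1)
        read = p.argmax(dim=-1, keepdim=True).float() # channel 1 = full read, 0 = skim (convention here)
        # Full read: the big LSTM rewrites the entire state.
        h_big, c_big = self.big(x, (h, c))
        # Skim: the small LSTM rewrites only the first d_s dimensions; the rest is copied.
        h_sm, c_sm = self.small(x, (h[:, :self.d_s], c[:, :self.d_s]))
        h_skim = torch.cat([h_sm, h[:, self.d_s:]], dim=-1)
        c_skim = torch.cat([c_sm, c[:, self.d_s:]], dim=-1)
        h_new = read * h_big + (1 - read) * h_skim
        c_new = read * c_big + (1 - read) * c_skim
        return h_new, c_new
```

At training time the hard argmax is replaced by a Gumbel-softmax sample so the read/skim decision stays differentiable; the temperature schedule quoted in the Experiment Setup row is sketched further below.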
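
The Hardware Specification row quotes a per-token latency comparison between NumPy on CPU and TensorFlow on a Titan X GPU for an LSTM with d = 100. A rough way to measure the CPU side of that comparison is sketched below; the fused-gate weight layout, warm-up count, and iteration count are arbitrary illustration choices, not the authors' benchmark code.

```python
import numpy as np, time

d, input_size = 100, 100                      # sizes matching the quoted benchmark
W = np.random.randn(4 * d, input_size + d).astype(np.float32)  # fused LSTM gate weights
b = np.zeros(4 * d, dtype=np.float32)
x = np.random.randn(input_size).astype(np.float32)
h = np.zeros(d, dtype=np.float32)
c = np.zeros(d, dtype=np.float32)

def lstm_step(x, h, c):
    # One LSTM step: gate pre-activations, sigmoid/tanh nonlinearities, state update.
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

for _ in range(100):                          # warm up
    h, c = lstm_step(x, h, c)
n = 10000
t0 = time.perf_counter()
for _ in range(n):                            # time per-token latency on CPU
    h, c = lstm_step(x, h, c)
print(f"{(time.perf_counter() - t0) / n * 1e6:.1f} us per token")
```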
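
The Experiment Setup row quotes the Gumbel-softmax temperature annealing τ = max(0.5, exp(−rn)) with r = 1e-4 and n the global training step. A minimal sketch of that schedule, and of how it could plug into a differentiable read/skim decision, follows; using torch.nn.functional.gumbel_softmax in place of the paper's own reparameterization code is an assumption of this sketch.

```python
import math
import torch
import torch.nn.functional as F

def gumbel_temperature(step, r=1e-4, floor=0.5):
    """tau = max(0.5, exp(-r * n)), following the quoted setup (Jang et al., 2017)."""
    return max(floor, math.exp(-r * step))

# Illustrative schedule values: step 0 -> 1.0, step 5000 -> ~0.61, step ~6931 and beyond -> clamped at 0.5.

# During training, the hard argmax in the SkimRNNCell sketch would be replaced by a
# straight-through Gumbel-softmax sample so gradients flow through the decision.
decision_logits = torch.randn(32, 2)                             # dummy logits for a batch of 32
tau = gumbel_temperature(step=5000)
sample = F.gumbel_softmax(decision_logits, tau=tau, hard=True)   # one-hot forward, soft backward
```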