Neural Speed Reading with Structural-Jump-LSTM

Authors: Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A comprehensive experimental evaluation of our model against all five state-of-the-art neural reading models shows that Structural-Jump-LSTM achieves the best overall floating point operations (FLOP) reduction (hence is faster), while keeping the same accuracy or even improving it compared to a vanilla LSTM that reads the whole text.
Researcher Affiliation | Academia | Department of Computer Science, University of Copenhagen, Copenhagen 2100, Denmark {chrh,c.hansen,s.alstrup,simonsen,c.lioma}@di.ku.dk
Pseudocode | No | The paper contains a diagram (Figure 1) that provides an overview of the model, but it does not include structured pseudocode or algorithm blocks. (A hedged sketch of the reading loop is given after this table.)
Open Source Code | Yes | https://github.com/Varyn/Neural-Speed-Reading-with-Structural-Jump-LSTM
Open Datasets | Yes | We use the same tasks and datasets used by the state-of-the-art in speed reading (displayed in Table 1), and evaluate against all 5 state-of-the-art models (Seo et al., 2018; Yu et al., 2017; 2018; Fu & Ma, 2018; Huang et al., 2017) in addition to a vanilla LSTM full-reading baseline.
Dataset Splits | Yes | We use the predefined train, validation, and testing splits for IMDB, SST, CBT-CN, and CBT-NE, and use 15% of the training data in the rest as validation. For Rotten Tomatoes there is no predefined split, so we set aside 10% for testing, as done by Yu et al. (2017). (See the split sketch after this table.)
Hardware Specification | No | We calculate the total FLOPs used by the models as done by Seo et al. (2018) and Yu et al. (2018), reported as a FLOP reduction (FLOP-r) between the full-read and speed-read models. This is done to avoid runtime dependencies on optimized implementations, hardware setups, and whether the model is evaluated on CPU or GPU. (A FLOP-reduction sketch follows this table.)
Software Dependencies | No | The paper mentions software components like "RMSprop" and "LSTM cell" but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | For training the model we use RMSprop with a learning rate chosen from the set {0.001, 0.0005}, with the optimum being 0.001 on the question answering datasets (CBT-CN and CBT-NE) and 0.0005 on the topic and sentiment datasets. We use a batch size of 32 on AG News, Rotten Tomatoes, and SST, and a batch size of 100 for the remaining datasets. Similarly to Yu et al. (2017), we employ dropout to reduce overfitting, with 0.1 on the embedding and 0.1 on the output of the LSTM. For the RNN we use an LSTM cell with a size of 128, and apply gradient clipping with a threshold value of 0.1. For both agents, the small fully connected layer is fixed to 25 neurons. (A training-configuration sketch follows this table.)
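
Since the paper provides no pseudocode, the following is a minimal, hypothetical sketch of the two-agent reading loop it describes: a skip agent that can skip the next word without updating the state, and a jump agent that, after a word is read, can jump past structural punctuation or to the end of the text. All class, method, and argument names, the greedy argmax decisions, and the `next_punct` lookup table are assumptions of this sketch, not the authors' implementation (which is linked in the Open Source Code row).

```python
import torch
import torch.nn as nn

# Jump targets described in the paper: no jump, jump past the next
# sub-sentence separator (, :), past the next sentence end (. ! ?),
# or straight to the end of the text.
JUMP_TARGETS = ["none", "sub_sentence_end", "sentence_end", "text_end"]

class StructuralJumpLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, agent_dim=25):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)  # LSTM cell of size 128
        # Both agents are small fully connected layers (25 neurons).
        self.skip_agent = nn.Sequential(
            nn.Linear(hidden_dim + embed_dim, agent_dim), nn.ReLU(),
            nn.Linear(agent_dim, 2))
        self.jump_agent = nn.Sequential(
            nn.Linear(hidden_dim, agent_dim), nn.ReLU(),
            nn.Linear(agent_dim, len(JUMP_TARGETS)))

    def read(self, tokens, next_punct):
        # tokens: 1-D LongTensor; next_punct[kind][t]: index just after the
        # next punctuation of that kind (a hypothetical precomputed lookup).
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        t = 0
        while t < len(tokens):
            x = self.embed(tokens[t]).unsqueeze(0)
            # Skip agent: skip this word without updating the state (greedy
            # argmax here; the paper trains the agents with reinforcement
            # learning, where actions are sampled).
            if self.skip_agent(torch.cat([h, x], dim=1)).argmax(dim=1).item():
                t += 1
                continue
            h, c = self.cell(x, (h, c))
            # Jump agent: after reading a word, optionally jump ahead.
            jump = JUMP_TARGETS[self.jump_agent(h).argmax(dim=1).item()]
            if jump == "text_end":
                break
            t = t + 1 if jump == "none" else next_punct[jump][t]
        return h  # final state, fed to the task-specific output layer
```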
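The split protocol in the Dataset Splits row can be made concrete with a short sketch. The function name, the list-of-examples format, and the fixed seed are illustrative assumptions; only the 15% validation and 10% test fractions come from the paper.

```python
import random

def make_splits(train_examples, test_examples=None, seed=42):
    """Illustrative sketch of the split protocol quoted above.

    - Where no predefined validation split exists, 15% of the training
      data is set aside as validation.
    - Rotten Tomatoes has no predefined split at all, so 10% is first
      held out for testing (pass test_examples=None).
    """
    rng = random.Random(seed)
    examples = list(train_examples)
    rng.shuffle(examples)
    if test_examples is None:
        n_test = int(0.10 * len(examples))
        test_examples, examples = examples[:n_test], examples[n_test:]
    n_val = int(0.15 * len(examples))
    val_examples, train_examples = examples[:n_val], examples[n_val:]
    return train_examples, val_examples, test_examples
```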
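The FLOP-based comparison in the Hardware Specification row can be illustrated as follows. The per-step LSTM FLOP count is a standard back-of-the-envelope approximation (one multiply and one add per weight in the four gate projections), and treating FLOP-r as the ratio of full-read to speed-read FLOPs is an assumption of this sketch, not the paper's exact accounting.

```python
def lstm_step_flops(input_dim, hidden_dim):
    # The four gate projections each multiply an (input_dim + hidden_dim)
    # vector by a hidden_dim-column matrix; count one multiply and one add
    # per weight and ignore the cheap elementwise operations.
    return 2 * 4 * (input_dim + hidden_dim) * hidden_dim

def flop_reduction(total_tokens, tokens_read, input_dim=128, hidden_dim=128,
                   agent_flops_per_step=0):
    # FLOP-r as the ratio of full-read FLOPs to speed-read FLOPs; the speed
    # reader also pays the (small) agent cost on every step it takes.
    full = total_tokens * lstm_step_flops(input_dim, hidden_dim)
    speed = tokens_read * (lstm_step_flops(input_dim, hidden_dim)
                           + agent_flops_per_step)
    return full / speed

# Reading 40% of a 1,000-token document with negligible agent cost
# yields a FLOP reduction of 2.5x.
print(flop_reduction(1000, 400))  # 2.5
```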
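Finally, the hyperparameters in the Experiment Setup row translate into a training configuration along these lines. The paper does not state its framework, so PyTorch here is an assumption, as are the placeholder model and loss function; the policy-gradient training of the two agents is omitted.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row. Learning rate: 0.001 on
# the question answering datasets (CBT-CN, CBT-NE), 0.0005 on the topic and
# sentiment datasets. Batch size: 32 on AG News, Rotten Tomatoes, and SST;
# 100 on the remaining datasets.
LEARNING_RATE = 0.0005
BATCH_SIZE = 32
EMBED_DROPOUT = 0.1   # dropout on the embedding
OUTPUT_DROPOUT = 0.1  # dropout on the output of the LSTM
HIDDEN_DIM = 128      # LSTM cell size
AGENT_DIM = 25        # small fully connected layer of each agent
GRAD_CLIP = 0.1       # gradient clipping threshold

# `model` and `loss_fn` are placeholders: the real model is the
# Structural-Jump-LSTM sketched above, with reinforcement-learning
# terms for the two agents added to the loss.
model = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM)
optimizer = torch.optim.RMSprop(model.parameters(), lr=LEARNING_RATE)

def train_step(batch, loss_fn):
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    # The paper does not say whether clipping is by value or by norm;
    # norm clipping is assumed here.
    nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()
```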