Selfish Sparse RNN Training

Authors: Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using these strategies, we achieve state-of-the-art sparse training results, better than the dense-to-sparse methods, with various types of RNNs on Penn Tree Bank and Wikitext-2 datasets.
Researcher Affiliation | Academia | 1 Department of Computer Science, Eindhoven University of Technology, the Netherlands; 2 Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Twente, the Netherlands
Pseudocode | Yes | The pseudocode of the full training procedure of our algorithm is shown in Algorithm 1.
Open Source Code | Yes | Our codes are available at https://github.com/Shiweiliuiiiiiii/Selfish-RNN.
Open Datasets | Yes | Penn Tree Bank dataset (Marcus et al., 1993) and AWD-LSTM-MoS on WikiText-2 dataset (Melis et al., 2018).
Dataset Splits | Yes | Single model perplexity on validation and test sets for the Penn Treebank language modeling task with stacked LSTMs and RHNs.
Hardware Specification | No | The paper does not specify the hardware (GPU/CPU models, memory, or other machine details) used to run its experiments.
Software Dependencies | No | The paper mentions optimizers such as Adam (Kingma & Ba, 2014) but does not list the libraries, frameworks, or environments it depends on, nor their version numbers.
Experiment Setup | Yes | For fair comparison, we use the exact same hyperparameters and regularization introduced in ON-LSTM (Shen et al., 2019) and AWD-LSTM-MoS (Yang et al., 2018). See Appendix A for hyperparameters.
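
The Pseudocode row above refers to Algorithm 1 of the paper, which describes a dynamic sparse training loop. The sketch below is only a generic illustration of the prune-and-regrow idea such loops build on: the function name `prune_and_regrow`, the `prune_fraction` value, and the random regrowth criterion are assumptions made for illustration and do not reproduce the authors' Algorithm 1; see the released code at https://github.com/Shiweiliuiiiiiii/Selfish-RNN for the actual procedure.

```python
# Generic magnitude-prune / random-regrow step for a sparsity mask.
# Illustrative sketch only -- NOT the authors' Selfish-RNN algorithm.
import torch

def prune_and_regrow(weight: torch.Tensor, mask: torch.Tensor,
                     prune_fraction: float = 0.3) -> torch.Tensor:
    """Drop the smallest-magnitude active weights, then regrow the same number
    of connections at random inactive positions, keeping the density fixed.
    `mask` is a 0/1 float tensor with the same 2D shape as `weight`."""
    n_active = int(mask.sum().item())
    n_prune = int(prune_fraction * n_active)
    if n_prune == 0:
        return mask
    # Prune: deactivate the n_prune active weights with the smallest magnitude.
    active_magnitudes = weight[mask.bool()].abs()
    threshold = torch.kthvalue(active_magnitudes, n_prune).values
    new_mask = mask.clone()
    new_mask[mask.bool() & (weight.abs() <= threshold)] = 0
    # Regrow: reactivate the same number of currently inactive positions at random.
    inactive = (new_mask == 0).nonzero(as_tuple=False)
    n_grow = n_active - int(new_mask.sum().item())
    chosen = inactive[torch.randperm(inactive.size(0))[:n_grow]]
    new_mask[chosen[:, 0], chosen[:, 1]] = 1
    # Pruned weights are zeroed; newly grown connections start from zero.
    weight.data.mul_(new_mask)
    return new_mask

# Example: one 30% prune-and-regrow step on a sparse 64x64 weight matrix.
w = torch.randn(64, 64)
m = (torch.rand_like(w) < 0.2).float()  # ~20% density
w.data.mul_(m)                          # enforce the initial sparsity pattern
m = prune_and_regrow(w, m, prune_fraction=0.3)
```

In a full dynamic sparse training run, a step like this would be applied periodically during training while the mask is kept multiplied into the weights between updates; the paper's own procedure and schedule are given in its Algorithm 1 and released code.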