Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Intrinsic Sparse Structures within Long Short-Term Memory
Authors: Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves 10.59× speedup without losing any perplexity of a language modeling of Penn Tree Bank dataset. It is also successfully evaluated through a compact model with only 2.69M weights for machine Question Answering of SQuAD dataset. |
| Researcher Affiliation | Collaboration | Wei Wen, Yiran Chen & Hai Li: Electrical and Computer Engineering, Duke University. Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu & Bin Hu: Business AI and Bing, Microsoft. |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our source code is available at https://github.com/wenwei202/iss-rnns. |
| Open Datasets | Yes | We evaluated our method by LSTMs and RHNs in language modeling of Penn Treebank dataset (Marcus et al., 1993) and machine Question Answering of SQuAD dataset (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | Table 1: Learning ISS sparsity from scratch in stacked LSTMs (columns: Method; Dropout keep ratio; Perplexity (validate, test)). |
| Hardware Specification | Yes | To measure the inference speed, the experiments were run on a dual socket Intel Xeon CPU E5-2673 v3 @ 2.40GHz processor with a total of 24 cores (12 per socket) and 128GB of memory. |
| Software Dependencies | Yes | Intel MKL library 2017 update 2 was used for matrix-multiplication operations. OpenMP runtime was utilized for parallelism. We used Intel C++ Compiler 17.0 to generate executables that were run on Windows Server 2016. |
| Experiment Setup | Yes | The same training scheme as the baseline is adopted to learn ISS sparsity, except a larger dropout keep ratio of 0.6 versus 0.35 of the baseline because group Lasso regularization can also avoid over-fitting. All models are trained from scratch for 55 epochs. [...] For a specific application, we preset τ by cross validation. The maximum τ which sparsifies the dense model (baseline) without deteriorating its performance is selected. The validation of τ is performed only once and no training effort is needed. τ is 1.0e-4 for the stacked LSTMs in Penn Tree Bank, and it is 4.0e-4 for the RHN and the BiDAF model. |
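The τ selection quoted in the Experiment Setup row can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' code (their implementation is in the linked repository). It assumes τ is a magnitude threshold below which weights are set to zero, and that the evaluation metric is "lower is better" (e.g. perplexity). The names `threshold`, `select_tau`, `group_lasso_penalty`, the candidate grid, and the `tolerance` parameter are hypothetical. The group Lasso term is included only to show the regularizer the row mentions: a sum of L2 norms over weight groups, which pushes entire groups toward zero.

```python
import numpy as np


def group_lasso_penalty(groups):
    """Group Lasso term: sum of the L2 norms of each weight group.

    In training this would be scaled by a coefficient and added to the loss;
    how groups are formed (e.g. per ISS component) is an assumption here.
    """
    return float(sum(np.linalg.norm(g) for g in groups))


def threshold(weights, tau):
    """Hypothetical helper: zero every weight whose magnitude is below tau."""
    return np.where(np.abs(weights) < tau, 0.0, weights)


def select_tau(weights, evaluate, candidates, tolerance=0.0):
    """Return the largest candidate tau whose thresholded model is no worse
    than the dense baseline (within `tolerance`), or None if none qualifies.

    `evaluate` maps a weight array to a metric where lower is better.
    """
    baseline = evaluate(weights)
    best = None
    for tau in sorted(candidates):
        if evaluate(threshold(weights, tau)) <= baseline + tolerance:
            best = tau  # keep the largest tau that passes
    return best


if __name__ == "__main__":
    w = np.array([0.5, -0.2, 1e-5, -3e-5, 2e-4])
    # Toy proxy metric (assumption): total magnitude of the weights removed.
    proxy = lambda x: float(np.sum(np.abs(x - w)))
    print(select_tau(w, proxy, [1e-5, 4e-5, 1e-3], tolerance=1e-4))
```

With the toy proxy, τ = 1e-3 would also zero the 2e-4 weight and exceed the tolerance, so the largest passing candidate is 4e-5. Because the validation only compares a thresholded copy against the baseline, no retraining is needed, which matches the quoted description.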