Structured Sparsification of Gated Recurrent Neural Networks

Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Alexander Markovich, Dmitry Vetrov

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our approach on the text classification and language modeling tasks. Our method improves the neuron-wise compression of the model in most of the tasks. We perform experiments with LSTM architecture in both sparsification frameworks.
Researcher Affiliation | Collaboration | Ekaterina Lobacheva (1), Nadezhda Chirkova (1), Alexander Markovich (2), Dmitry Vetrov (1,3). 1: Samsung-HSE Laboratory, National Research University Higher School of Economics; 2: National Research University Higher School of Economics; 3: Samsung AI Center Moscow. Moscow, Russia. {elobacheva, nchirkova, dvetrov}@hse.ru, amarkovich@edu.hse.ru
Pseudocode | Yes | Algorithm 1: Forward pass through Bayesian LSTM for one sequence (a Python sketch of this forward pass follows the table). Require: [x_1, ..., x_T], c_0, h_0 and parameters μ, σ, b. 1: Sample ε^x_i, ε^h_i, ..., ε_i, ε_f, ... ~ N(0, I). 2: W^x_i = μ^x_i + ε^x_i ⊙ σ^x_i, ...; z_i = μ_i + ε_i ⊙ σ_i, ... 3: // sampling reparametrized weights. 4: for t = 1, ..., T do: 5: f_t = sigm((W^x_f x_t + W^h_f h_{t-1}) ⊙ z_f + b_f), similarly for i_t, o_t, g_t; 6: c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t, h_t = (o_t ⊙ tanh(c_t)) ⊙ z_h. Return [c_1, ..., c_T], [h_1, ..., h_T].
Open Source Code | No | No explicit statement providing concrete access (e.g., a link or explicit release statement) to the source code for the methodology described in this paper was found.
Open Datasets | Yes | Penn Treebank (PTB) dataset (Marcus, Marcinkiewicz, and Santorini 1993); Internet Movie Database (IMDb) dataset (Maas et al. 2011) for binary classification; and AG's Corpus of News Articles (AGNews) dataset (Zhang, Zhao, and LeCun 2015) for four-class classification.
Dataset Splits | Yes | For language modeling, we evaluate quality on the validation and test sets. The strengths of the individual and group Lasso regularizations are selected using grid search so that the validation perplexities of ISS and our model are approximately equal. We compute the gradients of each hidden neuron of the second LSTM layer w.r.t. the input of this layer at different lags t and average the norm of this gradient over the validation set (a sketch of this gradient measurement follows the table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions a 'standard TensorFlow implementation' and uses optimization algorithms such as Adam, but it does not provide specific version numbers for TensorFlow or any other software libraries or dependencies used in the experiments.
Experiment Setup | Yes | All the small models, including the baseline model, are trained without dropout... for 20 epochs with Stochastic Gradient Descent (SGD) and a decaying learning rate schedule: the initial learning rate is equal to 1, the learning rate starts to decay after the 4th epoch, and the learning rate decay is equal to 0.6. For the two-level sparsification (W+N), we use group Lasso regularization with λ1 = 0.002 and Lasso regularization with λ2 = 1e-5. For the three-level sparsification (W+G+N), we use group Lasso regularization with λ1 = 0.0017 and Lasso regularization with λ2 = 1e-5. We use the threshold 1e-4 to prune the weights in both models during training. We train our networks using Adam (Kingma and Ba 2015). For the text classification tasks, we use a learning rate of 0.0005 and train the Bayesian models for 800 / 150 epochs on IMDb / AGNews. For the language modeling tasks, we train the Bayesian models for 250 / 50 epochs on character-level / word-level tasks using a learning rate of 0.002. For all the weights that we sparsify, we initialize log σ with -3. We eliminate the weights with a signal-to-noise ratio less than τ = 0.05 (a sketch of the schedule and pruning rule follows the table).
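
The following is a minimal NumPy sketch of the forward pass in Algorithm 1 (Pseudocode row above). It is not the authors' code: the parameter names (mu, sigma, b), dictionary keys, and toy dimensions are assumptions chosen for illustration; only the reparametrization W = μ + ε ⊙ σ, the gate masks z, and the hidden mask z_h follow the algorithm.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def bayesian_lstm_forward(xs, c0, h0, mu, sigma, b, rng=None):
    """One stochastic forward pass of Algorithm 1: weights W and group variables z
    are sampled via the reparametrization W = mu + eps * sigma, eps ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    sample = lambda k: mu[k] + rng.standard_normal(mu[k].shape) * sigma[k]
    W = {k: sample(k) for k in mu if k.startswith("W")}   # Wx_i, Wh_i, Wx_f, ...
    z = {k: sample(k) for k in mu if k.startswith("z")}   # z_i, z_f, z_o, z_g, z_h
    c, h, cs, hs = c0, h0, [], []
    for x in xs:
        gate = {}
        for g in "ifog":
            pre = (W[f"Wx_{g}"] @ x + W[f"Wh_{g}"] @ h) * z[f"z_{g}"] + b[g]
            gate[g] = np.tanh(pre) if g == "g" else sigm(pre)   # candidate gate uses tanh
        c = gate["f"] * c + gate["i"] * gate["g"]
        h = gate["o"] * np.tanh(c) * z["z_h"]
        cs.append(c); hs.append(h)
    return cs, hs

# Toy usage with arbitrary parameters (4 inputs, 3 hidden units).
n_in, n_h = 4, 3
mu = {"z_h": np.ones(n_h)}
for g in "ifog":
    mu[f"Wx_{g}"], mu[f"Wh_{g}"], mu[f"z_{g}"] = np.zeros((n_h, n_in)), np.zeros((n_h, n_h)), np.ones(n_h)
sigma = {k: 0.05 * np.ones_like(v) for k, v in mu.items()}
b = {g: np.zeros(n_h) for g in "ifog"}
cs, hs = bayesian_lstm_forward([np.random.randn(n_in) for _ in range(5)],
                               np.zeros(n_h), np.zeros(n_h), mu, sigma, b)
print(hs[-1])
```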
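
The Dataset Splits row mentions averaging, over the validation set, the norm of the gradient of each hidden neuron of the second LSTM layer w.r.t. the input of that layer at different lags. Below is a minimal PyTorch sketch of that measurement; the two-layer split, tensor sizes, and function name are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

emb_dim, hid_dim = 64, 64                       # assumed sizes, for illustration only
layer1 = nn.LSTM(emb_dim, hid_dim, batch_first=True)
layer2 = nn.LSTM(hid_dim, hid_dim, batch_first=True)

def mean_grad_norm(x_batch, t, lag, neuron):
    """Average over a (validation) batch of || d h2_t[neuron] / d x2_{t-lag} ||_2,
    where x2 (the output of layer 1) is the input of the second LSTM layer."""
    h1, _ = layer1(x_batch)                     # (batch, T, hid_dim)
    x2 = h1.detach().requires_grad_(True)       # differentiate w.r.t. layer-2 input
    h2, _ = layer2(x2)
    # Examples in the batch are independent, so summing over the batch keeps
    # per-example gradients separated in x2.grad.
    h2[:, t, neuron].sum().backward()
    return x2.grad[:, t - lag, :].norm(dim=1).mean().item()

# Toy batch of 8 sequences of length 35 standing in for validation data.
x = torch.randn(8, 35, emb_dim)
print(mean_grad_norm(x, t=30, lag=5, neuron=0))
```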
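
The Experiment Setup row fixes two simple rules that are easy to state in code: the decaying SGD learning-rate schedule (initial rate 1, decay 0.6 starting after the 4th epoch) and signal-to-noise-ratio pruning with threshold τ = 0.05. The sketch below assumes a per-epoch multiplicative decay and 1-indexed epochs, which is one plausible reading of the description rather than the authors' exact code.

```python
import numpy as np

def sgd_learning_rate(epoch, lr0=1.0, decay=0.6, decay_start=4):
    """Learning rate lr0 up to and including epoch `decay_start` (1-indexed),
    then multiplied by `decay` once per subsequent epoch."""
    return lr0 * decay ** max(0, epoch - decay_start)

def prune_by_snr(mu, log_sigma, tau=0.05):
    """Zero out weights whose signal-to-noise ratio |mu| / sigma is below tau."""
    snr = np.abs(mu) / np.exp(log_sigma)
    return np.where(snr < tau, 0.0, mu)

print([round(sgd_learning_rate(e), 3) for e in range(1, 9)])   # 1, 1, 1, 1, 0.6, 0.36, ...
weights = np.random.randn(5) * 0.01
print(prune_by_snr(weights, log_sigma=np.full(5, -3.0)))       # log sigma initialized at -3
```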