SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

Authors: Kyuhong Shim, Minjae Lee, Iksoo Choi, Yoonho Boo, Wonyong Sung

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed algorithm provides both fast and accurate evaluation of the most probable top-K word probabilities.
Researcher Affiliation | Academia | Kyuhong Shim, Minjae Lee, Iksoo Choi, Yoonho Boo, Wonyong Sung; Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea; skhu20@snu.ac.kr, {mjlee, ischoi, yhboo}@dsp.snu.ac.kr, wysung@snu.ac.kr
Pseudocode | Yes | Algorithm 1: the proposed SVD-softmax (a sketch of the procedure appears after this table).
Open Source Code | No | The paper neither states that source code for the described methodology is released nor links to a code repository.
Open Datasets | Yes | The WikiText-2 [20] and One Billion Word benchmark (OBW) [21] datasets were used for language modeling.
Dataset Splits | No | The paper mentions training and evaluation data (e.g., 'approximately 2M training tokens', 'One thousand sequential frames were used for the evaluation', 'evaluated with newstest 2013'), but it gives no percentages, sample counts, or explicit citations that would make the training/validation/test splits reproducible across all datasets.
Hardware Specification | Yes | The experiments were conducted on an NVIDIA GTX Titan-X (Pascal) GPU and an Intel i7-6850 CPU.
Software Dependencies | No | The paper mentions tools such as the OpenNMT toolkit and the Moses toolkit but specifies no version numbers for these or for any other software dependencies, such as deep learning frameworks or libraries.
Experiment Setup | Yes | The models were trained with stochastic gradient descent (SGD), an initial learning rate of 1.0, and momentum of 0.95. The batch size was set to 20, and the network was unrolled for 35 timesteps. Dropout [23] was applied to the LSTM output with a drop ratio of 0.5. Gradient clipping [24] with a maximum norm of 5 was applied. (A configuration sketch of these settings follows the table.)
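
For reference, Algorithm 1 (SVD-softmax) factorizes the output projection once with SVD, ranks the whole vocabulary with a cheap low-rank "preview" of the logits, and recomputes exact logits only for the most promising candidates. The NumPy sketch below is a minimal illustration under that reading of the paper; the function names (`build_svd_factors`, `svd_softmax_topk`) and the default values for the preview width, candidate count, and `k` are illustrative choices, not values taken from the paper.

```python
import numpy as np

def build_svd_factors(A):
    """One-time offline factorization of the output projection A (V x D).
    Returns B = U @ diag(S) of shape (V, D) and V_t (D, D), so A = B @ V_t."""
    U, S, V_t = np.linalg.svd(A, full_matrices=False)
    return U * S, V_t  # scales each column of U by its singular value

def svd_softmax_topk(B, V_t, b, h, width=16, num_full=256, k=10):
    """Approximate the top-k softmax probabilities for hidden state h (D,).

    width    -- number of leading columns used for the cheap preview
    num_full -- number of candidate words whose logits are computed exactly
    """
    h_tilde = V_t @ h                                  # rotate h into the SVD basis
    z = B[:, :width] @ h_tilde[:width] + b             # preview logits for all V words
    cand = np.argpartition(z, -num_full)[-num_full:]   # top-N candidates by preview
    z[cand] = B[cand] @ h_tilde + b[cand]              # exact logits for candidates only
    p = np.exp(z - z.max())                            # softmax over the mixed logits;
    p /= p.sum()                                       # the normalizer is estimated
    top = np.argsort(p)[-k:][::-1]                     # from preview + exact values
    return top, p[top]
```

The preview width and candidate count trade speed against accuracy: at their maximum values the exact softmax is recovered, while small values bring the per-token cost close to O(VW + ND) rather than O(VD).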
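The reported experiment setup corresponds to a standard truncated-BPTT language-model training loop. The PyTorch sketch below only shows how the stated hyperparameters (SGD with learning rate 1.0 and momentum 0.95, batch size 20, 35-step unrolling, dropout 0.5 on the LSTM output, gradient clipping at norm 5) would be wired together; the `LSTMLanguageModel` class, its layer sizes, and the `batches` iterator are assumed placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the experiment setup.
LEARNING_RATE = 1.0
MOMENTUM = 0.95
BATCH_SIZE = 20     # sequences per batch
BPTT_STEPS = 35     # network unrolled for 35 timesteps
DROPOUT = 0.5       # applied to the LSTM output
MAX_GRAD_NORM = 5.0

class LSTMLanguageModel(nn.Module):
    """Placeholder LSTM language model; layer sizes are illustrative."""
    def __init__(self, vocab_size, emb_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(DROPOUT)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.proj(self.drop(out)), state

def train_epoch(model, batches, vocab_size):
    """One epoch of truncated BPTT with the reported optimizer settings.
    `batches` yields (inputs, targets) of shape (BATCH_SIZE, BPTT_STEPS)."""
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=LEARNING_RATE, momentum=MOMENTUM)
    loss_fn = nn.CrossEntropyLoss()
    state = None
    for inputs, targets in batches:
        if state is not None:  # detach so gradients stop at the 35-step window
            state = tuple(s.detach() for s in state)
        logits, state = model(inputs, state)
        loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
```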