Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models

Authors: Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017] with other conventional optimisation methods to assess its efficiency and accuracy on GPUs and CPUs."
Researcher Affiliation | Academia | Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong. State Key Laboratory of Software Development Environment, Beihang University, China; School of Computer Science and Engineering, Beihang University, China; School of Software Engineering, Chongqing University, China; Montréal Institute for Learning Algorithms, Université de Montréal, Canada.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All our codes and models are publicly available at https://github.com/jiangnanhugo/lmkit"
Open Datasets | Yes | "Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017]"
Dataset Splits | Yes | Table 1 ("Statistics of the PTB, WikiText-2 and WikiText-103 Dataset") reports #train, #valid and #test statistics for each of the three datasets.
Hardware Specification | Yes | "all experiments implemented with Theano framework [Theano Development Team, 2016] were run on one standalone GPU device with 12 GB of graphical memory (i.e., Nvidia K40m)"
Software Dependencies | No | The paper states that "all experiments implemented with Theano framework [Theano Development Team, 2016]", which names the framework but does not specify a version number for Theano.
Experiment Setup | Yes | "The input sentence's max length, hidden layer, output vocabulary and batch size were set as {50, 256, 267735, 20}, respectively. Furthermore, for the NCE and Blackout approximations, the hyper-parameter k was set to |V|/20 for the smaller PTB and WikiText-2 datasets and k = |V|/200 for the larger WikiText-103 dataset."
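
To make the quoted setup concrete, below is a minimal Python sketch of those hyper-parameters and of how the k = |V|/20 and k = |V|/200 sampling sizes for NCE/Blackout work out. The variable names, the helper function, and the conventional 10,000-word PTB vocabulary are illustrative assumptions; only the numbers quoted above come from the paper, and none of this is taken from the authors' released lmkit code.

```python
# Illustrative sketch of the experiment setup quoted above (assumptions noted inline).

def sampling_k(vocab_size, divisor):
    """Number of noise samples k for NCE/Blackout, as a fraction of |V| (hypothetical helper)."""
    return max(1, vocab_size // divisor)

config = {
    "max_sentence_length": 50,          # input sentence max length (from the paper)
    "hidden_size": 256,                 # hidden layer width (from the paper)
    "batch_size": 20,                   # from the paper
    "vocab_size_wikitext103": 267735,   # output vocabulary for WikiText-103 (from the paper)
}

# k = |V|/20 for the smaller datasets, k = |V|/200 for WikiText-103.
k_ptb = sampling_k(10_000, 20)          # assuming the conventional 10k PTB vocabulary -> k = 500
k_wt103 = sampling_k(config["vocab_size_wikitext103"], 200)  # 267735 // 200 = 1338 noise samples

print(k_ptb, k_wt103)
```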