Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models

Authors: Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017] with other conventional optimisation methods to assess its efficiency and accuracy on GPUs and CPUs."
Researcher Affiliation | Academia | Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong. State Key Laboratory of Software Development Environment, Beihang University, China; School of Computer Science and Engineering, Beihang University, China; School of Software Engineering, Chongqing University, China; Montréal Institute for Learning Algorithms, Université de Montréal, Canada.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All our codes and models are publicly available at https://github.com/jiangnanhugo/lmkit"
Open Datasets | Yes | "Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017]"
Dataset Splits | Yes | Table 1 ("Statistics of the PTB, WikiText-2 and WikiText-103 Dataset") reports #train, #valid and #test statistics for each of the three datasets.
Hardware Specification | Yes | "all experiments implemented with Theano framework [Theano Development Team, 2016] were run on one standalone GPU device with 12 GB of graphical memory (i.e., Nvidia K40m)"
Software Dependencies | No | The paper states that "all experiments implemented with Theano framework [Theano Development Team, 2016]", which names the framework but does not specify a version number for Theano.
Experiment Setup | Yes | "The input sentence's max length, hidden layer, output vocabulary and batch size were set as {50, 256, 267735, 20}, respectively. Furthermore, for the NCE and Blackout approximations, the hyper-parameter k was set to |V|/20 for the smaller PTB and WikiText-2 datasets and k = |V|/200 for the larger WikiText-103 dataset."
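
To make the quoted setup concrete, below is a minimal Python sketch of those hyper-parameters and of how the k = |V|/20 and k = |V|/200 sampling sizes for NCE/Blackout work out. The variable names, the helper function, and the conventional 10,000-word PTB vocabulary are illustrative assumptions; only the numbers quoted above come from the paper, and none of this is taken from the authors' released lmkit code.

```python
# Illustrative sketch of the experiment setup quoted above (assumptions noted inline).

def sampling_k(vocab_size, divisor):
    """Number of noise samples k for NCE/Blackout, as a fraction of |V| (hypothetical helper)."""
    return max(1, vocab_size // divisor)

config = {
    "max_sentence_length": 50,          # input sentence max length (from the paper)
    "hidden_size": 256,                 # hidden layer width (from the paper)
    "batch_size": 20,                   # from the paper
    "vocab_size_wikitext103": 267735,   # output vocabulary for WikiText-103 (from the paper)
}

# k = |V|/20 for the smaller datasets, k = |V|/200 for WikiText-103.
k_ptb = sampling_k(10_000, 20)          # assuming the conventional 10k PTB vocabulary -> k = 500
k_wt103 = sampling_k(config["vocab_size_wikitext103"], 200)  # 267735 // 200 = 1338 noise samples

print(k_ptb, k_wt103)
```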