Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models
Authors: Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017] with other conventional optimisation methods to assess its efficiency and accuracy on GPUs and CPUs. |
| Researcher Affiliation | Academia | Nan Jiang, Wenge Rong, Min Gao, Yikang Shen, Zhang Xiong. State Key Laboratory of Software Development Environment, Beihang University, China; School of Computer Science and Engineering, Beihang University, China; School of Software Engineering, Chongqing University, China; Montréal Institute for Learning Algorithms, Université de Montréal, Canada |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All our codes and models are publicly available at https://github.com/jiangnanhugo/lmkit |
| Open Datasets | Yes | Furthermore, we conducted empirical analysis and comparisons on the standard Penn Tree Bank (PTB) [Marcus et al., 1993], WikiText-2 and WikiText-103 text datasets [Merity et al., 2017] |
| Dataset Splits | Yes | Table 1: Statistics of the PTB, WikiText-2 and WikiText-103 datasets, reporting #train, #valid and #test sizes for each dataset. |
| Hardware Specification | Yes | all experiments implemented with Theano framework [Theano Development Team, 2016] were run on one standalone GPU device with 12 GB of graphical memory (i.e., Nvidia K40m) |
| Software Dependencies | No | The paper states 'all experiments implemented with Theano framework [Theano Development Team, 2016]', which mentions the software but does not specify a version number for the Theano framework itself. |
| Experiment Setup | Yes | The input sentence's max length, hidden layer, output vocabulary and batch size were set as {50, 256, 267735, 20}, respectively. Furthermore, for the NCE and Blackout approximations, the hyper-parameter k was set to \|V\|/20 for the smaller PTB and WikiText-2 datasets and k = \|V\|/200 for the larger WikiText-103 dataset. |
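
Below is a minimal, hypothetical sketch (plain Python, not the authors' Theano/lmkit code) of the experiment configuration described in the row above. The variable names and the `nce_sample_size` helper are illustrative assumptions; only the numeric values ({50, 256, 267735, 20}) and the k = |V|/20 vs. |V|/200 rule come from the quoted setup.

```python
# Hypothetical configuration sketch; names are illustrative, values are from the quoted setup.

def nce_sample_size(vocab_size: int, dataset: str) -> int:
    """k for the NCE/Blackout approximations: |V|/20 for PTB and WikiText-2,
    |V|/200 for the larger WikiText-103 (as reported in the paper)."""
    return vocab_size // 200 if dataset == "wikitext-103" else vocab_size // 20

# Settings quoted for the WikiText-103 run.
config = {
    "max_sentence_length": 50,   # input sentence's max length
    "hidden_size": 256,          # hidden layer size
    "vocab_size": 267735,        # output vocabulary size
    "batch_size": 20,
}
config["nce_k"] = nce_sample_size(config["vocab_size"], "wikitext-103")

if __name__ == "__main__":
    for key, value in config.items():
        print(f"{key}: {value}")
```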