Incrementally Learning the Hierarchical Softmax Function for Neural Language Models
Authors: Hao Peng, Jianxin Li, Yangqiu Song, Yaopeng Liu
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that incremental training saves substantial time: the smaller the update corpus, the faster the update training process, with a speedup of up to 30x achieved. Both word similarity/relatedness tasks and a dependency parsing task are used as benchmarks to evaluate the correctness of the updated word vectors (see the incremental-update sketch after the table). |
| Researcher Affiliation | Academia | Department of Computer Science & Engineering, Beihang University, Beijing 100191, China; Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong. Emails: {penghao,ljx,liuyp}@act.buaa.edu.cn, yqsong@cse.ust.hk |
| Pseudocode | No | The paper describes the proposed method using mathematical formulas and textual explanations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our system is publicly available at https://github.com/RingBDStack/incremental-word2vec |
| Open Datasets | Yes | We use the English Wikipedia as the source to train the NNLMs. We use the datasets collected by Faruqui and Dyer (Faruqui and Dyer 2014), which include MC-30, MEN-TR-3k, MTurk-287, MTurk-771, RG-65, RW-STANFORD (RW), SIMLEX-999, VERB-143, WS-353-ALL, WS-353-REL, WS-353-SIM, and YP-130 (http://www.wordvectors.org/). The data used to train and evaluate the parser is the English data in the CoNLL-X shared task (Buchholz and Marsi 2006). A sketch of the standard similarity evaluation follows the table. |
| Dataset Splits | No | The paper describes using an 'initial training corpus' and 'new update corpora' for its incremental training setup, and mentions data for 'training and evaluating' the dependency parser. However, it does not provide specific train/validation/test dataset splits, explicit validation set details, or cross-validation setup needed for reproduction. |
| Hardware Specification | No | The paper states 'For all the experiments, we run with 10 CPU threads and generate word embeddings with 300 dimensions.' It does not specify exact CPU models, types, or any other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using the 'word2vec package' and a 'CNN model (Guo et al. 2015)' for experiments, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | For all the experiments, we run with 10 CPU threads and generate word embeddings with 300 dimensions. The initial learning rate η0 = 0.025 is the default setting in the word2vec package, with η updated after every 10,000 words. For dependency parsing, the model is trained for 200,000 iterations, with parameters set as: distance embedding 5, valency embedding 5, and cluster embedding 8 (a configuration sketch follows the table). |
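
The Experiment Setup row pins down the core word2vec hyperparameters: 300-dimensional vectors, 10 threads, initial learning rate η0 = 0.025, and hierarchical softmax as the output layer. The sketch below reproduces that configuration using gensim's `Word2Vec` as a stand-in for the authors' C-based word2vec derivative; gensim (>= 4.0), the `LineSentence` reader, and the corpus file name are assumptions, not the paper's tooling.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Preprocessed Wikipedia text: one sentence per line, whitespace-tokenized.
corpus = LineSentence("wiki_initial.txt")  # hypothetical path

# Mirror the reported setup: 300-dimensional embeddings, 10 worker
# threads, initial learning rate 0.025 (the word2vec default), and
# hierarchical softmax (hs=1, negative=0) instead of negative sampling.
model = Word2Vec(
    corpus,
    vector_size=300,  # embedding dimensionality
    workers=10,       # CPU threads
    alpha=0.025,      # initial learning rate, eta_0 in the paper
    hs=1,             # use the hierarchical softmax output layer
    negative=0,       # disable negative sampling
)
model.save("w2v_initial.model")  # hypothetical output path
```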
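
The paper's central result (Research Type row) is that a trained model can be updated on a small new corpus far more cheaply than retraining from scratch, by incrementally adjusting the hierarchical softmax tree. gensim exposes a comparable online-update workflow, sketched below; note that gensim's vocabulary update is used here with negative sampling, so this illustrates only the incremental training pattern, not the paper's tree-update algorithm, and all file names are placeholders.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Initial model trained once on the large corpus (negative sampling here,
# since gensim's online vocabulary update targets this path; the paper's
# contribution is making the hierarchical softmax tree itself updatable).
model = Word2Vec(LineSentence("wiki_initial.txt"),
                 vector_size=300, workers=10, alpha=0.025, negative=5)

# Incremental step: fold the new corpus's words into the vocabulary,
# then continue training on the new text only. Avoiding a full retrain
# on old + new data is where the reported (up to ~30x) speedup comes from.
update_corpus = LineSentence("wiki_update.txt")
model.build_vocab(update_corpus, update=True)
model.train(update_corpus,
            total_examples=model.corpus_count,
            epochs=model.epochs)
```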
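
The word similarity/relatedness benchmarks in the Open Datasets row are conventionally scored by the Spearman correlation between human ratings and the cosine similarity of the learned vectors. Below is a minimal sketch of that protocol, assuming a `word -> vector` dictionary and the wordvectors.org file layout of one `word1 word2 score` triple per line; both are assumptions about inputs, not code from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_similarity(embeddings, dataset_path):
    """Spearman correlation between human similarity ratings and
    embedding cosine similarities. `embeddings` maps word -> ndarray;
    `dataset_path` holds one `word1 word2 score` triple per line.
    Pairs with out-of-vocabulary words are skipped, as is standard."""
    gold, predicted = [], []
    with open(dataset_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue
            w1, w2, score = parts[0].lower(), parts[1].lower(), float(parts[2])
            if w1 in embeddings and w2 in embeddings:
                gold.append(score)
                predicted.append(cosine(embeddings[w1], embeddings[w2]))
    rho, _ = spearmanr(gold, predicted)
    return rho, len(gold)

# Hypothetical usage, e.g. comparing vectors before and after an update:
# rho, n = evaluate_similarity(vectors, "EN-WS-353-ALL.txt")
# print(f"Spearman rho = {rho:.3f} over {n} pairs")
```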