Learning Compact Neural Word Embeddings by Parameter Space Sharing
Authors: Jun Suzuki, Masaaki Nagata
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We investigate the trade-off between quality and model size of embedding vectors for several linguistic benchmark datasets, and show that our method can significantly reduce the model size while maintaining the task performance of conventional methods." (Section 4, Experiments) |
| Researcher Affiliation | Industry | Jun Suzuki and Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan {suzuki.jun, nagata.masaaki}@lab.ntt.co.jp |
| Pseudocode | Yes | Figure 1: Our sequential and iterative parameter update procedure derived from the ADMM framework. |
| Open Source Code | No | We used the hyperwords tool6 for data preparation [Levy et al., 2015]. ... We used the word2vec implementation but modified the code to save the context vectors as well as the word vectors7. ... 6https://bitbucket.org/omerlevy/hyperwords 7https://code.google.com/p/word2vec/ ... The paper refers to third-party tools and a modified version of one, but does not provide a link or a clear statement about the availability of the authors' own source code for the proposed PS-SGNS method. |
| Open Datasets | Yes | First, our training data was taken from a Wikipedia dump (Aug. 2014). We used the hyperwords tool6 for data preparation [Levy et al., 2015]. Finally, we obtained approximately 1.6 billion tokens of training data D. Table 2: Benchmark datasets used in our experiments. |
| Dataset Splits | No | The paper mentions training data and benchmark datasets for evaluation, but it does not provide specific details on how the data was split into training, validation, and test sets, or specify cross-validation settings. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models used for experiments. It mentions 'limited memory devices, such as mobile devices' in the problem description, but this refers to the target application environment, not the experimental setup hardware. |
| Software Dependencies | No | The paper mentions 'hyperwords tool' and 'word2vec implementation' but does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 3: Hyper-parameters selected in our experiments and their candidate parameter sets — context window (W): 5, from {2, 3, 5, 10}; subsampling threshold (t): 10⁻⁵ (dirty), from {0, 10⁻⁵}; cds: 3/4, from {3/4, 1}; post-process: e + o, from {e, o, e + o}; initial learning rate: 0.025, from {0.01, 0.025, 0.05}; # of negative samples (k): 5, from {1, 5, 10}; # of iterations (T): 10, from {1, 2, 5, 10, 15} |
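Two of the Table 3 knobs — context-distribution smoothing (cds = 3/4) and the subsampling threshold (t = 10⁻⁵) — are standard SGNS preprocessing choices from Levy et al. [2015]. A minimal sketch of what they do, assuming the original word2vec-paper subsampling formula (the exact "dirty" variant used in the paper may differ; the function names and toy counts are illustrative):

```python
import math

def cds_unigram(counts, alpha=0.75):
    """Context-distribution-smoothed unigram distribution for negative
    sampling: P(w) proportional to count(w)**alpha (cds = 3/4 in Table 3).
    Smoothing flattens the distribution toward rarer words."""
    weights = {w: c ** alpha for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def keep_prob(freq, t=1e-5):
    """word2vec-style subsampling: probability of KEEPING a token whose
    relative corpus frequency is `freq` (sub t = 1e-5 in Table 3;
    t = 0 disables subsampling, the other Table 3 candidate)."""
    if t == 0:
        return 1.0
    return min(1.0, math.sqrt(t / freq))

# Toy counts, purely illustrative.
counts = {"the": 1000, "cat": 10, "sat": 5}
dist = cds_unigram(counts)
# Smoothing gives "the" less than its raw 1000/1015 probability mass.
assert dist["the"] < 1000 / 1015
assert abs(sum(dist.values()) - 1.0) < 1e-9
# A very frequent token (5% of the corpus) is aggressively down-sampled.
assert keep_prob(0.05) < 0.02
```

With cds = 1 the distribution reduces to the raw unigram frequencies, which is the other candidate value listed in Table 3.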