Learning Compact Neural Word Embeddings by Parameter Space Sharing

Authors: Jun Suzuki, Masaaki Nagata

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We investigate the trade-off between quality and model size of embedding vectors for several linguistic benchmark datasets, and show that our method can significantly reduce the model size while maintaining the task performance of conventional methods." (Section 4, Experiments)
Researcher Affiliation | Industry | Jun Suzuki and Masaaki Nagata, NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan; {suzuki.jun, nagata.masaaki}@lab.ntt.co.jp
Pseudocode | Yes | "Figure 1: Our sequential and iterative parameter update procedure derived from the ADMM framework."
Open Source Code | No | "We used the hyperwords tool6 for data preparation [Levy et al., 2015]. ... We used the word2vec implementation but modified the code to save the context vectors as well as the word vectors7. ... 6https://bitbucket.org/omerlevy/hyperwords 7https://code.google.com/p/word2vec/" ... The paper refers to third-party tools and a modified version of one of them, but it does not provide a link to, or any clear statement about the availability of, the authors' own source code for the proposed PS-SGNS method.
Open Datasets | Yes | "First, our training data was taken from a Wikipedia dump (Aug. 2014). We used the hyperwords tool6 for data preparation [Levy et al., 2015]. Finally, we obtained approximately 1.6 billion tokens of training data D." "Table 2: Benchmark datasets used in our experiments."
Dataset Splits | No | The paper mentions training data and benchmark datasets for evaluation, but it does not provide specific details on how the data was split into training, validation, and test sets, or specify cross-validation settings.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models used for experiments. It mentions "limited memory devices, such as mobile devices" in the problem description, but this refers to the target application environment, not the experimental setup hardware.
Software Dependencies | No | The paper mentions the hyperwords tool and a word2vec implementation, but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Table 3: Hyper-parameters selected in our experiments and their candidate parameter sets."
    hyper-parameter          selected value  candidate parameter set
    context window (W)       5               {2, 3, 5, 10}
    subsampling (t)          10^-5 (dirty)   {0, 10^-5}
    cds                      3/4             {3/4, 1}
    post-process             e + o           {e, o, e + o}
    initial learning rate    0.025           {0.01, 0.025, 0.05}
    # of neg. samples (k)    5               {1, 5, 10}
    # of iterations (T)      10              {1, 2, 5, 10, 15}
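Since the authors trained with a modified word2vec binary, most of Table 3's selections map directly onto that tool's command-line flags. A minimal sketch of the equivalent invocation, assuming the standard word2vec flag names; the corpus and output paths are hypothetical placeholders, and the cds and e/o post-processing settings are omitted because they are applied by the hyperwords pipeline, not by word2vec itself:

```python
# Sketch: map Table 3's selected hyper-parameters onto word2vec CLI flags.
# corpus.txt / vectors.txt are placeholder paths, not from the paper.
selected = {
    "-window": "5",        # context window W
    "-sample": "1e-5",     # subsampling threshold t
    "-negative": "5",      # number of negative samples k
    "-alpha": "0.025",     # initial learning rate
    "-iter": "10",         # number of training iterations T
}

cmd = ["./word2vec", "-train", "corpus.txt", "-output", "vectors.txt"]
for flag, value in selected.items():
    cmd += [flag, value]

print(" ".join(cmd))
```

This yields a single command string that could be passed to a shell; the cds smoothing and the choice among the e, o, and e + o embeddings would then be handled in a separate post-processing step.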