On the Dimensionality of Word Embedding
Authors: Zi Yin, Yuanyuan Shen
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All our experiments use the Text8 corpus [Mahoney, 2011], a standard benchmark corpus used for various natural language tasks. We perform this procedure and cross-validate the results with grid search for LSA, skip-gram Word2Vec and GloVe on an English corpus. Figures 1b and 1c display the performances (measured by the correlation between vector cosine similarity and human labels) of word embeddings of various dimensionalities from the PPMI LSA algorithm, evaluated on two word correlation tests: WordSim353 [Finkelstein et al., 2001] and MTurk771 [Halawi et al., 2012]. (A minimal sketch of this evaluation follows the table.) |
| Researcher Affiliation | Collaboration | Zi Yin (Stanford University, s0960974@gmail.com); Yuanyuan Shen (Microsoft Corp. & Stanford University, Yuanyuan.Shen@microsoft.com) |
| Pseudocode | No | The paper describes mathematical derivations and concepts but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code can be found on GitHub: https://github.com/ziyin-dl/word-embedding-dimensionality-selection (a sketch of the paper's PIP-loss criterion follows the table). |
| Open Datasets | Yes | All our experiments use the Text8 corpus [Mahoney, 2011], a standard benchmark corpus used for various natural language tasks. (A loader sketch follows the table.) |
| Dataset Splits | No | The paper mentions 'cross-validate the results with grid search' but does not specify explicit training/validation/test splits of its primary Text8 corpus, nor how any validation set was used during training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameters (learning rate, batch size, epochs) used for training the LSA, skip-gram, or GloVe embeddings. |
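
The Research Type row quotes the paper's evaluation protocol: embeddings of varying dimensionality are scored by the correlation between vector cosine similarity and human similarity labels on WordSim353 and MTurk771. Below is a minimal sketch of that scoring step; the `pairs` format, the `embeddings` dictionary, and the choice of Spearman correlation are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: score an embedding by correlating cosine similarity with
# human similarity judgments (WordSim353-style pairs). Assumed inputs:
#   embeddings: dict mapping word -> np.ndarray vector
#   pairs:      list of (word1, word2, human_score) tuples
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def similarity_score(embeddings, pairs):
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        # Skip pairs with out-of-vocabulary words, as is common practice.
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(human)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho  # higher = closer agreement with human labels
```

Sweeping this score over embeddings trained at different dimensionalities would reproduce the kind of curves shown in the paper's Figures 1b and 1c.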
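
The released repository implements the paper's PIP-loss-based dimensionality selection. As a rough illustration of the criterion itself (not the repository's API), the sketch below truncates an SVD of a noisy signal matrix at each candidate rank k and keeps the k whose embedding minimizes the PIP loss against a reference embedding. In the paper the oracle matrix is unknown, so the loss is instead estimated via matrix perturbation theory; the `oracle`/`noisy` inputs here are illustrative placeholders.

```python
# Hedged sketch of the PIP-loss criterion from the paper:
#   PIP(E) = E E^T, and the PIP loss between two embeddings is
#   ||E1 E1^T - E2 E2^T||_F.
import numpy as np

def embedding_from_svd(M, k, alpha=0.5):
    """Rank-k embedding E = U_{:k} diag(d_{:k})^alpha, as in the paper."""
    U, d, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * d[:k] ** alpha

def pip_loss(E1, E2):
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, "fro")

def select_dimensionality(oracle, noisy, k_max, alpha=0.5):
    """Pick the rank k whose noisy embedding is closest (in PIP loss)
    to the full oracle embedding."""
    E_oracle = embedding_from_svd(oracle, min(oracle.shape), alpha)
    U, d, _ = np.linalg.svd(noisy, full_matrices=False)
    losses = [pip_loss(U[:, :k] * d[:k] ** alpha, E_oracle)
              for k in range(1, k_max + 1)]
    return 1 + int(np.argmin(losses))
```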
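
The Text8 corpus is publicly downloadable, which supports the Open Datasets finding. A minimal loader is sketched below; the mattmahoney.net URL is the corpus's standard hosting location, an assumption since the paper cites Mahoney [2011] without printing a URL.

```python
# Hedged sketch: fetch and tokenize the Text8 corpus. The URL is the
# standard public mirror, not stated in the paper itself.
import io
import urllib.request
import zipfile

TEXT8_URL = "http://mattmahoney.net/dc/text8.zip"

def load_text8():
    with urllib.request.urlopen(TEXT8_URL) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    # The archive contains a single file "text8": ~100 MB of lowercase,
    # space-separated English tokens.
    return archive.read("text8").decode("utf-8").split()
```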