On the Dimensionality of Word Embedding

Authors: Zi Yin, Yuanyuan Shen

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All our experiments use the Text8 corpus [Mahoney, 2011], a standard benchmark corpus used for various natural language tasks. We perform this procedure and cross-validate the results with grid search for LSA, skip-gram Word2Vec and GloVe on an English corpus. Figure 1b and 1c display the performances (measured by the correlation between vector cosine similarity and human labels) of word embeddings of various dimensionalities from the PPMI LSA algorithm, evaluated on two word correlation tests: WordSim353 [Finkelstein et al., 2001] and MTurk771 [Halawi et al., 2012]. (A sketch of this dimensionality sweep appears below the table.)
Researcher Affiliation | Collaboration | Zi Yin (Stanford University, s0960974@gmail.com); Yuanyuan Shen (Microsoft Corp. & Stanford University, Yuanyuan.Shen@microsoft.com)
Pseudocode | No | The paper presents mathematical derivations and concepts but does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code can be found on GitHub: https://github.com/ziyin-dl/word-embedding-dimensionality-selection
Open Datasets | Yes | All our experiments use the Text8 corpus [Mahoney, 2011], a standard benchmark corpus used for various natural language tasks. (A download sketch appears below the table.)
Dataset Splits | No | The paper mentions cross-validating results with grid search but does not specify explicit training/validation/test splits of its primary Text8 corpus, nor does it detail how a validation set was used during training.
Hardware Specification | No | The paper does not provide any specifics about the hardware used to run the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | No | The paper does not provide specific experimental setup details such as the hyperparameters (learning rate, batch size, epochs) used to train the LSA, skip-gram, or GloVe embeddings. (An illustrative configuration is sketched below the table.)
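
To make the sweep described in the Research Type row concrete, here is a minimal sketch (not the authors' released pipeline, which is linked above): it trains skip-gram Word2Vec on Text8 at several dimensionalities and scores each model by the Spearman correlation between its cosine similarities and human similarity labels. It assumes gensim and scipy are installed; the WordSim353 file path is a placeholder.

```python
import gensim.downloader as api
from gensim.models import Word2Vec
from scipy.stats import spearmanr

corpus = list(api.load("text8"))  # Text8 as a list of token lists

def load_word_pairs(path):
    # Minimal TSV reader: "word1<TAB>word2<TAB>human score" per line.
    with open(path) as f:
        return [(w1, w2, float(s)) for w1, w2, s in (line.split("\t") for line in f)]

def similarity_score(model, pairs):
    # Spearman correlation between cosine similarity and human labels,
    # restricted to pairs whose words are both in the vocabulary.
    sims, golds = [], []
    for w1, w2, human in pairs:
        if w1 in model.wv and w2 in model.wv:
            sims.append(float(model.wv.similarity(w1, w2)))
            golds.append(human)
    return spearmanr(sims, golds).correlation

pairs = load_word_pairs("wordsim353.tsv")  # placeholder path

for dim in (50, 100, 200, 300, 400):  # grid over embedding dimensionality
    model = Word2Vec(corpus, vector_size=dim, sg=1)  # sg=1 selects skip-gram
    print(dim, similarity_score(model, pairs))
```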
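
The Open Datasets row is easy to verify: Text8 is freely downloadable. A minimal sketch, assuming the standard mattmahoney.net mirror (the source behind the [Mahoney, 2011] citation) is reachable:

```python
import io
import urllib.request
import zipfile

# Fetch and unpack Text8 (Mahoney, 2011) from its standard mirror.
URL = "http://mattmahoney.net/dc/text8.zip"
raw = urllib.request.urlopen(URL).read()
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    text = zf.read("text8").decode("ascii")  # one long, lowercase, space-separated string

tokens = text.split()
print(f"{len(tokens):,} tokens")  # roughly 17 million tokens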
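
Finally, since the Experiment Setup row notes that no training hyperparameters are reported, the configuration below shows what a complete specification would have to include. Every value is an assumed common default, not a setting taken from the paper.

```python
from gensim.models import Word2Vec

# Illustrative skip-gram setup; all values are assumptions, not the paper's.
model = Word2Vec(
    corpus,           # tokenized Text8, e.g. from the sketches above
    vector_size=200,  # embedding dimensionality, the quantity the paper studies
    window=5,         # context window size (assumed)
    min_count=5,      # discard words rarer than this (assumed)
    sg=1,             # 1 = skip-gram, one of the three algorithms compared
    epochs=5,         # passes over the corpus (assumed)
)
model.save("text8_skipgram_200.model")
```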