Bayesian Neural Word Embedding

Authors: Oren Barkan

AAAI 2017

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. "We present experimental results that demonstrate the performance of the proposed algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method."
Researcher Affiliation: Collaboration. Oren Barkan: Tel Aviv University, Israel; Microsoft, Israel.
Pseudocode: Yes. "The algorithm is described in Fig. 1 and includes three main stages."
Open Source Code: No. The paper references the word2vec implementation URL (https://code.google.com/p/word2vec), which is a third-party tool, but it does not provide a link to, or an explicit statement about, open-sourcing the authors' own Bayesian Skip-Gram (BSG) code.
Open Datasets: Yes. "We trained both models on the corpus from (Chelba et al. 2014)."
Dataset Splits: No. The paper describes the training corpus and the evaluation datasets, but it does not specify explicit training/validation/test splits (e.g., percentages or counts per subset) of the main training corpus, nor does it mention cross-validation.
Hardware Specification: No. The paper does not provide any details about the hardware used to run the experiments (e.g., CPU or GPU models, memory, or cloud instance types).
Software Dependencies: No. The paper mentions using the word2vec implementation for SG, but it does not specify any software dependencies with version numbers for either SG or the proposed BSG method.
Experiment Setup: Yes. "Specifically, we set the target representation dimension m = 40, maximal window size max_c = 4, subsampling parameter ρ = 10⁻⁵, vocabulary size l = 30000 and negative to positive ratio N = 1. For BSG, we further set τ = 1, κ = 10 and γ = 0.7 (note that BSG is quite robust to the choice of γ as long as 0.5 < γ < 1). Both models were trained for K = 40 iterations (we verified their convergence after ~30 iterations)."
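The word analogy evaluation the paper reports is typically scored with the standard 3CosAdd rule (answering "a is to a* as b is to ?" by a nearest-neighbor search in cosine similarity). A minimal sketch of that rule, using tiny hand-made vectors rather than the paper's trained m = 40 embeddings; the function names and the toy vocabulary are illustrative assumptions, not from the paper:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(emb, a, a_star, b):
    """Answer 'a is to a_star as b is to ?' with the 3CosAdd rule:
    argmax over x of cos(x, a_star - a + b), excluding the query words."""
    target = emb[a_star] - emb[a] + emb[b]
    banned = {a, a_star, b}
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in banned:
            continue
        sim = cosine(vec, target)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy embeddings (hypothetical) where the classic analogy holds exactly.
toy = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}
analogy(toy, "man", "woman", "king")  # → "queen"
```

Accuracy on an analogy dataset is then the fraction of questions for which the retrieved word matches the gold answer.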