Bayesian Neural Word Embedding
Authors: Oren Barkan
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results that demonstrate the performance of the proposed algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method. |
| Researcher Affiliation | Collaboration | Oren Barkan Tel Aviv University, Israel Microsoft, Israel |
| Pseudocode | Yes | The algorithm is described in Fig. 1 and includes three main stages. |
| Open Source Code | No | The paper references the word2vec implementation's URL (https://code.google.com/p/word2vec), which is a third-party tool, but it does not provide a link to, or an explicit statement about open-sourcing, the authors' own Bayesian Skip-Gram (BSG) code. |
| Open Datasets | Yes | We trained both models on the corpus from (Chelba et al. 2014). |
| Dataset Splits | No | The paper describes the training corpus and the evaluation datasets. However, it does not specify explicit training/validation/test splits (e.g., percentages or counts for each subset) of the main training corpus, nor does it mention cross-validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using the "word2vec implementation" for SG, but it does not specify any software dependencies with version numbers for either SG or the proposed BSG method. |
| Experiment Setup | Yes | Specifically, we set the target representation dimension m = 40, maximal window size c_max = 4, subsampling parameter ρ = 10⁻⁵, vocabulary size l = 30000 and negative to positive ratio N = 1. For BSG, we further set τ = 1, κ = 10 and γ = 0.7 (note that BSG is quite robust to the choice of γ as long as 0.5 < γ < 1). Both models were trained for K = 40 iterations (we verified their convergence after ~30 iterations). |
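
For quick reference, the hyperparameters quoted above can be collected into a single configuration. The snippet below is a minimal sketch; the key names are illustrative assumptions, since the paper does not release code or define a configuration format.

```python
# Hypothetical configuration collecting the hyperparameters reported for SG and BSG.
# Key names are illustrative; the paper releases no code or config schema.
BSG_CONFIG = {
    "embedding_dim": 40,      # target representation dimension m
    "max_window_size": 4,     # maximal window size c_max
    "subsampling": 1e-5,      # subsampling parameter rho
    "vocab_size": 30_000,     # vocabulary size l
    "neg_pos_ratio": 1,       # negative-to-positive ratio N
    "iterations": 40,         # training iterations K (convergence observed after ~30)
    # BSG-specific hyperparameters
    "tau": 1.0,
    "kappa": 10,
    "gamma": 0.7,             # reported robust for 0.5 < gamma < 1
}
```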