Spectral Word Embedding with Negative Sampling
Authors: Behrouz Haji Soleimani, Stan Matwin
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have trained various word embedding algorithms on Wikipedia articles comprising 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm provides results as good as the state of the art, but in a much faster and more efficient way. (See the sketch after the table.) |
| Researcher Affiliation | Academia | Behrouz Haji Soleimani, Stan Matwin; Faculty of Computer Science, Dalhousie University; Institute for Big Data Analytics; 6050 University Avenue, Halifax, NS, Canada; behrouz.hajisoleimani@dal.ca, st837183@dal.ca |
| Pseudocode | No | The paper describes the steps of its algorithm in paragraph form (e.g., 'Our approach at a glance builds...'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The source code of our algorithm is available at: https://github.com/behrouzhs/svdns. |
| Open Datasets | Yes | For training the models, we have used the English Wikipedia dump of March 05, 2016. |
| Dataset Splits | No | The paper mentions training on Wikipedia and evaluating on other datasets (word similarity, analogy) but does not specify a separate validation split of the training data. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU models, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | The dimensionality of embeddings is 100 in all our experiments. ...GloVe is trained with its recommended parameter setting (i.e., xmax = 100), CBOW and SGNS are trained with negative sampling set to 5. Our proposed algorithm, SVD-NS, is trained with α = 2.5. (See the configuration sketch after the table.) |
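
For readers who want a concrete picture of what "negative sampling boosting a spectral method" can look like, here is a minimal Python sketch. It applies a truncated SVD to a shifted positive PMI matrix, following Levy and Goldberg's analysis of SGNS as implicit matrix factorization; the `log(k_neg)` shift, the dense toy matrix, and all names here are illustrative assumptions, not the paper's actual SVD-NS formulation (which, per the paper, involves a parameter α = 2.5).

```python
import numpy as np
from scipy.sparse.linalg import svds

def spectral_embed(cooc, dim=100, k_neg=5):
    """Illustrative sketch: truncated SVD of a shifted positive PMI matrix.

    cooc  : dense (V, V) word-context co-occurrence count matrix
            (toy scale only; dim must be smaller than V).
    dim   : embedding dimensionality (100 in the paper's experiments).
    k_neg : shift mimicking k negative samples, after Levy & Goldberg;
            the paper's own SVD-NS transformation is NOT reproduced here.
    """
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((cooc * total) / (row * col))
    pmi[~np.isfinite(pmi)] = -np.inf              # zero counts -> -inf PMI
    sppmi = np.maximum(pmi - np.log(k_neg), 0.0)  # shifted positive PMI
    u, s, _ = svds(sppmi, k=dim)                  # truncated spectral step
    return u * np.sqrt(s)                         # symmetric SV weighting
```

At the paper's scale (2.1 billion tokens) the co-occurrence matrix would have to be kept sparse and factorized with a sparse or randomized SVD; the dense version above is for exposition only.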
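For concreteness, the comparison settings reported in the Experiment Setup row can be collected in one place. Only the values come from the paper; the dictionary layout and key names below are illustrative, and the actual tools use their own flag names.

```python
# Hyperparameters reported in the paper's experiment setup.
# Key names are illustrative, not taken from the paper or the tools.
EMBED_DIM = 100  # dimensionality used for all methods

settings = {
    "glove":  {"dim": EMBED_DIM, "x_max": 100},  # recommended GloVe setting
    "cbow":   {"dim": EMBED_DIM, "negative": 5},
    "sgns":   {"dim": EMBED_DIM, "negative": 5},
    "svd_ns": {"dim": EMBED_DIM, "alpha": 2.5},  # the paper's method
}
```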