Word Embedding as Maximum A Posteriori Estimation

Authors: Shoaib Jameel, Zihao Fu, Bei Shi, Wai Lam, Steven Schockaert

AAAI 2019, pp. 6562-6569 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present a series of experiments in which we compare our model with popular and recent state-of-the-art word embedding and topic models. Experiments in this work were performed using the ICARUS computational facility from Information Services and the School of Computing Hydra Cluster at the University of Kent."
Researcher Affiliation | Collaboration | Shoaib Jameel (School of Computing, Medway Campus, University of Kent, UK); Zihao Fu (Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong); Bei Shi (Tencent AI Lab, Shenzhen, China); Wai Lam (Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong); Steven Schockaert (School of Computer Science and Informatics, Cardiff University, UK)
Pseudocode | No | The paper describes the model and its optimization using mathematical equations but does not provide pseudocode or algorithm blocks.
Open Source Code | Yes | "We share our code, pre-processing scripts and datasets online: https://bit.ly/2J5MtXj"
Open Datasets | Yes | "Corpora: We have considered the May 2018 dump of the English Wikipedia. First, we considered three analogy datasets: the Google Word Analogy dataset, the Microsoft Research Syntactic Analogies Dataset (MSR), and the BATS 3.0 dataset."
Dataset Splits | Yes | "For datasets that have pre-defined tuning and testing splits, we used these standard splits. For the other datasets, we randomly selected 20% as tuning data, and we report results on the remaining 80%." (A minimal split sketch follows the table.)
Hardware Specification | Yes | "Experiments in this work were performed using the ICARUS computational facility from Information Services and the School of Computing Hydra Cluster at the University of Kent. This experiment was performed on a 3.20 GHz machine with 25 threads."
Software Dependencies | No | The paper mentions several software tools and implementations used (e.g., Keras, Elasticsearch, VSMLib), but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "The number of dimensions for each model was selected from {50, 100, 300, 400}. For CBOW and SG, we chose the number of negative samples from a pool of {1, 5, 10, 15}. For GloVe, we selected the xmax value from {10, 50, 100} and α from {0.1, 0.25, 0.5, 0.75, 1}. The number of iterations for all word embedding models was fixed to 20 and the number of posterior inference iterations for all topic models was fixed to 1000. We also experimented with different learning rate parameters, namely {0.01, 0.001, 0.0001, 0.00001}." (The candidate values are collected into a grid-search sketch below.)
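For datasets without pre-defined splits, the reported protocol is a random 20%/80% tuning/test partition. The following is a minimal sketch of that procedure, not the paper's released code; the function and variable names (`split_tuning_test`, `items`) and the fixed seed are illustrative assumptions.

```python
import random

def split_tuning_test(items, tuning_fraction=0.2, seed=42):
    """Randomly hold out a tuning set and report on the rest.

    Mirrors the protocol described in the paper: 20% of the examples
    are used for hyperparameter tuning, the remaining 80% for testing.
    The seed is an assumption added here for repeatability.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * tuning_fraction)
    tuning, test = shuffled[:cut], shuffled[cut:]
    return tuning, test

# Example: split a list of analogy questions loaded elsewhere.
# tuning_set, test_set = split_tuning_test(analogy_questions)
```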
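The hyperparameter ranges quoted in the Experiment Setup row translate directly into a small grid search. The sketch below only collects those reported candidate values; the dictionary layout, the `grid` helper, and the commented `run_model` hook are hypothetical and not taken from the paper.

```python
from itertools import product

# Candidate values reported in the paper; the grid layout itself is an assumption.
shared_grid = {
    "dimensions": [50, 100, 300, 400],
    "learning_rate": [0.01, 0.001, 0.0001, 0.00001],
}
model_specific = {
    "negative_samples": [1, 5, 10, 15],        # CBOW and Skip-gram
    "glove_xmax": [10, 50, 100],               # GloVe
    "glove_alpha": [0.1, 0.25, 0.5, 0.75, 1],  # GloVe
}

WORD_EMBEDDING_ITERATIONS = 20   # fixed for all word embedding models
TOPIC_MODEL_ITERATIONS = 1000    # fixed posterior inference iterations for topic models

def grid(params):
    """Yield every combination of the given parameter value lists."""
    keys = list(params)
    for values in product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))

# Example: enumerate GloVe configurations (run_model is a hypothetical hook).
# for cfg in grid({**shared_grid,
#                  "xmax": model_specific["glove_xmax"],
#                  "alpha": model_specific["glove_alpha"]}):
#     run_model("glove", iterations=WORD_EMBEDDING_ITERATIONS, **cfg)
```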