An Adaptive Hierarchical Compositional Model for Phrase Embedding

Authors: Bing Li, Xiaochun Yang, Bin Wang, Wei Wang, Wei Cui, Xianchao Zhang

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental evaluation demonstrates that our model outperforms state-of-the-art methods in both similarity tasks and analogy tasks."
Researcher Affiliation | Academia | School of Computer Science and Engineering, Northeastern University, China; University of New South Wales, Australia; Dongguan University of Technology, China; College of Electrical Engineering and Automation, Shandong University of Science and Technology, China; School of Software, Dalian University of Technology, China
Pseudocode | No | The paper describes its algorithmic steps in text and equations but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | "We used the following three training corpora": Text8 (https://cs.fit.edu/~mmahoney/compression/textdata.html), Google News (http://www.statmt.org/wmt14/training-monolingual-newscrawl/news.2012.en.shuffled.gz), and Wiki (https://dumps.wikimedia.org/). A download sketch follows the table.
Dataset Splits | No | The paper reports hyperparameters and training iterations but does not provide training/validation/test splits for its main training corpora (Text8, Google News, Wiki); separate evaluation datasets are used instead.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper mentions the Skip-Gram architecture and the Negative Sampling technique but does not list ancillary software dependencies with version numbers.
Experiment Setup | Yes | "To be specific, we set the number of negative examples to be 25, and iterations (number of epochs) to be 5. The initial learning rate of the Skip-Gram model was set to 0.05. We set the dimension of vector d = 200, unless noted otherwise. We set the context window length to be 10 and the sub-sampling rate to 1e-5... Thus, we randomly initialized β by a Gaussian distribution N(0.5, 1)." A configuration sketch follows the table.
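Of the three corpora, only the Google News crawl is cited by a direct file URL; Text8 and the Wikipedia dumps are linked through index pages. A minimal download sketch under that assumption (the local file name is illustrative, not from the paper):

```python
import urllib.request

# Direct file URL for the Google News crawl, exactly as cited in the paper.
NEWS_URL = ("http://www.statmt.org/wmt14/training-monolingual-newscrawl/"
            "news.2012.en.shuffled.gz")

# Local file name is an illustrative choice, not specified by the paper.
urllib.request.urlretrieve(NEWS_URL, "news.2012.en.shuffled.gz")

# Text8 (https://cs.fit.edu/~mmahoney/compression/textdata.html) and the
# Wikipedia dumps (https://dumps.wikimedia.org/) are distributed through
# index pages, so the exact archive to fetch depends on the snapshot chosen.
```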
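The reported hyperparameters map directly onto a standard Skip-Gram implementation. Below is a minimal configuration sketch against gensim's Word2Vec (v4+) API; gensim is an assumed stand-in, since the paper does not name its implementation, and the shape of the β initialization is likewise a hypothetical placeholder:

```python
import numpy as np
import gensim.downloader as api
from gensim.models import Word2Vec

# Tokenized corpus; the gensim-data mirror of Text8 stands in for the
# paper's training data (an assumption made for convenience).
corpus = api.load("text8")  # iterable of token lists

# Skip-Gram with negative sampling, using the hyperparameters reported
# in the paper's experiment setup.
model = Word2Vec(
    sentences=corpus,
    sg=1,             # Skip-Gram architecture
    negative=25,      # 25 negative examples
    epochs=5,         # 5 iterations over the corpus
    alpha=0.05,       # initial learning rate
    vector_size=200,  # embedding dimension d = 200
    window=10,        # context window length
    sample=1e-5,      # sub-sampling rate
)

# The paper initializes its composition weights beta from N(0.5, 1);
# the shape here is a placeholder (e.g., one weight per word in a phrase).
num_phrase_words = 2  # hypothetical two-word phrase
beta = np.random.normal(loc=0.5, scale=1.0, size=num_phrase_words)
```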