Geometric Relationship between Word and Context Representations
Authors: Jiangtao Feng, Xiaoqing Zheng
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our word representations have been evaluated on multiple NLP tasks, and the experimental results show that the proposed model achieved promising results compared to several popular word representations. |
| Researcher Affiliation | Academia | Jiangtao Feng, Xiaoqing Zheng School of Computer Science, Fudan University, Shanghai, China Shanghai Key Laboratory of Intelligent Information Processing {fengjt16, zhengxq}@fudan.edu.cn |
| Pseudocode | No | The paper describes algorithms and models using mathematical equations and textual explanations, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of our model is available at https://github.com/JiangtaoFeng/HGM-MAP. |
| Open Datasets | Yes | English Wikipedia documents were used to train word representation models, and its vocabulary was reduced to 50,023 by replacing infrequent words with an UNKNOWN token... WordSim353 (Finkelstein et al. 2002), SimLex-999 (Hill, Reichart, and Korhonen 2016) and Stanford Contextual Word Similarity (SCWS) (Huang et al. 2012)... Google (Mikolov et al. 2013a) and MSR dataset (Mikolov, Yih, and Zweig 2013)... For the POS-tagging, we used the Wall Street Journal benchmark (Toutanova et al. 2003)... For the chunking, the CoNLL 2000 shared task was used... |
| Dataset Splits | No | The paper lists datasets used for training and evaluation but does not specify explicit training/validation/test dataset splits with percentages, absolute counts, or citations to predefined splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions that 'The compared models were trained with the toolkits provided by their authors' and that their own approach is referred to as 'Ours', but it does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks like TensorFlow or PyTorch, or programming language versions). |
| Experiment Setup | Yes | In the training process, the dimensionality of word vectors was set to 300, the window size L to 5, the variance σ² to 0.5, the number of negative samples to 5, the regularization rate to 10⁻³, and γ in negative sampling to 0.75. Like word2vec, sub-sampling was applied with rate 10⁻⁵. The weight coefficient λ was set to 0.24. Stochastic gradient descent was used to minimize the loss function with a 0.025 learning rate. All results reported were averaged over ten runs. |
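
For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into one place. The sketch below is a minimal, hypothetical Python configuration mapping; the names (e.g. `HGM_MAP_TRAINING_CONFIG`, `embedding_dim`) are illustrative and are not taken from the authors' HGM-MAP repository, only the values come from the paper's reported setup.

```python
# Hypothetical summary of the training hyperparameters reported in the paper.
# Key names are illustrative only and do not correspond to the authors' code.
HGM_MAP_TRAINING_CONFIG = {
    "embedding_dim": 300,             # dimensionality of word vectors
    "window_size": 5,                 # context window L
    "variance": 0.5,                  # sigma^2
    "num_negative_samples": 5,        # negative samples per positive pair
    "regularization_rate": 1e-3,      # regularization rate
    "negative_sampling_gamma": 0.75,  # gamma in negative sampling, as in word2vec
    "subsampling_rate": 1e-5,         # frequent-word sub-sampling rate
    "lambda_weight": 0.24,            # weight coefficient lambda
    "learning_rate": 0.025,           # SGD learning rate
    "num_runs": 10,                   # reported results averaged over ten runs
}

if __name__ == "__main__":
    # Print the settings in a readable form.
    for key, value in HGM_MAP_TRAINING_CONFIG.items():
        print(f"{key}: {value}")
```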