Geometry of Polysemy

Authors: Jiaqi Mu, Suma Bhat, Pramod Viswanath

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Apart from several prototypical target (word, sense) examples and a host of empirical studies to intuit and justify the various geometric representations, we validate our algorithms on standard sense induction and disambiguation datasets and present new state-of-the-art results." "As a quantitative demonstration of the latent geometry captured by our methods, we evaluate the proposed induction algorithm on standard Word Sense Induction (WSI) tasks. Our algorithm outperforms state-of-the-art on two datasets: (a) SemEval-2010 Task 14 (Manandhar et al., 2010), whose word senses are obtained from OntoNotes (Hovy et al., 2006); and (b) a custom-built dataset built by repurposing the polysemous dataset of (Arora et al., 2016b)."
Researcher Affiliation | Academia | Jiaqi Mu, Suma Bhat, Pramod Viswanath; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; {jiaqimu2,spbhat2,pramodv}@illinois.edu
Pseudocode | Yes | "The pseudocode for context representation (cf. Section 2.1) is provided in Algorithm 3. The pseudocode for word sense induction (cf. Section 2.2) is provided in Algorithm 4. The pseudocode for word sense disambiguation (cf. Section 2.2) is provided in Algorithm 5."
Open Source Code | No | The paper neither provides an explicit link to source code for the methodology nor states that the code is publicly available.
Open Datasets | Yes | "All our algorithms are unsupervised and operate on a large corpus obtained from Wikipedia dated 09/15. We use WikiExtractor (http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) to extract the plain text." "We test K-Grassmeans on two datasets: a standard one from the test set of SemEval-2010 (Manandhar et al., 2010) and a custom-built Make-Sense-2016. Appendix G gives a detailed description of the two datasets."
Dataset Splits | No | The paper mentions "validation" only in the sense of validating hypotheses; it does not specify validation splits (e.g., percentages or counts) needed to reproduce the experiments. The evaluation sections focus on test sets.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | "We use the skip-gram model from word2vec (Mikolov et al., 2013) as the word embedding algorithm, where we use the default parameter setting. We use WikiExtractor (http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) to extract the plain text." The paper mentions software such as word2vec and WikiExtractor but provides no version numbers for them, nor for any other libraries or solvers required for reproducibility.
Experiment Setup | Yes | "We set c = 10 as the context window size and set N = 3 as the rank of the PCA. We choose K = 2 and K = 5 in our experiment. For the disambiguation algorithm, we set θ = 0.6."
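Since no source code is released, the quoted setup can only be illustrated with a sketch. The snippet below is our own minimal NumPy illustration of the paper's core geometric idea, not the authors' Algorithms 3-5: a context (the ±c = 10 window around a target word) is represented by the span of the top N = 3 principal directions of its context word vectors, and a vector's fit to a candidate sense subspace is scored by its projection residual. The function names are ours, and whether vectors are mean-centered before the PCA is an assumption left out here.

```python
import numpy as np

def context_subspace(context_vectors, rank=3):
    """Represent a context by the span of the top-`rank` principal
    directions of its word vectors (rank N = 3 in the quoted setup).
    Returns an orthonormal basis as a (dim, rank) matrix."""
    X = np.asarray(context_vectors, dtype=float)  # rows: words in the window
    # Top right-singular vectors of the stacked context vectors give the
    # principal directions of the context.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T

def distance_to_subspace(v, basis):
    """Residual norm of the unit-normalized vector v after projecting
    onto the subspace spanned by the orthonormal columns of `basis`.
    Small residual = v lies close to the subspace."""
    v = v / np.linalg.norm(v)
    proj = basis @ (basis.T @ v)
    return np.linalg.norm(v - proj)
```

Under this reading, disambiguation would assign a context to the sense subspace with the smallest such distance, with the quoted θ = 0.6 acting as a decision threshold; that thresholding rule is our interpretation of the setup, not a verified transcription of Algorithm 5.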