Geometry of Compositionality

Authors: Hongyu Gong, Suma Bhat, Pramod Viswanath

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed method is competitive with state of the art and displays high accuracy in context-specific compositionality detection of a variety of natural language phenomena (idiomaticity, sarcasm, metaphor) for different datasets in multiple languages.
Researcher Affiliation | Academia | Hongyu Gong, Suma Bhat, Pramod Viswanath (hgong6@illinois.edu, spbhat2@illinois.edu, pramodv@illinois.edu), Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
Pseudocode | No | The paper describes the algorithmic steps, such as PCA and projection, but does not provide a formal pseudocode block or algorithm listing; a rough sketch of these steps is given after the table.
Open Source Code | Yes | Available at: https://github.com/HongyuGong/Geometry-of-Compositionality.git
Open Datasets | Yes | We construct 2 datasets (one for English and the other for Chinese) consisting of a list of polysemous phrases and their respective contexts (compositional and non-compositional), available at: https://github.com/HongyuGong/Geometry-of-Compositionality.git ... The training corpora of embeddings in English, Chinese and German are obtained from polyglot (Al-Rfou, Perozzi, and Skiena 2013). ... Dataset: The English noun compounds dataset (ENC) has 90 English noun compounds annotated on a continuous [0, 5] scale for phrase and component-wise compositionality (Reddy, McCarthy, and Manandhar 2011); the English verb-particle constructions dataset (EVPC) contains 160 English verb-particle compounds whose component-wise compositionality is annotated on a binary scale (Bannard 2006); the German noun compounds dataset (GNC) contains 246 German noun compounds annotated on a continuous [1, 7] scale for phrase and component compositionality (Schulte im Walde, Müller, and Roller 2013).
Dataset Splits | No | The paper mentions using a 'training set' for parameter tuning and refers to several datasets and their sizes, but it does not explicitly provide train/validation/test split percentages, sample counts, or cross-validation details for the experiments, which would be needed for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or specific cloud instance types.
Software Dependencies | No | The paper mentions software such as polyglot, word2vec, MSSG, and random forest and cites the corresponding papers, but it does not specify version numbers for these dependencies, which is crucial for reproducibility.
Experiment Setup | Yes | Our compositionality prediction algorithm uses only two hyperparameters: variance ratio (used to decide the amount of variance PCA should capture) and threshold (used to test if the compositionality score is above or below this value). ... In our experiments, d = 200, n ~ 10-20, and m ~ 3. ... variance ratio equal to about 0.6 generally achieves good performance. ... We set the same threshold of 2.5 to ENC as in (Salehi, Cook, and Baldwin 2014a), a threshold of 4 to GNC, and use the binary labels of EVPC.
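
Since the paper describes its pipeline only in prose (PCA over the context word embeddings, then projection of the phrase vector onto the resulting subspace), the sketch below shows one plausible reading of those steps using the hyperparameters quoted above (d = 200, variance ratio ~ 0.6). Function and variable names are illustrative and not taken from the authors' repository; the released code should be treated as the definitive implementation.

```python
import numpy as np

def compositionality_score(phrase_vec, context_vecs, variance_ratio=0.6):
    """Score how well a phrase embedding lies in its context subspace.

    phrase_vec     : (d,) phrase embedding, d = 200 in the paper.
    context_vecs   : (n, d) embeddings of the n context words (n ~ 10-20).
    variance_ratio : fraction of variance the PCA subspace must capture (~0.6).
    """
    # PCA over the context word vectors: keep the smallest number of
    # principal directions whose cumulative explained variance reaches
    # `variance_ratio` (the paper reports m ~ 3 such directions).
    centered = context_vecs - context_vecs.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    m = int(np.searchsorted(np.cumsum(explained), variance_ratio)) + 1
    basis = vt[:m]  # (m, d) orthonormal basis of the context subspace

    # Project the phrase vector onto the context subspace. A phrase used
    # compositionally should lie close to this subspace, so the ratio of
    # the projection's length to the vector's length serves as a score.
    projection = basis.T @ (basis @ phrase_vec)
    return np.linalg.norm(projection) / (np.linalg.norm(phrase_vec) + 1e-12)
```

A binary compositional vs. non-compositional decision would then compare this score (or a rescaled version of it) against a tuned threshold, the second hyperparameter named in the setup row.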
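
For evaluation, the thresholds quoted in the setup row binarize the datasets' continuous compositionality ratings (ENC on [0, 5], GNC on [1, 7]; EVPC is already binary). A trivial helper along those lines, with hypothetical names and an assumed tie-breaking direction, could look like this:

```python
def binarize_rating(rating, threshold):
    # ENC ratings on [0, 5] are cut at 2.5 and GNC ratings on [1, 7] at 4;
    # whether a rating exactly at the threshold counts as compositional is
    # not stated in the excerpt, so >= is an assumption here.
    return 1 if rating >= threshold else 0

enc_labels = [binarize_rating(r, threshold=2.5) for r in (0.8, 3.9, 4.5)]  # -> [0, 1, 1]
```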