Hypernym Detection Using Strict Partial Order Networks
Authors: Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Gliozzo, Nandana Mihindukulasooriya, Nicolas Rodolfo Fauceglia
AAAI 2020, pp. 7626-7633 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive evaluation over eleven benchmarks across different tasks shows that SPON consistently either outperforms or attains the state of the art on all but one of these benchmarks. Sections 6 and 7 describe the evaluation setup and results. |
| Researcher Affiliation | Industry | Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Gliozzo, Nandana Mihindukulasooriya, Nicolas Rodolfo Fauceglia IBM Research AI, Yorktown Heights, NY, USA {sdash, mchowdh, gliozzo}@us.ibm.com {nandana.m, nicolas.fauceglia}@ibm.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for open-source code for the methodology described. |
| Open Datasets | Yes | We evaluated our approach on two tasks. The first one is hypernym detection, where the goal is to classify whether a given pair of terms is in a hypernymy relation. The second task is direction prediction, i.e. to identify which term in a given pair is the hypernym. We use the same datasets, same settings, same evaluation script and same evaluation metrics as Roller, Kiela, and Nickel (2018). Table 1 shows the dataset statistics for unsupervised benchmarks, wherein the split into validation/test folds is already given. For detection, Average Precision is reported on 5 datasets, namely BLESS (Baroni and Lenci 2011), LEDS (Baroni et al. 2012), EVAL (Santus et al. 2015), WBLESS (Weeds et al. 2014) and SHWARTZ (Shwartz, Goldberg, and Dagan 2016). For direction, Average Accuracy is reported on 3 datasets: BIBLESS (Kiela et al. 2015), BLESS and WBLESS. (A hedged sketch of the Average Precision computation appears after this table.) |
| Dataset Splits | Yes | Table 1 shows the dataset statistics for unsupervised benchmarks, wherein the split into validation/test folds is already given. Optimal hyper-parameter configurations for our proposed SPON model were determined empirically using the validation fold of the benchmark datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Word2Vec model' but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For each experiment, the embedding dimensions d were chosen out of {100, 200, 300, 512, 1024}, whereas the ϵ parameter was chosen out of {10^-1, 10^-2, 10^-3, 10^-4}. k is set to 1000 for all experiments. For example, in Table 3, the SPON model used the following hyper-parameters on the BLESS dataset: d = 300, ϵ = 0.01. In addition, we used L1 regularization for model weights, and also used dropout with probability of 0.5. Adam optimizer was used with default settings. In addition, the term vectors in our model were initialized uniformly at random, and are constrained to have unit L2 norm during the entire training procedure. Furthermore, an early stopping criterion of 20 epochs was used. (A hedged training-loop sketch based on this configuration appears after this table.) |
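For context on the metric quoted in the Open Datasets row, the snippet below shows how Average Precision on a detection benchmark is conventionally computed. This is a minimal sketch assuming scikit-learn; the paper itself reuses the evaluation script of Roller, Kiela, and Nickel (2018), which may differ in detail, and the `labels`/`scores` arrays here are made up for illustration.

```python
# Minimal sketch: Average Precision for hypernym detection.
# Assumes scikit-learn; `labels` and `scores` are hypothetical gold
# labels and model confidences for candidate (hyponym, hypernym) pairs.
from sklearn.metrics import average_precision_score

labels = [1, 0, 1, 1, 0]                 # 1 = true hypernym pair, 0 = negative pair
scores = [0.90, 0.40, 0.80, 0.35, 0.20]  # model confidence per pair

ap = average_precision_score(labels, scores)
print(f"Average Precision: {ap:.3f}")
```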
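The Experiment Setup row is detailed enough to translate into a training skeleton. The PyTorch sketch below wires together the reported choices: Adam with default settings, dropout of 0.5, L1 weight regularization, uniformly random initialization, unit-L2-norm term vectors, and early stopping with a patience of 20 epochs. The `PairScorer` class and its dot-product score are placeholders, not SPON's actual architecture or objective, and the L1 regularization weight is an assumed value that the paper does not report.

```python
import torch

# Hyper-parameter grid reported in the paper; k is fixed at 1000 throughout.
# (Searched on the validation fold; not exercised in the toy usage below.)
EMBED_DIMS = [100, 200, 300, 512, 1024]
EPSILONS = [1e-1, 1e-2, 1e-3, 1e-4]

class PairScorer(torch.nn.Module):
    """Placeholder pair scorer; stands in for the SPON architecture."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        # Term vectors initialized uniformly at random, per the paper.
        self.vectors = torch.nn.Parameter(torch.rand(vocab_size, dim))
        self.dropout = torch.nn.Dropout(p=0.5)  # dropout probability 0.5

    def forward(self, hypo_ids, hyper_ids):
        u = self.dropout(self.vectors[hypo_ids])
        v = self.dropout(self.vectors[hyper_ids])
        return (u * v).sum(dim=1)  # toy dot-product score, NOT SPON's function

    def renormalize(self):
        # Constrain term vectors to unit L2 norm during the entire training.
        with torch.no_grad():
            norms = self.vectors.norm(dim=1, keepdim=True).clamp_min(1e-12)
            self.vectors.div_(norms)

def train(model, batches, validation_loss, l1_weight=1e-5, patience=20):
    # Adam with default settings, per the paper; l1_weight is an assumption.
    optimizer = torch.optim.Adam(model.parameters())
    best, stale_epochs = float("inf"), 0
    while stale_epochs < patience:  # early stopping criterion of 20 epochs
        for hypo_ids, hyper_ids, labels in batches:
            optimizer.zero_grad()
            logits = model(hypo_ids, hyper_ids)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
            # L1 regularization over all model weights.
            loss = loss + l1_weight * sum(p.abs().sum() for p in model.parameters())
            loss.backward()
            optimizer.step()
            model.renormalize()
        val = validation_loss(model)
        best, stale_epochs = (val, 0) if val < best else (best, stale_epochs + 1)

# Toy usage with random data, purely to exercise the loop.
model = PairScorer(vocab_size=50, dim=300)
batch = (torch.randint(0, 50, (8,)), torch.randint(0, 50, (8,)), torch.rand(8))
train(model, [batch], validation_loss=lambda m: 0.0)
```

The re-normalization step after each optimizer update is one common way to realize the paper's unit-L2-norm constraint; a projected-gradient or reparameterized formulation would be an equally plausible reading of the text.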