Hypernym Detection Using Strict Partial Order Networks
Authors: Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Gliozzo, Nandana Mihindukulasooriya, Nicolas Rodolfo Fauceglia
AAAI 2020, pp. 7626-7633 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive evaluation over eleven benchmarks across different tasks shows that SPON consistently either outperforms or attains the state of the art on all but one of these benchmarks. Sections 6 and 7 describe the evaluation setup and results. |
| Researcher Affiliation | Industry | Sarthak Dash, Md Faisal Mahbub Chowdhury, Alfio Gliozzo, Nandana Mihindukulasooriya, Nicolas Rodolfo Fauceglia IBM Research AI, Yorktown Heights, NY, USA {sdash, mchowdh, gliozzo}@us.ibm.com {nandana.m, nicolas.fauceglia}@ibm.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for open-source code for the methodology described. |
| Open Datasets | Yes | We evaluated our approach on two tasks. The first one is hypernym detection, where the goal is to classify whether a given pair of terms is in a hypernymy relation. The second task is direction prediction, i.e. to identify which term in a given pair is the hypernym. We use the same datasets, same settings, same evaluation script and same evaluation metrics as Roller, Kiela, and Nickel (2018). Table 1 shows the dataset statistics for unsupervised benchmarks, wherein the split into validation/test folds is already given. For detection, Average Precision is reported on 5 datasets, namely BLESS (Baroni and Lenci 2011), LEDS (Baroni et al. 2012), EVAL (Santus et al. 2015), WBLESS (Weeds et al. 2014) and SHWARTZ (Shwartz, Goldberg, and Dagan 2016). For direction, Average Accuracy is reported on 3 datasets: BIBLESS (Kiela et al. 2015), BLESS and WBLESS. (A hedged sketch of the Average Precision computation appears after this table.) |
| Dataset Splits | Yes | Table 1 shows the dataset statistics for unsupervised benchmarks, wherein the split into validation/test folds is already given. Optimal hyper-parameter configurations for our proposed SPON model were determined empirically using the validation fold of the benchmark datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Word2Vec model' but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For each experiment, the embedding dimensions d were chosen out of {100, 200, 300, 512, 1024}, whereas the ϵ parameter was chosen out of {10^-1, 10^-2, 10^-3, 10^-4}. k is set to 1000 for all experiments. For example, in Table 3, the SPON model used the following hyper-parameters on the BLESS dataset: d = 300, ϵ = 0.01. In addition, we used L1 regularization for model weights, and also used dropout with probability of 0.5. Adam optimizer was used with default settings. In addition, the term vectors in our model were initialized uniformly at random, and are constrained to have unit L2 norm during the entire training procedure. Furthermore, an early stopping criterion of 20 epochs was used. (A hedged training-loop sketch based on this configuration appears after this table.) |
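For context on the metric quoted in the Open Datasets row, the snippet below shows how Average Precision on a detection benchmark is conventionally computed. This is a minimal sketch assuming scikit-learn; the paper itself reuses the evaluation script of Roller, Kiela, and Nickel (2018), which may differ in detail, and the `labels`/`scores` arrays here are made up for illustration.

```python
# Minimal sketch: Average Precision for hypernym detection.
# Assumes scikit-learn; `labels` and `scores` are hypothetical gold
# labels and model confidences for candidate (hyponym, hypernym) pairs.
from sklearn.metrics import average_precision_score

labels = [1, 0, 1, 1, 0]                 # 1 = true hypernym pair, 0 = negative pair
scores = [0.90, 0.40, 0.80, 0.35, 0.20]  # model confidence per pair

ap = average_precision_score(labels, scores)
print(f"Average Precision: {ap:.3f}")
```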
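The Experiment Setup row is detailed enough to translate into a training skeleton. The PyTorch sketch below wires together the reported choices: Adam with default settings, dropout of 0.5, L1 weight regularization, uniformly random initialization, unit-L2-norm term vectors, and early stopping with a patience of 20 epochs. The `PairScorer` class and its dot-product score are placeholders, not SPON's actual architecture or objective, and the L1 regularization weight is an assumed value that the paper does not report.

```python
import torch

# Hyper-parameter grid reported in the paper; k is fixed at 1000 throughout.
# (Searched on the validation fold; not exercised in the toy usage below.)
EMBED_DIMS = [100, 200, 300, 512, 1024]
EPSILONS = [1e-1, 1e-2, 1e-3, 1e-4]

class PairScorer(torch.nn.Module):
    """Placeholder pair scorer; stands in for the SPON architecture."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        # Term vectors initialized uniformly at random, per the paper.
        self.vectors = torch.nn.Parameter(torch.rand(vocab_size, dim))
        self.dropout = torch.nn.Dropout(p=0.5)  # dropout probability 0.5

    def forward(self, hypo_ids, hyper_ids):
        u = self.dropout(self.vectors[hypo_ids])
        v = self.dropout(self.vectors[hyper_ids])
        return (u * v).sum(dim=1)  # toy dot-product score, NOT SPON's function

    def renormalize(self):
        # Constrain term vectors to unit L2 norm during the entire training.
        with torch.no_grad():
            norms = self.vectors.norm(dim=1, keepdim=True).clamp_min(1e-12)
            self.vectors.div_(norms)

def train(model, batches, validation_loss, l1_weight=1e-5, patience=20):
    # Adam with default settings, per the paper; l1_weight is an assumption.
    optimizer = torch.optim.Adam(model.parameters())
    best, stale_epochs = float("inf"), 0
    while stale_epochs < patience:  # early stopping criterion of 20 epochs
        for hypo_ids, hyper_ids, labels in batches:
            optimizer.zero_grad()
            logits = model(hypo_ids, hyper_ids)
            loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
            # L1 regularization over all model weights.
            loss = loss + l1_weight * sum(p.abs().sum() for p in model.parameters())
            loss.backward()
            optimizer.step()
            model.renormalize()
        val = validation_loss(model)
        best, stale_epochs = (val, 0) if val < best else (best, stale_epochs + 1)

# Toy usage with random data, purely to exercise the loop.
model = PairScorer(vocab_size=50, dim=300)
batch = (torch.randint(0, 50, (8,)), torch.randint(0, 50, (8,)), torch.rand(8))
train(model, [batch], validation_loss=lambda m: 0.0)
```

The re-normalization step after each optimizer update is one common way to realize the paper's unit-L2-norm constraint; a projected-gradient or reparameterized formulation would be an equally plausible reading of the text.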