Global Distant Supervision for Relation Extraction

Authors: Xianpei Han, Le Sun

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches. We test our model on a publicly available data set. Experimental results show that our method significantly outperforms traditional DS methods.
Researcher Affiliation | Academia | State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; {xianpei, sunle}@nfs.iscas.ac.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using 'Stanford’s MIMLRE package (Surdeanu et al., 2012), which is open source' for baselines, but does not release source code for the method it describes.
Open Datasets | Yes | We evaluate our method on a publicly available data set KBP, which was developed by Surdeanu et al. (2012).
Dataset Splits | Yes | This paper tunes and tests different methods using the same partitions and the same evaluation method as Surdeanu et al. (2012). We tune our global distant supervision model using the validation partition of KBP.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments (no CPU/GPU models, processor speeds, or memory amounts).
Software Dependencies | No | The paper names the algorithms it uses (PSCG, SampleSAT, MaxWalkSAT) and a third-party package (Stanford’s MIMLRE), but gives no version numbers for any software dependency.
Experiment Setup | Yes | We tune our global distant supervision model using the validation partition of KBP. After tuning for different MLN models, we used the PSCG algorithm (5 samples, 10–20 iterations, step length 0.03) and the SampleSAT inference algorithm (5,000,000–10,000,000 flips with 20% noise flips, 30% random ascent flips, and 50% SA flips) for learning. Because positive/negative instances are highly imbalanced in the training corpus, we put a higher misclassification cost (the tuned value is 2.0) on positive instances. For the KNNOf evidence predicates, we use 10 nearest neighbors for each instance (with similarity > 0.2).
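
The Experiment Setup row is the one place where concrete, reusable numbers appear, so a short sketch may help a re-implementer. Below is a minimal Python sketch that collects the quoted hyperparameters into one configuration and shows one way the KNNOf evidence predicates could be built (up to 10 neighbors per instance, similarity > 0.2). Cosine similarity is an assumption, as are all identifiers; the excerpt names neither the similarity measure nor any API.

```python
import numpy as np

# Illustrative configuration; key names are invented, but the numeric
# values are the tuned settings quoted in the Experiment Setup row.
CONFIG = {
    "pscg": {"samples": 5, "iterations": (10, 20), "step_length": 0.03},
    "sample_sat": {
        "flips": (5_000_000, 10_000_000),
        "noise_flip_frac": 0.20,        # 20% noise flips
        "random_ascent_frac": 0.30,     # 30% random ascent flips
        "sa_flip_frac": 0.50,           # 50% simulated-annealing flips
    },
    "positive_misclassification_cost": 2.0,
    "knn": {"k": 10, "min_similarity": 0.2},
}

def knn_of(features: np.ndarray, k: int = 10, min_sim: float = 0.2):
    """Build KNNOf neighbor lists: for each instance, keep up to k
    neighbors whose similarity exceeds min_sim. Assumes cosine
    similarity over row feature vectors."""
    # L2-normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -1.0)  # an instance is never its own neighbor
    neighbors = []
    for i in range(sims.shape[0]):
        order = np.argsort(sims[i])[::-1]  # most similar first
        neighbors.append([int(j) for j in order[:k] if sims[i, j] > min_sim])
    return neighbors
```

Each returned neighbor list could then be grounded as KNNOf(i, j) evidence atoms for the Markov logic network; this is one plausible grounding consistent with the setup quoted above, not the paper's confirmed implementation.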