Numerical Relation Extraction with Minimal Supervision
Authors: Aman Madaan, Ashish Mittal, Mausam, Ganesh Ramakrishnan, Sunita Sarawagi
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our extractors on the task of extracting numerical indicators (e.g., inflation rate) for countries. We compile a knowledge-base using geopolitical data from the World Bank and learn extractors for ten numerical relations. We find that NumberTron obtains a much higher recall at a slightly higher precision as compared to NumberRule. Both systems massively outperform the MultiR model (and its simple extensions), obtaining 17-25 point F-score improvements. The paper includes sections such as 'Experiments', 'Comparison of different methods', and 'Ablation Study for NumberTron', which detail empirical evaluations, dataset usage, and performance metrics. |
| Researcher Affiliation | Collaboration | Aman Madaan (Visa Inc., amadaan@visa.com); Ashish Mittal (IBM Research, ashishmittal@in.ibm.com); Mausam (IIT Delhi, mausam@cse.iitd.ac.in); Ganesh Ramakrishnan (IIT Bombay, ganesh@cse.iitb.ac.in); Sunita Sarawagi (IIT Bombay, sunita@cse.iitb.ac.in) |
| Pseudocode | No | The paper describes the algorithms for Number Rule and Number Tron in detail using narrative text. However, it does not include any formal pseudocode blocks, algorithm listings, or structured steps labeled as such. |
| Open Source Code | Yes | We release our code and other resources for further research. The code is available at http://www.github.com/NEO-IE. |
| Open Datasets | Yes | Training Corpus: We train on the TAC KBP 2014 corpus (TACKBP 2014) comprising roughly 3 million documents from newswire, discussion forums, and the Web. Knowledge Base: We compile our KB from data.worldbank.org. This data has 1,281 numerical indicators for 249 countries, with over 4 million base facts. Our experiments are on ten of these relations listed in Table 2. The KB is available at https://github.com/NEO-IE/numrelkb. |
| Dataset Splits | No | The paper mentions 'We use cross validation to set $\alpha = 0.90$' and that '$\delta_r\%$ (set to 20%, obtained via cross validation)' was determined. It defines a 'Training Corpus' and a 'Test Set'. However, it does not provide specific details on how the training data was split for validation, such as percentages, sample counts, or the methodology for creating such splits beyond the general mention of cross-validation (a generic cross-validation sketch appears after this table). |
| Hardware Specification | No | The paper describes the computational methods and evaluation but does not provide any specific details about the hardware (e.g., CPU, GPU, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Stanford parser (Manning et al. 2014)', 'the unit tagger (Sarawagi and Chakrabarti 2014)', and a 'MultiR' baseline downloaded from GitHub with a specific commit ID. While these indicate software used, the paper does not specify version numbers for these tools or any other key software dependencies (e.g., Python, TensorFlow, PyTorch, or specific libraries) that would enable exact replication. |
| Experiment Setup | Yes | We set $\hat{n}^r_q$ to 1 if q is within $\delta_r\%$ (set to 20%, obtained via cross validation) of v for some triple (e, r, v) in $KB_{e,u}$ and one of the pre-specified keywords of r appears in any of the sentences containing q... We use cross validation to set $\alpha = 0.90$... Atleast-K: $\hat{n}^r_q$ is set to one iff at least a fraction k of the sentences $s \in S_{e,q}$ have $\hat{z}_s = r$. We use k = 0.5 for our experiments. A code sketch of these heuristics follows this table. |
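
The labeling rule and the Atleast-K aggregation quoted in the Experiment Setup row are compact enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not the authors' released implementation (which is at http://www.github.com/NEO-IE): the function names, the flat list of KB triples, and the substring keyword test are hypothetical simplifications, and the per-entity, per-unit restriction to $KB_{e,u}$ is elided.

```python
# Minimal sketch of the quoted heuristics; simplified, not the NEO-IE code.

def within_delta(q, v, delta_pct=20.0):
    """True iff candidate number q is within delta_pct% of KB value v."""
    if v == 0:
        return q == 0
    return abs(q - v) / abs(v) <= delta_pct / 100.0

def label_number(q, sentences, kb_triples, keywords, delta_pct=20.0):
    """Noisy label n^r_q per relation r: 1 iff q approximately matches some
    KB value (e, r, v) AND a pre-specified keyword of r occurs in one of the
    sentences containing q. Entity/unit matching (KB_{e,u}) is elided."""
    labels = {}
    for (e, r, v) in kb_triples:
        has_keyword = any(kw in s for s in sentences for kw in keywords.get(r, []))
        if within_delta(q, v, delta_pct) and has_keyword:
            labels[r] = 1
    return labels

def atleast_k(sentence_relations, r, k=0.5):
    """Atleast-K aggregation: predict relation r for (e, q) iff at least a
    k fraction of the sentences mentioning the pair are classified as r."""
    if not sentence_relations:
        return 0
    frac = sum(1 for z in sentence_relations if z == r) / len(sentence_relations)
    return 1 if frac >= k else 0

# Hypothetical usage with a single World-Bank-style triple:
kb = [("IN", "inflation", 10.9)]
kws = {"inflation": ["inflation"]}
sents = ["inflation in India reached 10.2 percent in 2009"]
print(label_number(10.2, sents, kb, kws))                          # {'inflation': 1}
print(atleast_k(["inflation", "GDP", "inflation"], "inflation"))   # 1 (2/3 >= 0.5)
```

The two functions mirror the quoted setup: noisy training labels come from approximate value matching plus a keyword filter, and test-time predictions aggregate sentence-level relation assignments against the fraction threshold k = 0.5.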
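
The Dataset Splits row notes that cross-validation set $\alpha = 0.90$ and $\delta_r\% = 20\%$ without reporting fold counts or split sizes. A generic k-fold grid search of the kind alluded to is sketched below; the 5 folds, the function names, and the caller-supplied `score` stub are all assumptions, not the authors' setup.

```python
# Generic k-fold grid search for a hyperparameter such as alpha; every
# concrete choice here (5 folds, the score stub) is assumed for illustration.
from statistics import mean

def pick_hyperparam(examples, score, candidates, n_folds=5):
    """Return the candidate value whose mean held-out score is best.
    `score(train, held_out, value)` is caller-supplied and stands in for
    training an extractor with that value and evaluating it."""
    size = max(1, len(examples) // n_folds)
    folds = [examples[i * size:(i + 1) * size] for i in range(n_folds)]

    def mean_score(value):
        return mean(
            score([x for j, f in enumerate(folds) if j != i for x in f],
                  folds[i], value)
            for i in range(n_folds)
        )

    return max(candidates, key=mean_score)

# e.g. pick_hyperparam(data, score_fn, [0.80, 0.85, 0.90, 0.95]) would
# recover a choice like alpha = 0.90 if it maximizes mean held-out score.
```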