Acquiring Comparative Commonsense Knowledge from the Web
Authors: Niket Tandon, Gerard de Melo, Gerhard Weikum
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our model outperforms strong baselines and allows us to obtain a large knowledge base of disambiguated commonsense assertions. |
| Researcher Affiliation | Academia | Niket Tandon, Max Planck Institute for Informatics, Saarbrücken, Germany, ntandon@mpi-inf.mpg.de; Gerard de Melo, IIIS, Tsinghua University, Beijing, China, demelo@tsinghua.edu.cn; Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany, weikum@mpi-inf.mpg.de |
| Pseudocode | No | The paper presents the Joint Model using integer-linear programs (ILPs) and lists constraints in Table 1, but it does not provide formal pseudocode or an algorithm block. |
| Open Source Code | No | The resulting knowledge base is freely available from http://resources.mpi-inf.mpg.de/yago-naga/webchild/. |
| Open Datasets | Yes | Corpora. We ran our extraction system on the following two very large Web corpora. ClueWeb09: The ClueWeb09 data set (http://lemurproject.org/clueweb09/) is a large multilingual set of Web pages crawled from the Web in 2009. We used the 504 million Web pages in the English portion. ClueWeb12: The ClueWeb12 data set (http://lemurproject.org/clueweb12/) consists of 27 TB of data from 733 million English Web pages crawled from the Web in 2012. |
| Dataset Splits | No | To evaluate our system, we created three test sets sampling three different kinds of triples from this raw, ambiguous data: ... Each of these three sample sets contained 100 randomly chosen observations. Finally, we additionally relied on a separate set of around 40 annotated observations used for development and tuning, in order to avoid experimenting with different variants of our model on the test set. |
| Hardware Specification | No | Our implementation is based on Hadoop Map Reduce in order to quickly process large Web corpora in a distributed hardware cluster. |
| Software Dependencies | Yes | For optimization, we use the Gurobi Optimizer package (Optimization 2014). |
| Experiment Setup | Yes | We greedily chose the most similar observed triples up to a maximal size of 10 observed triples, and then for every observed triple, possible candidate groundings were considered. We used these to instantiate the ILP, but smartly pruned out unnecessary variables (removing B_{ij,kl} variables when sim_{ij,kl} is zero or near-zero). |
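The Experiment Setup and Software Dependencies rows describe instantiating a joint ILP with the Gurobi Optimizer and pruning coupling variables whose pairwise similarity is zero or near-zero. Below is a minimal sketch of what such an instantiation could look like with Gurobi's Python API. The toy candidate lists, similarity scores, variable names (X, B), pruning threshold, and constraint set are illustrative assumptions; the paper's actual joint model is defined by the constraints listed in its Table 1.

```python
# Minimal sketch of a small joint-grounding ILP in gurobipy.
# All data and names here are hypothetical, not taken from the paper.
import gurobipy as gp
from gurobipy import GRB

# Toy input: for each observed triple i, candidate sense groundings with a
# confidence score, plus pairwise similarities between candidate groundings.
candidates = {
    0: [("hot#a#1", 0.8), ("hot#a#3", 0.4)],
    1: [("sun#n#1", 0.9), ("sun#n#2", 0.2)],
}
similarity = {((0, 0), (1, 0)): 0.7, ((0, 1), (1, 1)): 0.05}

m = gp.Model("joint_grounding_sketch")

# X[i, j] = 1 if candidate grounding j is chosen for observed triple i.
X = {(i, j): m.addVar(vtype=GRB.BINARY, name=f"X_{i}_{j}")
     for i, cands in candidates.items() for j in range(len(cands))}

# B couples two groundings; prune variables whose similarity is zero or
# near-zero, mirroring the pruning described in the Experiment Setup excerpt.
B = {((i, j), (k, l)): m.addVar(vtype=GRB.BINARY, name=f"B_{i}_{j}_{k}_{l}")
     for ((i, j), (k, l)), sim in similarity.items() if sim > 0.1}

# A coupling variable can only be active if both of its groundings are chosen.
for ((i, j), (k, l)), var in B.items():
    m.addConstr(var <= X[i, j])
    m.addConstr(var <= X[k, l])

# Choose at most one grounding per observed triple.
for i, cands in candidates.items():
    m.addConstr(gp.quicksum(X[i, j] for j in range(len(cands))) <= 1)

# Objective: reward confident groundings and similar grounding pairs.
m.setObjective(
    gp.quicksum(candidates[i][j][1] * X[i, j] for (i, j) in X)
    + gp.quicksum(similarity[key] * var for key, var in B.items()),
    GRB.MAXIMIZE,
)
m.optimize()
```

Pruning is done simply by never creating the near-zero-similarity B variables, which keeps the ILP small when it is instantiated per group of up to 10 observed triples.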