Acquiring Comparative Commonsense Knowledge from the Web

Authors: Niket Tandon, Gerard de Melo, Gerhard Weikum

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our model outperforms strong baselines and allows us to obtain a large knowledge base of disambiguated commonsense assertions."
Researcher Affiliation | Academia | "Niket Tandon, Max Planck Institute for Informatics, Saarbrücken, Germany, ntandon@mpi-inf.mpg.de; Gerard de Melo, IIIS, Tsinghua University, Beijing, China, demelo@tsinghua.edu.cn; Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany, weikum@mpi-inf.mpg.de"
Pseudocode | No | The paper presents the joint model as integer linear programs (ILPs) and lists its constraints in Table 1, but it does not provide formal pseudocode or an algorithm block.
Open Source Code | No | "The resulting knowledge base is freely available from http://resources.mpi-inf.mpg.de/yago-naga/webchild/." (The released resource is the knowledge base itself, not the extraction code.)
Open Datasets | Yes | "Corpora. We ran our extraction system on the following two very large Web corpora. ClueWeb09: The ClueWeb09 data set (http://lemurproject.org/clueweb09/) is a large multilingual set of Web pages crawled from the Web in 2009. We used the 504 million Web pages in the English portion. ClueWeb12: The ClueWeb12 data set (http://lemurproject.org/clueweb12/) consists of 27 TB of data from 733 million English Web pages crawled from the Web in 2012."
Dataset Splits | No | "To evaluate our system, we created three test sets sampling three different kinds of triples from this raw, ambiguous data: ... Each of these three sample sets contained 100 randomly chosen observations. Finally, we additionally relied on a separate set of around 40 annotated observations used for development and tuning, in order to avoid experimenting with different variants of our model on the test set."
Hardware Specification | No | "Our implementation is based on Hadoop MapReduce in order to quickly process large Web corpora in a distributed hardware cluster."
Software Dependencies | Yes | "For optimization, we use the Gurobi Optimizer package (Optimization 2014)." (A hedged Gurobi sketch follows the table.)
Experiment Setup | Yes | "We greedily chose the most similar observed triples up to a maximal size of 10 observed triples, and then for every observed triple, possible candidate groundings were considered. We used these to instantiate the ILP, but smartly pruned out unnecessary variables (removing B_{ij,kl} variables when sim_{ij,kl} is zero or near-zero)." (See the greedy-selection sketch after the table.)
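
The greedy grouping quoted in the Experiment Setup row lends itself to a short illustration. Below is a minimal sketch, assuming a generic similarity function and an abstract triple representation (select_neighborhood, similarity, and MAX_GROUP_SIZE are hypothetical names, not from the paper); it picks at most 10 observed triples most similar to a seed triple, matching the maximal group size the quote reports.

```python
# Minimal sketch of the greedy neighborhood selection quoted above.
# `similarity` and the triple representation are assumed placeholders,
# not the paper's actual implementation.
import heapq

MAX_GROUP_SIZE = 10  # maximal number of observed triples per group, per the quote

def select_neighborhood(seed, observed, similarity):
    """Greedily pick up to MAX_GROUP_SIZE observed triples most similar to seed."""
    scored = [(similarity(seed, t), t) for t in observed if t != seed]
    # An explicit key avoids comparing the triples themselves on score ties.
    top = heapq.nlargest(MAX_GROUP_SIZE, scored, key=lambda pair: pair[0])
    return [t for _, t in top]
```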
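
Since the Software Dependencies row names the Gurobi Optimizer and the Experiment Setup row describes pruning B_{ij,kl} variables with near-zero similarity, here is a hedged sketch of how such a joint-disambiguation ILP could be instantiated with Gurobi's Python API. The variable layout, the SIM_EPS threshold, the objective, and the at-most-one-grounding constraint are illustrative assumptions, not the paper's published formulation; its actual constraints are listed in its Table 1.

```python
# Hypothetical sketch: instantiating a pruned joint-disambiguation ILP
# with gurobipy. All names and constraints are illustrative assumptions.
import gurobipy as gp
from gurobipy import GRB

SIM_EPS = 1e-3  # assumed threshold for pruning near-zero similarities

def build_ilp(candidates, sim):
    """candidates: list of (triple_id, sense_id) grounding candidates.
    sim: dict mapping pairs of candidates ((i, j), (k, l)) to a similarity score."""
    model = gp.Model("joint-disambiguation")

    # X[(i, j)] = 1 iff triple i is grounded with sense j.
    x = {c: model.addVar(vtype=GRB.BINARY, name=f"X_{c[0]}_{c[1]}")
         for c in candidates}

    # B[((i, j), (k, l))] = 1 iff both groundings are selected jointly.
    # Variables with near-zero similarity are pruned before entering the model.
    b = {}
    for (ij, kl), s in sim.items():
        if s <= SIM_EPS:
            continue  # "smartly pruned out unnecessary variables"
        v = model.addVar(vtype=GRB.BINARY,
                         name=f"B_{ij[0]}_{ij[1]}_{kl[0]}_{kl[1]}")
        model.addConstr(v <= x[ij])  # linearization: B implies both X's
        model.addConstr(v <= x[kl])
        b[(ij, kl)] = v

    # Assumed constraint: each triple receives at most one grounding.
    for i in {c[0] for c in candidates}:
        model.addConstr(gp.quicksum(x[c] for c in candidates if c[0] == i) <= 1)

    # Reward coherent joint groundings in proportion to their similarity.
    model.setObjective(gp.quicksum(sim[p] * v for p, v in b.items()), GRB.MAXIMIZE)
    return model, x
```

Calling model.optimize() on one greedily grouped instance would then jointly select groundings; the pruning step keeps the number of B variables tractable on Web-scale input.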