Large-Scale Analogical Reasoning

Authors: Vinay Chaudhri, Stijn Heymans, Adam Overholtzer, Aaron Spaulding, Michael Wessel

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present numerous examples of answers produced by the system and empirical data on answer quality to illustrate that we have addressed many of the problems of the previous system. Our evaluation goal was to test the cognitive validity of the techniques considered here. More specifically, we test whether we can capture the salient similarities and differences and rank them in an order that matches the user's understanding. To test this hypothesis, we assembled a suite of 158 comparison questions uniformly spread over the first eleven chapters of the textbook. Each answer was rated by a biologist who encoded the knowledge and a biology teacher. Overall, 97% of the questions produced an answer. From this set, 57% of the questions were considered of very high quality with no major issues.
Researcher Affiliation | Industry | Vinay K. Chaudhri, Stijn Heymans, Aaron Spaulding, Adam Overholtzer, Michael Wessel; Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
Pseudocode | No | The paper describes algorithms in text but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides links to sample answer outputs ('www.ai.sri.com/halo/public/2014-aaai/') but no statement or link to the source code for the described methodology.
Open Datasets | No | The paper states, 'We relied on a KB called KB Bio 101, which was created from a biology textbook by domain experts using a state-of-the-art knowledge-authoring system called AURA (Gunning et al. 2010).' However, it does not provide concrete access information (e.g., link, DOI, repository) for this KB or any other dataset used.
Dataset Splits | No | The paper mentions 'we assembled a suite of 158 comparison questions' for evaluation but does not specify any training, validation, or testing splits of this data, nor does it describe cross-validation or predefined splits.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU or CPU models, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions systems like AURA, SME, and ACME but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We devised three different criteria to determine the interestingness of comparing a pair of slot values. The first criterion is based on determining the interestingness of a comparison. We assign an interestingness score to each individual value and then compute an overall score by comparing two values to each other. The interestingness score of a value v is a number i(v), 0 ≤ i(v) ≤ 1, which is computed by using the following heuristics... We compute the overall score by taking an average of score1(v, u), score2(v, u) and score3(v, u). Using this scoring function, we can define the score of a particular alignment of values as the sum of the scores of the individual alignments. Finding the best alignment is an optimization problem, and we solve it using a best-first search (Russell et al. 1995).
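The quoted setup describes the scoring and alignment machinery but not the concrete heuristics, so the following Python sketch is illustrative only: interestingness(), score1, score2, and score3 are assumed placeholders, not the paper's actual criteria. What the sketch does preserve from the text is the structure: i(v) lies in [0, 1], the overall pair score is the average of the three criteria scores, an alignment's score is the sum of its pair scores, and the best alignment is found with a best-first search over partial alignments.

```python
import heapq

# Placeholder i(v) in [0, 1]; the paper's actual heuristics are not given here.
def interestingness(value: str) -> float:
    return min(1.0, len(value) / 20.0)

def score1(v: str, u: str) -> float:
    """Criterion 1 (assumed): combine the individual interestingness scores."""
    return (interestingness(v) + interestingness(u)) / 2.0

def score2(v: str, u: str) -> float:
    """Criterion 2 (assumed): exact match of the two slot values."""
    return 1.0 if v == u else 0.0

def score3(v: str, u: str) -> float:
    """Criterion 3 (assumed): token overlap between the two slot values."""
    a, b = set(v.lower().split()), set(u.lower().split())
    return len(a & b) / max(1, len(a | b))

def pair_score(v: str, u: str) -> float:
    """Overall comparison score: the average of score1, score2 and score3."""
    return (score1(v, u) + score2(v, u) + score3(v, u)) / 3.0

def best_alignment(values_a, values_b):
    """Best-first search for the highest-scoring one-to-one value alignment.

    A state aligns a prefix of values_a. Its priority is the accumulated
    score plus an optimistic bound of 1.0 per remaining value, so the first
    complete state popped is a best alignment.
    """
    heap = [(-(len(values_a) * 1.0), 0.0, 0, (), ())]
    while heap:
        _, score, i, used, pairs = heapq.heappop(heap)
        if i == len(values_a):                      # every value placed
            return score, list(pairs)
        v = values_a[i]
        bound = (len(values_a) - i - 1) * 1.0       # optimistic remainder
        for j, u in enumerate(values_b):
            if j in used:
                continue
            s = score + pair_score(v, u)
            heapq.heappush(heap, (-(s + bound), s, i + 1, used + (j,),
                                  pairs + ((v, u),)))
        # Leaving v unaligned contributes nothing to the score.
        heapq.heappush(heap, (-(score + bound), score, i + 1, used, pairs))
    return 0.0, []

if __name__ == "__main__":
    # Toy comparison of slot values for two concepts (invented example data).
    a = ["double membrane", "contains DNA", "site of photosynthesis"]
    b = ["double membrane", "contains its own DNA", "site of aerobic respiration"]
    total, aligned = best_alignment(a, b)
    print(f"alignment score = {total:.2f}")
    for v, u in aligned:
        print(f"  {v!r}  <->  {u!r}")
```

The optimistic per-value bound simply makes the best-first search return a provably best alignment under these placeholder scores; the paper does not specify which admissibility or pruning strategy, if any, its search used.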