Large-Scale Analogical Reasoning

Authors: Vinay Chaudhri, Stijn Heymans, Adam Overholtzer, Aaron Spaulding, Michael Wessel

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present numerous examples of answers produced by the system and empirical data on answer quality to illustrate that we have addressed many of the problems of the previous system. Our evaluation goal was to test the cognitive validity of the techniques considered here. More specifically, we test whether we can capture the salient similarities and differences and rank them in an order that matches the user's understanding. To test this hypothesis, we assembled a suite of 158 comparison questions uniformly spread over the first eleven chapters of the textbook. Each answer was rated by a biologist who encoded the knowledge and a biology teacher. Overall, 97% of the questions produced an answer. From this set, 57% of the questions were considered of very high quality with no major issues.
Researcher Affiliation | Industry | Vinay K. Chaudhri, Stijn Heymans, Aaron Spaulding, Adam Overholtzer, Michael Wessel; Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
Pseudocode | No | The paper describes algorithms in text but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides links to sample answer outputs ('www.ai.sri.com/halo/public/2014-aaai/') but no statement or link to the source code for the described methodology.
Open Datasets | No | The paper states, 'We relied on a KB called KB Bio 101, which was created from a biology textbook by domain experts using a state-of-the-art knowledge-authoring system called AURA (Gunning et al. 2010).' However, it does not provide concrete access information (e.g., link, DOI, repository) for this KB or any other dataset used.
Dataset Splits | No | The paper mentions 'we assembled a suite of 158 comparison questions' for evaluation but does not specify any training, validation, or testing splits of this data, nor does it describe cross-validation or predefined splits.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU or CPU models, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions systems like AURA, SME, and ACME but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We devised three different criteria to determine the interestingness of comparing a pair of slot values. The first criterion is based on determining the interestingness of a comparison. We assign an interestingness score to each individual value and then compute an overall score by comparing two values to each other. The interestingness score of a value v is a number i(v), 0 ≤ i(v) ≤ 1, which is computed by using the following heuristics... We compute the overall score by taking an average of score1(v, u), score2(v, u) and score3(v, u). Using this scoring function, we can define the score of a particular alignment of values as the sum of the scores of the individual alignments. Finding the best alignment is an optimization problem, and we solve it using a best-first search (Russell et al. 1995).
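The quoted setup describes the scoring and alignment machinery but not the concrete heuristics, so the following Python sketch is illustrative only: interestingness(), score1, score2, and score3 are assumed placeholders, not the paper's actual criteria. What the sketch does preserve from the text is the structure: i(v) lies in [0, 1], the overall pair score is the average of the three criteria scores, an alignment's score is the sum of its pair scores, and the best alignment is found with a best-first search over partial alignments.

```python
import heapq

# Placeholder i(v) in [0, 1]; the paper's actual heuristics are not given here.
def interestingness(value: str) -> float:
    return min(1.0, len(value) / 20.0)

def score1(v: str, u: str) -> float:
    """Criterion 1 (assumed): combine the individual interestingness scores."""
    return (interestingness(v) + interestingness(u)) / 2.0

def score2(v: str, u: str) -> float:
    """Criterion 2 (assumed): exact match of the two slot values."""
    return 1.0 if v == u else 0.0

def score3(v: str, u: str) -> float:
    """Criterion 3 (assumed): token overlap between the two slot values."""
    a, b = set(v.lower().split()), set(u.lower().split())
    return len(a & b) / max(1, len(a | b))

def pair_score(v: str, u: str) -> float:
    """Overall comparison score: the average of score1, score2 and score3."""
    return (score1(v, u) + score2(v, u) + score3(v, u)) / 3.0

def best_alignment(values_a, values_b):
    """Best-first search for the highest-scoring one-to-one value alignment.

    A state aligns a prefix of values_a. Its priority is the accumulated
    score plus an optimistic bound of 1.0 per remaining value, so the first
    complete state popped is a best alignment.
    """
    heap = [(-(len(values_a) * 1.0), 0.0, 0, (), ())]
    while heap:
        _, score, i, used, pairs = heapq.heappop(heap)
        if i == len(values_a):                      # every value placed
            return score, list(pairs)
        v = values_a[i]
        bound = (len(values_a) - i - 1) * 1.0       # optimistic remainder
        for j, u in enumerate(values_b):
            if j in used:
                continue
            s = score + pair_score(v, u)
            heapq.heappush(heap, (-(s + bound), s, i + 1, used + (j,),
                                  pairs + ((v, u),)))
        # Leaving v unaligned contributes nothing to the score.
        heapq.heappush(heap, (-(score + bound), score, i + 1, used, pairs))
    return 0.0, []

if __name__ == "__main__":
    # Toy comparison of slot values for two concepts (invented example data).
    a = ["double membrane", "contains DNA", "site of photosynthesis"]
    b = ["double membrane", "contains its own DNA", "site of aerobic respiration"]
    total, aligned = best_alignment(a, b)
    print(f"alignment score = {total:.2f}")
    for v, u in aligned:
        print(f"  {v!r}  <->  {u!r}")
```

The optimistic per-value bound simply makes the best-first search return a provably best alignment under these placeholder scores; the paper does not specify which admissibility or pruning strategy, if any, its search used.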