Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Dataset Complexity Measure for Analogical Transfer

Authors: Fadi Badra

IJCAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Three experiments were run. The first one tests the hypothesis that the complexity measure Γ is an indicator of the quality of the similarity measure σ_S. The second one evaluates the performance of CoAT on a regression task, and the third one evaluates the performance of CoAT on classification tasks.
Researcher Affiliation	Academia	Fadi Badra, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, INSERM, UMR 1142, F-93000, Bobigny, France EMAIL
Pseudocode	Yes	"Algorithm 1 Optimize weights in a weighted sum" and "Algorithm 2 Complexity-based analogical transfer" are present.
Open Source Code	No	The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	"The PIMA indian diabetes dataset [2] includes data about 768 native American women. The prediction task consists in predicting if a person suffers from diabetes (taken as a binary class) from the value of 8 continuous attributes. [2] https://kaggle.com/uciml/pima-indians-diabetes-database" and "The Automobile dataset [4] includes data about 205 automobiles, from which were kept only the 159 instances that contain no missing values. [4] https://archive.ics.uci.edu/ml/datasets/Automobile" and "6 classical datasets of the UCI repository [5] (Tab. 2): the Monks datasets (monks1, monks2, and monks3), the User Modeling dataset (user), the Iris dataset (iris), and the Zoo dataset (zoo). [5] https://archive.ics.uci.edu/ml/"
Dataset Splits	Yes	The quality of each similarity scale is estimated by the accuracy of the k-Nearest Neighbor (k-NN) algorithm, with k = 5, computed using 10-fold cross validation.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for ancillary software components, libraries, or solvers used in the experiments.
Experiment Setup	Yes	"The quality of each similarity scale is estimated by the accuracy of the k-Nearest Neighbor (k-NN) algorithm, with k = 5, computed using 10-fold cross validation." Also, details about similarity measure construction and weight optimization are provided: "We choose $\sigma_R = \sigma^{\mathrm{price}}_{p_{2,1000}}$ and the similarity measure $\sigma_S$ is assumed to be a weighted sum of the similarity $\sigma^{\mathrm{nb\_rooms}}_{p_{2,6}}$ according to the number of rooms and the similarity $\sigma^{\mathrm{area}}_{=}$ according to the location area: $\sigma_S(uv) = w \, \sigma^{\mathrm{nb\_rooms}}_{p_{2,6}}(uv) + (1 - w) \, \sigma^{\mathrm{area}}_{=}(uv)$" and "The feature scale $\sigma^{\phi}_{=}$ was used for each binary feature $\phi$, and a polynomial scale was used for each continuous feature. The weights are set by the method proposed in Sec. 4."
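
The Experiment Setup row above describes two ideas that may be easier to follow in code: a similarity measure built as a weighted sum of two feature similarities, and its quality estimated by 5-NN accuracy under 10-fold cross-validation. The following is a minimal sketch under stated assumptions, not the paper's implementation. The two feature similarities, the toy data, and all function names (weighted_similarity, knn_predict, cross_val_accuracy) are hypothetical stand-ins chosen for illustration; the paper's polynomial scales and its weight optimization procedure are not reproduced here.

```python
from collections import Counter


def weighted_similarity(u, v, w):
    """Weighted sum w * s_rooms + (1 - w) * s_area of two feature similarities."""
    # Assumed stand-ins for the paper's feature scales (not taken from the paper):
    s_rooms = 1.0 / (1.0 + abs(u["rooms"] - v["rooms"]))  # shrinks as room counts diverge
    s_area = 1.0 if u["area"] == v["area"] else 0.0        # equality on location area
    return w * s_rooms + (1.0 - w) * s_area


def knn_predict(train, query, w, k=5):
    """Majority label among the k training cases most similar to the query."""
    ranked = sorted(train, key=lambda c: weighted_similarity(c, query, w), reverse=True)
    votes = Counter(c["label"] for c in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(cases, w, k=5, n_folds=10):
    """k-NN accuracy under n-fold cross-validation (folds taken by index modulo n_folds)."""
    correct = 0
    for fold in range(n_folds):
        test = [c for i, c in enumerate(cases) if i % n_folds == fold]
        train = [c for i, c in enumerate(cases) if i % n_folds != fold]
        correct += sum(knn_predict(train, q, w, k) == q["label"] for q in test)
    return correct / len(cases)


# Hypothetical toy data: the label depends only on the area, never on the rooms.
cases = []
for i in range(40):
    area = "north" if (i // 5) % 2 == 0 else "south"
    cases.append({"rooms": 1 + i % 5, "area": area,
                  "label": "high" if area == "north" else "low"})

# Compare a few candidate weights by their cross-validated accuracy.
for w in (0.0, 0.5, 1.0):
    print(f"w = {w:.1f}: 10-fold 5-NN accuracy = {cross_val_accuracy(cases, w):.2f}")
```

With w = 0.0 the similarity reduces to area equality, which the toy labels follow exactly, so that setting reaches an accuracy of 1.00 on this data. Comparing accuracies across weights in this way is one simple proxy for the quality of a similarity measure; it is separate from the complexity measure Γ that the paper studies.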