Classification with Strategically Withheld Data

Authors: Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer

AAAI 2021, pp. 5514-5522 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on several real-world data sets, and present insights into their relative performance in different settings."
Researcher Affiliation | Academia | Duke University, University of Southern California
Pseudocode | Yes | Algorithm 1: HILL-CLIMBING (HC) Classifier; Algorithm 2: Incentive-Compatible Logistic Regression. (A hedged sketch of the incentive-compatibility idea appears below the table.)
Open Source Code | No | The paper states: "A version of the paper including the Supplement is available at https://arxiv.org/abs/2012.10203". This link points to the paper itself, not to source code for the methodology.
Open Datasets | Yes | Four credit approval datasets are obtained from the UCI repository (Dua and Graff 2017), one each from Australia, Germany, Poland, and Taiwan.
Dataset Splits | No | The paper mentions training data and test accuracy, but it does not specify explicit train/validation/test splits, percentages, or a partitioning methodology beyond 'random undersampling' to balance the classes. (A sketch of such undersampling appears below the table.)
Hardware Specification | No | The paper does not describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper names model types ('logistic regression', 'neural networks') and methods such as projected gradient descent, but it does not list specific software with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) as would be needed for reproducibility.
Experiment Setup | Yes | "In our experiments, we run HC using 10^4, and convergence is achieved pretty quickly (see the Supplement for exact details). ... we randomly remove a fraction ε = 0, 0.1, ..., 0.5 of all feature values in each dataset to simulate data that is missing naturally." (A sketch of this masking step appears below the table.)
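
The paper's Algorithm 2 (Incentive-Compatible Logistic Regression) is trained with projected gradient descent, per the rows above; its exact formulation is in the paper, which publishes no code. Below is a minimal sketch of the core incentive-compatibility idea as we read it: with nonnegative features where a withheld feature is encoded as 0, constraining the weights to be nonnegative means revealing a feature can never lower the score, so there is no incentive to withhold. The function name `ic_logistic_regression` and all hyperparameters are illustrative assumptions, not the authors' values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ic_logistic_regression(X, y, lr=0.1, n_iters=10_000):
    """Logistic regression via projected gradient descent, projecting the
    weights onto the nonnegative orthant after each step.

    Assumes X >= 0 (e.g., binned/one-hot features) with a withheld feature
    encoded as 0, so it contributes nothing to the score; with w >= 0,
    revealing a feature can only raise the score. This is a sketch of the
    incentive-compatibility idea, not the paper's Algorithm 2 verbatim.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n   # gradient of mean logistic loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
        w = np.maximum(w, 0.0)       # projection step: keep w >= 0
    return w, b

# Toy usage on synthetic binary features (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(200, 5)).astype(float)
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)
w, b = ic_logistic_regression(X_toy, y_toy, n_iters=2000)
```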
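
The Dataset Splits row notes that the paper only mentions 'random undersampling' for class balancing, without further detail. A hedged sketch of one standard way to do this on a binary dataset (the helper below is our own, not the authors' code):

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Balance a binary dataset by undersampling the majority class
    down to the size of the minority class."""
    rng = np.random.default_rng(seed)
    idx_neg = np.flatnonzero(y == 0)
    idx_pos = np.flatnonzero(y == 1)
    n = min(len(idx_neg), len(idx_pos))
    keep = np.concatenate([
        rng.choice(idx_neg, size=n, replace=False),
        rng.choice(idx_pos, size=n, replace=False),
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]
```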
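
Finally, the Experiment Setup quote describes randomly removing a fraction ε of all feature values to simulate naturally missing data. A minimal sketch of that masking step under the straightforward reading (each feature value is dropped independently with probability ε; `withhold_features` is a hypothetical helper):

```python
import numpy as np

def withhold_features(X, eps, seed=0):
    """Set a random fraction eps of all feature values to NaN,
    simulating data that is missing naturally."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < eps
    X_missing = X.astype(float).copy()
    X_missing[mask] = np.nan
    return X_missing

# Sweep eps = 0, 0.1, ..., 0.5 as in the paper's experiments.
X_toy = np.random.default_rng(1).normal(size=(100, 8))  # stand-in for a UCI dataset
for eps in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    X_eps = withhold_features(X_toy, eps)
```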