Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

Authors: Hiromasa Arai, Crystal Maung, Ke Xu, Haim Schweitzer

AAAI 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented and tested the three CSSP-WA* algorithms that were described in Section 3. The experiments described here compare our algorithms to the current state-of-the-art on several large publicly available datasets.
Researcher Affiliation	Academia	Hiromasa Arai Dept. of Computer Science University of Texas (Dallas) 800 W Campbell Road Richardson, TX 75080 EMAIL Crystal Maung Dept. of Computer Science University of Texas (Dallas) 800 W Campbell Road Richardson, TX 75080 EMAIL Ke Xu Dept. of Computer Science University of Texas (Dallas) 800 W Campbell Road Richardson, TX 75080 EMAIL Haim Schweitzer Dept. of Computer Science University of Texas (Dallas) 800 W Campbell Road Richardson, TX 75080 EMAIL
Pseudocode	Yes	Figure 1: Example of the subsets graph and the generic heuristic search algorithm. The algorithm maintains the fringe list F and the closed nodes list C. Several choices of f(n_i) are discussed in the text. 0. Put the root node into F. 1. While F is nonempty and no solution found: 1.1 Pick n_i with the smallest f(n_i) from F. 1.2 If n_i has k columns return it as the solution. 1.3 Otherwise: 1.3.1 Add n_i to C. 1.3.2 Examine all children n_j of n_i. 1.3.2.1 If n_j is in C or F do nothing. 1.3.2.2 Otherwise put n_j in F.
Open Source Code	No	The paper states 'We implemented and tested the three CSSP-WA* algorithms' but does not provide any link to source code or explicitly state that the code is publicly available.
Open Datasets	Yes	The datasets are shown in the following table (Name: Size, Availability): Madelon: 2,000 × 500, UCI; Isolet5: 1,559 × 618, UCI; CNAE-9: 1,080 × 857, UCI; Tech TC01: 163 × 29,261, Technion; Day1: 20,000 × 3,231,957, UCI.
Dataset Splits	No	The paper lists datasets used (Madelon, Isolet5, CNAE-9, Tech TC01, Day1) but does not provide specific information on how these datasets were split into training, validation, or test sets for reproduction, nor does it refer to standard splits with citations.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory, or specific computing environments).
Software Dependencies	No	The paper does not provide specific software dependencies (e.g., library names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup	No	The paper states 'The WA* algorithms use ϵ = 0.5 on Madelon, CNAE-9 and Tech TC01, and ϵ = 0.9 on the other datasets' which is a hyperparameter, but it does not provide comprehensive details on the experimental setup such as other specific hyperparameter values, model initialization, optimizer settings, or training schedules.
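
The generic heuristic search quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (the paper releases no code): the function name `heuristic_subset_search`, the representation of nodes as frozensets of column indices, and the toy evaluation function `toy_f` are all assumptions made here. The paper's actual choices of f(n_i), including the CSSP-WA* variants, are not reproduced.

```python
import heapq
from itertools import count


def heuristic_subset_search(n_columns, k, f):
    """Best-first search over column subsets, following the quoted steps.

    Nodes are frozensets of column indices (an assumed representation).
    `f` is any caller-supplied evaluation function; smaller is better.
    Returns the first popped node with k columns, or None if none exists.
    """
    root = frozenset()
    tie = count()  # tie-breaker so the heap never compares frozensets
    fringe = [(f(root), next(tie), root)]  # step 0: put the root into F
    in_fringe = {root}                     # membership index for F
    closed = set()                         # the closed list C

    while fringe:                          # step 1
        _, _, node = heapq.heappop(fringe)  # 1.1: smallest f(n_i) in F
        in_fringe.discard(node)
        if len(node) == k:                 # 1.2: node has k columns
            return node
        closed.add(node)                   # 1.3.1
        for col in range(n_columns):       # 1.3.2: children add one column
            if col in node:
                continue
            child = node | {col}
            if child in closed or child in in_fringe:  # 1.3.2.1
                continue
            heapq.heappush(fringe, (f(child), next(tie), child))  # 1.3.2.2
            in_fringe.add(child)
    return None


# Hypothetical toy objective: each column has a fixed cost, and f is the
# total cost of the selected columns. This is NOT the paper's f.
weights = [5, 1, 4, 2]


def toy_f(node):
    return sum(weights[c] for c in node)


selected = heuristic_subset_search(n_columns=4, k=2, f=toy_f)
print(sorted(selected))
```

With the toy objective the search behaves like uniform-cost search and returns the cheapest pair of columns. Swapping in a different `f` changes the exploration order without touching the fringe and closed-list bookkeeping.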