Context-Aware Feature Selection and Classification

Authors: Juanyan Wang, Mustafa Bilgic

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several datasets demonstrate that the proposed model outperforms eight baselines on a combined classification and feature selection measure, and is able to better emulate the ground-truth instance-level feature selections. We conduct experiments to compare the proposed CFSC method to several baselines on both classification and feature selection performance.
Researcher Affiliation | Academia | Juanyan Wang and Mustafa Bilgic, Illinois Institute of Technology, Chicago, IL, USA; jwang245@hawk.iit.edu, mbilgic@iit.edu
Pseudocode | No | The paper describes its model architecture in text and with a diagram (Figure 1), but it does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The supplementary materials are available at https://github.com/IIT-ML/IJCAI23-CFSC.
Open Datasets | Yes | The Credit [Goyal, 2020] dataset contains 3,254 bank credit card customers with 37 features and binary labels indicating if the customer is an Attrited Customer. The Company [Zieba et al., 2016] dataset has 4,182 companies with 64 features and binary labels indicating whether the company went bankrupt within the forecasting period. The Mobile [Sharma, 2017] dataset contains 2,000 mobile phone records with 20 features and binary labels indicating if the price of a phone is in the high cost range. The NHIS [CDC, 2017] dataset has 2,306 adult survey records with 144 features and binary labels indicating if the person is suffering from chronic obstructive pulmonary disease. The Ride [City of Chicago, 2019] dataset has 4,800 ride trip records with 46 features and binary labels indicating if the trip is shared with other persons.
Dataset Splits | Yes | For each dataset, we use 1/3 of the data as the test set and perform 5-fold validation on the rest of the data, where one fold is used for validation and four folds are used for training (a split sketch follows the table).
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory specifications, or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper mentions various algorithms and activation functions such as 'sparsemax' and the 'gumbel-softmax' activation function, but it does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python version, TensorFlow/PyTorch versions, or specific library versions for baselines).
Experiment Setup | Yes | CFSC has one hidden layer with 16 units for the classification module and two hidden layers with 64 and 256 units, respectively, for the feature selection module. The ATT-FL model has one hidden layer with 64 units, one Bi-LSTM layer with 32 units, and one attention layer with 256 units. The RNP model has one hidden layer with 16 units for the classification module and two hidden layers with 64 and 256 units, respectively, for the feature selection module. The FF model has one hidden layer with 16 units. We set γ_a to 0.5 (Equation 2) for all models. We performed grid search with cross-validation to optimize all the other tunable hyper-parameters of each method using the combined measure on the validation set. (Layer sizes are sketched below the table.)
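
As a companion to the Dataset Splits row, the following is a minimal Python sketch of the split protocol, assuming scikit-learn and NumPy-style arrays (needed for the index-based fold slicing); the choice of tooling, the shuffling, and the random seed are our assumptions, since the report only records the 1/3 test hold-out and the 5-fold validation on the remaining data.

from sklearn.model_selection import KFold, train_test_split

def make_splits(X, y, seed=0):
    # Hold out 1/3 of the data as the test set, as stated in the paper.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=1/3, random_state=seed)
    # 5-fold validation on the remaining 2/3: in each round one fold serves
    # as the validation set and the other four folds as the training set.
    folds = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                    random_state=seed).split(X_rest):
        folds.append((X_rest[train_idx], y_rest[train_idx],
                      X_rest[val_idx], y_rest[val_idx]))
    return folds, (X_test, y_test)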
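
For the Experiment Setup row, the reported layer widths of CFSC can be read as the PyTorch sketch below; only the hidden-layer sizes (16 for the classification module; 64 and 256 for the feature selection module) come from the paper, while the input dimension, activations, gating mechanism, and the wiring between the two modules are assumptions on our part.

import torch
import torch.nn as nn

class CFSCSketch(nn.Module):
    # Illustrative only: layer widths follow the reported setup; everything
    # else (activations, gating, module wiring) is assumed.
    def __init__(self, n_features: int, n_classes: int = 2):
        super().__init__()
        # Feature selection module: two hidden layers with 64 and 256 units,
        # emitting one selection score per input feature.
        self.selector = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, n_features),
        )
        # Classification module: one hidden layer with 16 units.
        self.classifier = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        # A softmax gate stands in for the sparsemax / gumbel-softmax style
        # selection the paper mentions; the classifier sees the gated features.
        gates = torch.softmax(self.selector(x), dim=-1)
        return self.classifier(x * gates), gates

In the reported setup, γ_a (Equation 2) is fixed at 0.5 and the remaining tunable hyper-parameters are chosen by grid search with cross-validation on the combined measure.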