Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Distributionally Robust Feature Selection

Authors: Maitreyi Swaroop, Tamar Krishnamurti, Bryan Wilder

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our approach through experiments on both synthetic datasets and real-world data.
Researcher Affiliation	Academia	Maitreyi Swaroop Machine Learning Department Carnegie Mellon University EMAIL Tamar Krishnamurti Division of General Internal Medicine University of Pittsburgh EMAIL Bryan Wilder Machine Learning Department Carnegie Mellon University EMAIL
Pseudocode	Yes	We summarize our complete proposed method in Algorithm 1 in Appendix B, and provide its computational complexity. The proposed procedure is outlined in Algorithm 1. Algorithm 1: Distributionally Robust Feature Selection
Open Source Code	Yes	Code for implementing our method is available here (linked).
Open Datasets	Yes	UCI Adult Income Dataset [Becker and Kohavi, 1996] We use the UCI Adult Income dataset to predict income across different demographic groups, where each age group represents a distinct population. American Community Survey (ACS) Dataset [U.S. Census Bureau, 2018] We use person-level ACS Public Use Microdata Sample (PUMS) data for the year 2018 to predict household income across state populations.
Dataset Splits	Yes	We split each dataset into three parts: a feature-selection-dataset, a downstream-model-training-dataset, and a downstream-model-test-dataset. We first do a 60:40 split of each population to obtain the feature-selection-dataset and the downstream model training and evaluation datasets. The latter is split 80:20 for downstream-model training and evaluation, respectively.
Hardware Specification	Yes	Experiments were conducted on an Apple MacBook Pro equipped with an Apple M3.
Software Dependencies	No	We implemented our method using the PyTorch [Paszke et al., 2019] library, while for the downstream models, we use the scikit-learn [Pedregosa et al., 2011] library. All baselines plus our method share the pipeline for downstream models, isolating the impact of the feature selections they output as opposed to predictive performance of models that they use en route.
Experiment Setup	Yes	Implementation details: α is initialized to values near 1 by adding random noise to a vector of ones. We use the Adam optimizer [Kingma, 2014] with a learning rate of 0.1. We also use a Cosine Annealing Scheduler for the learning rate, and train the model for 200 epochs. For the kernel estimation, we set the number of nearest neighbours k = 1000. We take 10 Monte Carlo samples for estimating the objective. At each epoch, we perform a full-batch gradient descent step. For the objective, we use the hard-max formulation (setting the SoftMax parameter to inf). The penalty term is the reciprocal of the L1 norm of α.
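
The nested split quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `three_way_split`, the shuffling step, the seed, and the rounding rule are all assumptions, since the excerpt gives only the 60:40 and 80:20 ratios.

```python
import random


def three_way_split(rows, seed=0):
    """Split one population 60:40, then split the 40% part 80:20.

    Hypothetical helper: the shuffle, the seed, and the use of round()
    are assumptions; only the ratios come from the quoted excerpt.
    """
    rows = list(rows)
    rng = random.Random(seed)
    rng.shuffle(rows)

    # First split: 60% for feature selection, 40% for the downstream model.
    n_select = round(0.6 * len(rows))
    feature_selection = rows[:n_select]
    downstream = rows[n_select:]

    # Second split: 80% of the remainder for training, 20% for testing.
    n_train = round(0.8 * len(downstream))
    downstream_train = downstream[:n_train]
    downstream_test = downstream[n_train:]
    return feature_selection, downstream_train, downstream_test


fs, train, test = three_way_split(range(1000))
print(len(fs), len(train), len(test))  # 600 320 80
```

Under these ratios, each population ends up roughly 60% feature-selection data, 32% downstream training data, and 8% downstream test data.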
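
To make the Experiment Setup row more concrete, the sketch below works through three of the quoted ingredients in plain Python: initializing α near 1, the reciprocal-of-L1-norm penalty, and a cosine-annealed learning rate starting at 0.1 over 200 epochs. It is not the authors' implementation. The Gaussian noise and its scale, the minimum learning rate of 0, and all function names are assumptions; the schedule uses the standard closed-form cosine annealing formula, with the annealing period assumed equal to the 200 training epochs.

```python
import math
import random


def init_alpha(n_features, noise_scale=0.01, seed=0):
    """Vector of ones plus small random noise.

    Assumption: Gaussian noise with a hypothetical scale of 0.01;
    the excerpt does not state the noise distribution.
    """
    rng = random.Random(seed)
    return [1.0 + rng.gauss(0.0, noise_scale) for _ in range(n_features)]


def l1_reciprocal_penalty(alpha):
    """Penalty equal to the reciprocal of the L1 norm of alpha."""
    return 1.0 / sum(abs(a) for a in alpha)


def cosine_annealing_lr(epoch, base_lr=0.1, total_epochs=200, min_lr=0.0):
    """Closed-form cosine annealing; min_lr=0.0 is an assumed default."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


alpha = init_alpha(5)
print(l1_reciprocal_penalty(alpha))
for epoch in (0, 100, 200):
    print(epoch, cosine_annealing_lr(epoch))
```

With these assumed settings the learning rate starts at 0.1, is about 0.05 at epoch 100, and decays to 0 at epoch 200.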