Deletion-Anticipative Data Selection with a Limited Budget
Authors: Rachael Hwee Ling Sim, Jue Fan, Xiao Tian, Patrick Jaillet, Bryan Kian Hsiang Low
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further propose how to construct these deletion-anticipative data selection (DADS) maximization objectives to preserve monotone submodularity and near-optimality of greedy solutions, how to optimize the objectives, and empirically evaluate DADS performance on real-world datasets. (A greedy-selection sketch appears below this table.) |
| Researcher Affiliation | Academia | 1Department of Computer Science, National University of Singapore, Republic of Singapore 2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA. |
| Pseudocode | Yes | Algorithm 1: Sample average approximation for E-DADS. Algorithm 2: Sample average approximation for RAα-DADS. (An SAA sketch appears below this table.) |
| Open Source Code | Yes | Please refer to our GitHub repository for the implementation details. |
| Open Datasets | Yes | Heart disease dataset (Lapp, 2019). The benchmark Fashion MNIST image dataset (Xiao et al., 2017). Adults income dataset (Becker & Kohavi, 1996). |
| Dataset Splits | Yes | (NN-H) ...We set aside 25% of the data as the validation set and use the remaining 75% as the feasible set... (NN-F) ...The learner measures the accuracy score on the validation set. (See the setup sketch below this table.) |
| Hardware Specification | Yes | The experiments are run on a machine with Ubuntu 22.04.3 LTS, 2 x Intel Xeon Silver 4116 (2.1 GHz), and NVIDIA Titan RTX GPU (CUDA 11.7). |
| Software Dependencies | No | The paper mentions "Miniconda and Python" as software environments, but does not provide specific version numbers for these or any other key software libraries or packages. |
| Experiment Setup | Yes | We pre-process the data by min-max scaling to the [0, 1] range. The selected subset is used to train an NN classifier with L2 as the distance metric. We fix the class prior based on the ratio in the feasible set (i.e., it does not depend on the selected set) and use a Laplace smoothing factor of 0.01. We consider a neural network with 2 hidden layers with 128 and 32 units and ReLU activation. (A scikit-learn reconstruction appears below this table.) |
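
The Research Type row notes that the DADS objectives are constructed to preserve monotone submodularity, which is what lets the standard greedy algorithm retain the (1 - 1/e) near-optimality guarantee of Nemhauser et al. (1978). The sketch below shows that generic greedy loop; the objective `g` and the candidate set are placeholders, not the paper's implementation.

```python
def greedy_select(candidates, g, budget):
    """Greedily pick up to `budget` points by largest marginal gain of g.

    For a monotone submodular g, this loop achieves a (1 - 1/e)
    approximation of the optimal budget-constrained subset
    (Nemhauser et al., 1978).
    """
    selected = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for c in candidates - selected:
            gain = g(selected | {c}) - g(selected)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate improves the objective further
            break
        selected.add(best)
    return selected
```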
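
Algorithms 1 and 2 quoted in the Pseudocode row are sample average approximation (SAA) procedures. As a minimal illustration of the SAA idea (not the paper's exact deletion model), the estimator below averages a utility `f` over Monte Carlo samples of deletions, assuming for illustration that each selected point is independently deleted with a fixed probability:

```python
import random

def saa_expected_utility(S, f, delete_prob=0.2, num_samples=200, seed=0):
    """SAA (Monte Carlo) estimate of E[f(S \\ D)] under an assumed model
    where each selected point is deleted independently with probability
    delete_prob."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        surviving = {s for s in S if rng.random() > delete_prob}
        total += f(surviving)
    return total / num_samples
```

Composing this estimate with the greedy loop above, e.g. `greedy_select(candidates, lambda S: saa_expected_utility(S, f), budget)`, approximately optimizes a deletion-anticipative objective in the spirit of E-DADS.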
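
The Dataset Splits and Experiment Setup rows together describe a 75% feasible / 25% validation split, min-max scaling to [0, 1], and a two-hidden-layer (128, 32) ReLU network. A hedged scikit-learn reconstruction is given below; the library choice, the placeholder data, fitting the scaler on the feasible set only, and hyperparameters such as `max_iter` are our assumptions rather than details taken from the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for a real dataset (e.g. Adults income).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# 25% validation set, 75% feasible (selection) set, as reported.
X_pool, X_val, y_pool, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Min-max scale features to [0, 1]; fitting on the feasible set only is
# our assumption, since the paper does not specify.
scaler = MinMaxScaler().fit(X_pool)
X_pool, X_val = scaler.transform(X_pool), scaler.transform(X_val)

# Two hidden layers with 128 and 32 units and ReLU activation, as reported.
clf = MLPClassifier(hidden_layer_sizes=(128, 32), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X_pool, y_pool)
print("validation accuracy:", clf.score(X_val, y_val))
```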