Deletion-Anticipative Data Selection with a Limited Budget

Authors: Rachael Hwee Ling Sim, Jue Fan, Xiao Tian, Patrick Jaillet, Bryan Kian Hsiang Low

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further propose how to construct these deletion-anticipative data selection (DADS) maximization objectives to preserve monotone submodularity and near-optimality of greedy solutions, how to optimize the objectives, and empirically evaluate DADS performance on real-world datasets.
Researcher Affiliation | Academia | (1) Department of Computer Science, National University of Singapore, Republic of Singapore; (2) Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA.
Pseudocode | Yes | Algorithm 1: Sample average approximation for EDADS. Algorithm 2: Sample average approximation for RAα-DADS. (A hedged sketch of the greedy/SAA pattern appears after this table.)
Open Source Code | Yes | Please refer to our GitHub repository for the implementation details.
Open Datasets | Yes | Heart disease dataset (Lapp, 2019). The benchmark Fashion-MNIST image dataset (Xiao et al., 2017). Adult income dataset (Becker & Kohavi, 1996).
Dataset Splits | Yes | (NN-H) ...We set aside 25% of the data as the validation set and use the remaining 75% as the feasible set... (NN-F) ...The learner measures the accuracy score on the validation set.
Hardware Specification | Yes | The experiments are run on a machine with Ubuntu 22.04.3 LTS, 2 x Intel Xeon Silver 4116 (2.1 GHz), and an NVIDIA Titan RTX GPU (CUDA 11.7).
Software Dependencies | No | The paper mentions "Miniconda and Python" as software environments, but does not provide specific version numbers for these or any other key software libraries or packages.
Experiment Setup | Yes | We pre-process the data by min-max scaling to the [0, 1] range. The selected subset is used to train an NN classifier with L2 as the distance metric. We fix the class prior based on the ratio in the feasible set (i.e., it does not depend on the selected set) and use a Laplace smoothing factor of 0.01. We consider a neural network with 2 hidden layers with 128 and 32 units and ReLU activation. (A hedged scikit-learn sketch of this setup appears after the table.)
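
For readers unfamiliar with the sample average approximation (SAA) and greedy pattern behind Algorithms 1 and 2, the following is a minimal sketch, not the paper's actual objective or code: it assumes a generic monotone submodular utility, independent per-provider deletion with a fixed probability, and a plain greedy loop under a cardinality budget. All function names, the deletion model, and the toy coverage utility are illustrative placeholders.

```python
import random


def saa_dads_objective(selected, utility, deletion_prob, num_samples=100, rng=None):
    """SAA estimate of the expected utility of `selected` after random deletions.

    Each selected provider is assumed to delete its data independently with
    probability `deletion_prob`; the estimate averages `utility` over sampled
    surviving subsets. This stands in for the paper's EDADS objective only
    schematically.
    """
    rng = rng or random.Random(0)  # fixed seed: common random numbers across calls
    total = 0.0
    for _ in range(num_samples):
        surviving = {x for x in selected if rng.random() > deletion_prob}
        total += utility(surviving)
    return total / num_samples


def greedy_select(candidates, budget, objective):
    """Greedy maximization under a cardinality budget.

    For a monotone submodular `objective`, greedy selection achieves the usual
    (1 - 1/e) approximation of the optimum.
    """
    selected = set()
    for _ in range(budget):
        base = objective(selected)
        best, best_gain = None, float("-inf")
        for x in candidates - selected:
            gain = objective(selected | {x}) - base
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break
        selected.add(best)
    return selected


# Toy usage: coverage utility over a handful of providers (illustrative only).
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"a", "d", "e"}}
utility = lambda s: len(set().union(*(coverage[i] for i in s))) if s else 0
objective = lambda s: saa_dads_objective(s, utility, deletion_prob=0.2)
print(greedy_select(set(coverage), budget=2, objective=objective))
```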
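
Likewise, a rough scikit-learn sketch of the reported experiment setup, assuming a nearest-neighbour reading of "NN classifier with L2" and filling in unreported details (random seed, number of neighbours, optimizer settings) with placeholder values:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler


def run_setup(X, y, seed=0):
    """Min-max scale to [0, 1], hold out 25% for validation, then fit the two
    reported models: a nearest-neighbour classifier with the L2 metric and a
    2-hidden-layer (128, 32) ReLU network. The seed, n_neighbors, and all
    optimizer settings are placeholder assumptions, not values from the paper.
    """
    X = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
    X_feas, X_val, y_feas, y_val = train_test_split(
        X, y, test_size=0.25, random_state=seed)  # 75% feasible / 25% validation

    knn = KNeighborsClassifier(n_neighbors=1, p=2)  # p=2 -> L2 (Euclidean) distance
    mlp = MLPClassifier(hidden_layer_sizes=(128, 32), activation="relu",
                        random_state=seed, max_iter=500)

    knn.fit(X_feas, y_feas)
    mlp.fit(X_feas, y_feas)
    # Accuracy on the held-out validation set, as in the reported splits.
    return knn.score(X_val, y_val), mlp.score(X_val, y_val)
```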