Deletion-Anticipative Data Selection with a Limited Budget
Authors: Rachael Hwee Ling Sim, Jue Fan, Xiao Tian, Patrick Jaillet, Bryan Kian Hsiang Low
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further propose how to construct these deletion-anticipative data selection (DADS) maximization objectives to preserve monotone submodularity and near-optimality of greedy solutions, how to optimize the objectives, and empirically evaluate DADS performance on real-world datasets. (A greedy-selection sketch appears below this table.) |
| Researcher Affiliation | Academia | 1Department of Computer Science, National University of Singapore, Republic of Singapore 2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA. |
| Pseudocode | Yes | Algorithm 1: Sample average approximation for E-DADS. Algorithm 2: Sample average approximation for RAα-DADS. (An SAA sketch appears below this table.) |
| Open Source Code | Yes | Please refer to our GitHub repository for the implementation details. |
| Open Datasets | Yes | Heart disease dataset (Lapp, 2019). The benchmark Fashion MNIST image dataset (Xiao et al., 2017). Adults income dataset (Becker & Kohavi, 1996). |
| Dataset Splits | Yes | (NN-H) ...We set aside 25% of the data as the validation set and use the remaining 75% as the feasible set... (NN-F) ...The learner measures the accuracy score on the validation set. (See the setup sketch below this table.) |
| Hardware Specification | Yes | The experiments are run on a machine with Ubuntu 22.04.3 LTS, 2 x Intel Xeon Silver 4116 (2.1 GHz), and NVIDIA Titan RTX GPU (CUDA 11.7). |
| Software Dependencies | No | The paper mentions "Miniconda and Python" as software environments, but does not provide specific version numbers for these or any other key software libraries or packages. |
| Experiment Setup | Yes | We pre-process the data by min-max scaling to the [0, 1] range. The selected subset is used to train an NN classifier with L2 as the distance metric. We fix the class prior based on the ratio in the feasible set (i.e., it does not depend on the selected set) and use a Laplace smoothing factor of 0.01. We consider a neural network with 2 hidden layers with 128 and 32 units and ReLU activation. (A scikit-learn reconstruction appears below this table.) |
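
The Research Type row notes that the DADS objectives are constructed to preserve monotone submodularity, which is what lets the standard greedy algorithm retain the (1 - 1/e) near-optimality guarantee of Nemhauser et al. (1978). The sketch below shows that generic greedy loop; the objective `g` and the candidate set are placeholders, not the paper's implementation.

```python
def greedy_select(candidates, g, budget):
    """Greedily pick up to `budget` points by largest marginal gain of g.

    For a monotone submodular g, this loop achieves a (1 - 1/e)
    approximation of the optimal budget-constrained subset
    (Nemhauser et al., 1978).
    """
    selected = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for c in candidates - selected:
            gain = g(selected | {c}) - g(selected)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate improves the objective further
            break
        selected.add(best)
    return selected
```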
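
Algorithms 1 and 2 quoted in the Pseudocode row are sample average approximation (SAA) procedures. As a minimal illustration of the SAA idea (not the paper's exact deletion model), the estimator below averages a utility `f` over Monte Carlo samples of deletions, assuming for illustration that each selected point is independently deleted with a fixed probability:

```python
import random

def saa_expected_utility(S, f, delete_prob=0.2, num_samples=200, seed=0):
    """SAA (Monte Carlo) estimate of E[f(S \\ D)] under an assumed model
    where each selected point is deleted independently with probability
    delete_prob."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        surviving = {s for s in S if rng.random() > delete_prob}
        total += f(surviving)
    return total / num_samples
```

Composing this estimate with the greedy loop above, e.g. `greedy_select(candidates, lambda S: saa_expected_utility(S, f), budget)`, approximately optimizes a deletion-anticipative objective in the spirit of E-DADS.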
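
The Dataset Splits and Experiment Setup rows together describe a 75% feasible / 25% validation split, min-max scaling to [0, 1], and a two-hidden-layer (128, 32) ReLU network. A hedged scikit-learn reconstruction is given below; the library choice, the placeholder data, fitting the scaler on the feasible set only, and hyperparameters such as `max_iter` are our assumptions rather than details taken from the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for a real dataset (e.g. Adults income).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# 25% validation set, 75% feasible (selection) set, as reported.
X_pool, X_val, y_pool, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Min-max scale features to [0, 1]; fitting on the feasible set only is
# our assumption, since the paper does not specify.
scaler = MinMaxScaler().fit(X_pool)
X_pool, X_val = scaler.transform(X_pool), scaler.transform(X_val)

# Two hidden layers with 128 and 32 units and ReLU activation, as reported.
clf = MLPClassifier(hidden_layer_sizes=(128, 32), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X_pool, y_pool)
print("validation accuracy:", clf.score(X_val, y_val))
```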