DiFA: Differentiable Feature Acquisition
Authors: Aritra Ghosh, Andrew Lan
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on various real-world datasets and show that DiFA significantly outperforms existing feature acquisition methods when the number of features is large. We verify the effectiveness of DiFA through extensive experiments on several large real-world datasets. We observe that the learned acquisition policy outperforms existing policies in the prediction tasks, requiring (sometimes significantly) fewer features to reach the same predictive quality. In this section, we detail our experimental setup, model implementation, and experimental results on real-world public datasets. We compare our method, DiFA, with the JAFA (Shim, Hwang, and Yang 2018), GSMRL (Li and Oliva 2021), and EDDI (Ma et al. 2019) methods. We repeat each experiment five times with different random seeds and list/plot the mean and the standard deviation (std); we list the p-values in the supplementary material. We implement all methods in PyTorch and run our experiments on a single NVIDIA 2080Ti GPU. Our implementation will be publicly available at https://github.com/arghosh/DiFA. We take a unified approach to preprocessing, network architecture, and the training process for all the methods considered in this paper, because we observe that minor changes in preprocessing, network architecture, the training process, and even the base RL algorithm can cause significant changes in the performance of the same method. Since we use both image and static datasets, we start with the overall setup first and detail the dataset-specific network architectures in their respective subsections. We list the hyperparameters in the supplementary material. (A minimal sketch of the multi-seed protocol appears after this table.) |
| Researcher Affiliation | Academia | Aritra Ghosh and Andrew Lan, University of Massachusetts Amherst; arighosh@cs.umass.edu, andrewlan@cs.umass.edu |
| Pseudocode | Yes | We summarize DiFA's training process in Algorithm 1. |
| Open Source Code | Yes | Our implementation will be publicly available at https://github.com/arghosh/DiFA. |
| Open Datasets | Yes | We use the following four datasets for image classification experiments: MNIST (C = 10, n = 70K, c = 1, D = 1×28×28) (LeCun et al. 1998), FashionMNIST (C = 10, n = 70K, c = 1, D = 1×28×28) (Xiao, Rasul, and Vollgraf 2017), SVHN (C = 10, n = 100K, c = 3, D = 3×32×32) (Netzer et al. 2011), and CIFAR10 (C = 10, n = 70K, c = 3, D = 3×32×32) (Krizhevsky, Hinton et al. 2009). We use two other supervised datasets from the UCI repository (Asuncion and Newman 2007): the Grid dataset (a binary classification task with n = 10K, D = 12) and the Parkinson dataset (a regression task with n = 5K, D = 16), where n and D are the number of data points and the feature dimension, respectively. We also experiment with the in-hospital mortality task (an imbalanced binary classification task with n = 12K, D = 41) from the PhysioNet 2012 challenge (Goldberger et al. 2000). (A loading sketch for the image datasets appears after this table.) |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, such as exact percentages, sample counts, or specific citations for predefined splits. While standard datasets like MNIST and CIFAR-10 are mentioned, the paper does not specify the splits used for its experiments. |
| Hardware Specification | Yes | We implement all methods in PyTorch and run our experiments on a single NVIDIA 2080Ti GPU. (A generic single-GPU setup sketch appears after this table.) |
| Software Dependencies | No | The paper mentions PyTorch as a software tool used for implementation but does not specify a version number or list other software dependencies with version numbers. |
| Experiment Setup | No | The paper states 'We list the hyperparameters in the supplementary material' and does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) in the main text. It mentions some architectural choices and the use of weighted cross-entropy loss for one dataset, but lacks the comprehensive, concrete setup details required for this question. |
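
The Research Type row quotes the paper's protocol of repeating each experiment five times with different random seeds and reporting the mean and standard deviation. The sketch below illustrates that protocol only; `difa_experiment`, `train_model`, and `evaluate_model` are hypothetical names standing in for the paper's unreleased training and evaluation code.

```python
import random

import numpy as np
import torch

# Hypothetical stand-ins for the paper's training and evaluation routines;
# neither name appears in the source.
from difa_experiment import train_model, evaluate_model


def run_with_seed(seed: int) -> float:
    """Run one train/evaluate cycle under a fixed random seed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    model = train_model()          # hypothetical: trains the acquisition policy
    return evaluate_model(model)   # hypothetical: returns a scalar test metric


# The paper repeats each experiment five times with different random seeds
# and reports the mean and standard deviation of the resulting metric.
scores = [run_with_seed(seed) for seed in range(5)]
print(f"mean = {np.mean(scores):.4f}, std = {np.std(scores):.4f}")
```

Seeding `random`, NumPy, and PyTorch together is a common way to make such repeated runs reproducible, though the paper does not state which sources of randomness it controls.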
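The four image datasets listed in the Open Datasets row are all distributed with torchvision. The paper does not say how they were loaded, so the loader below is an assumption; it does confirm the per-example shapes quoted in the row (1×28×28 for MNIST/FashionMNIST, 3×32×32 for SVHN/CIFAR10).

```python
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Shapes quoted in the paper: MNIST and FashionMNIST are 1x28x28 (c = 1);
# SVHN and CIFAR10 are 3x32x32 (c = 3).
mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
fashion = datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor)
svhn = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)

x, _ = mnist[0]
assert x.shape == torch.Size([1, 28, 28])   # D = 1x28x28
x, _ = cifar10[0]
assert x.shape == torch.Size([3, 32, 32])   # D = 3x32x32
```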
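The Hardware Specification row reports a single NVIDIA 2080Ti, but the paper shows no device-selection code. The snippet below is the generic PyTorch single-GPU pattern, not code from the paper; the toy model and the D = 41 feature dimension (borrowed from the PhysioNet task above) are placeholders.

```python
import torch

# Generic PyTorch single-GPU setup; falls back to CPU when no GPU is present.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(41, 2).to(device)   # toy placeholder, not the DiFA model
x = torch.randn(8, 41, device=device)       # batch of 8 examples with D = 41 features
logits = model(x)
print(logits.shape)                         # torch.Size([8, 2])
```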