Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding

Authors: Hajin Shim, Sung Ju Hwang, Eunho Yang

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate our model on a carefully designed synthetic dataset for the active feature acquisition as well as several medical datasets." |
| Researcher Affiliation | Collaboration | Hajin Shim (KAIST), Sung Ju Hwang (KAIST, AItrics), Eunho Yang (KAIST, AItrics), South Korea; {shimazing, sjhwang82, eunhoy}@kaist.ac.kr |
| Pseudocode | Yes | "Due to the space constraint, the pseudocode that summarizes the learning algorithm is deferred to the supplementary material." |
| Open Source Code | Yes | "Our code is available at https://github.com/OpenXAIProject/Joint-AFA-Classification." |
| Open Datasets | Yes | "We first experiment on a synthetic dataset CUBE-σ to see if the agent can identify the few important features that are relevant to the given classification task. See Fig. 2a and [29] for a detailed description of the dataset." "First, we conduct the experiment on the EHR dataset from the PhysioNet Challenge 2012 [30]." |
| Dataset Splits | Yes | "We only use the training set whose labels are available, take the features only in the last timestep, and split the data randomly into training/validation/test sets of 3000/500/500 instances." "We randomly split the data into three folds with the ratio of 64:16:20 for train:validation:test." (A minimal split sketch appears below the table.) |
| Hardware Specification | Yes | "For testing, it takes about 0.5 sec to evaluate 500 instances (on a GTX 1070)." |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not provide specific version numbers for it or for any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | "Throughout all experiments, we use the Adam optimizer [28] with a 0.001 learning rate and train the models for a fixed number of iterations. We train Q, C, and Enc for 10000 iterations on 10,000 training instances. It is trained by 4-step Q-learning. Per iteration, 128 agents run in parallel for 4 steps. Instead of updating at once, we do a mini-batch update with a size of 128 for 1 epoch (4 updates). We assume a uniform acquisition cost of 0.05 and take the final reward as the negative classification loss L based on C. Both C and Q have two hidden layers of 32-32 units, and Enc consists of an MLP with two hidden layers of 32-32 units, which maps features to 16-dimensional real-valued vectors, and an LSTM whose hidden size is 16. For ε-greedy exploration, ε decreases linearly from 1 to 0.1 over the first 5000 iterations." (An architecture sketch appears below the table.) |
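
The splits described in the Dataset Splits row are straightforward to reproduce. Below is a minimal sketch, assuming the data is indexed by position; the function name, the NumPy dependency, and the fixed seed are our assumptions, not details from the authors' code.

```python
import numpy as np

def split_indices(n, train_frac=0.64, val_frac=0.16, seed=0):
    """Randomly partition n instance indices into train/validation/test;
    the defaults give the 64:16:20 ratio described in the paper."""
    rng = np.random.default_rng(seed)  # fixed seed is our choice, not the paper's
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    n_val = int(val_frac * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# The 3000/500/500 PhysioNet split corresponds to n = 4000 instances:
train_idx, val_idx, test_idx = split_indices(4000, train_frac=0.75, val_frac=0.125)
```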
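
The Experiment Setup row pins down most of the network sizes and hyperparameters. The following PyTorch sketch instantiates them as stated: C and Q as MLPs with two 32-unit hidden layers, Enc as a 32-32 MLP embedding each acquired feature into a 16-dimensional vector followed by an LSTM with hidden size 16, Adam with learning rate 0.001, a uniform acquisition cost of 0.05, and the linear ε schedule. PyTorch itself, the module and variable names, and the input/output dimensions (state_dim, n_features, n_classes) are our assumptions; the released code may be organized differently.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=32):
    # two hidden layers of 32-32 units, as reported in the paper
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class SetEncoder(nn.Module):
    """Enc: per-feature 32-32 MLP into 16-dim embeddings, then an LSTM
    (hidden size 16) over the variable-size set of acquired features."""
    def __init__(self, feat_dim, emb_dim=16):
        super().__init__()
        self.embed = mlp(feat_dim, emb_dim)
        self.lstm = nn.LSTM(emb_dim, emb_dim, batch_first=True)

    def forward(self, feats):             # feats: (batch, n_acquired, feat_dim)
        _, (h, _) = self.lstm(self.embed(feats))
        return h[-1]                      # (batch, 16) summary of the acquired set

# dimensions below are illustrative, not taken from the paper
state_dim, n_features, n_classes = 16, 40, 8
enc = SetEncoder(feat_dim=2)              # e.g. (feature index, value) pairs
C = mlp(state_dim, n_classes)             # classifier
Q = mlp(state_dim, n_features + 1)        # Q-values: one per feature, plus stop

# hyperparameters reported in the paper
optimizer = torch.optim.Adam(
    [*enc.parameters(), *C.parameters(), *Q.parameters()], lr=0.001,
)
acquisition_cost = 0.05                   # uniform per-feature cost

def epsilon(it, start=1.0, end=0.1, decay_iters=5000):
    # linear epsilon-greedy schedule over the first 5000 iterations
    return max(end, start - (start - end) * it / decay_iters)
```

The stop action in Q follows the paper's setup of letting the agent terminate acquisition and classify with C; the n-step Q-learning loop itself (4 steps, 128 parallel agents) is omitted here for brevity.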