Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding
Authors: Hajin Shim, Sung Ju Hwang, Eunho Yang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on a carefully designed synthetic dataset for the active feature acquisition as well as several medical datasets. |
| Researcher Affiliation | Collaboration | Hajin Shim1, Sung Ju Hwang1,2, Eunho Yang1,2 KAIST1, AItrics2, South Korea {shimazing, sjhwang82, eunhoy} @kaist.ac.kr |
| Pseudocode | Yes | Due to the space constraint, the pseudocode that summarizes the learning algorithm is deferred to the supplementary material. |
| Open Source Code | Yes | Our code is available at https://github.com/OpenXAIProject/Joint-AFA-Classification. |
| Open Datasets | Yes | We first experiment on a synthetic dataset CUBE-σ to see if the agent can identify few important features that are relevant to the given classification task. See Fig 2a and [29] for detailed description of the dataset. First, we conduct the experiment on EHR dataset from Physionet challenge 2012 [30]. |
| Dataset Splits | Yes | We only use the training set whose labels are available and take the features only in the last timestep and split the data randomly into the training/validation/test set by 3000/500/500 ratio. We randomly split the data into three folds with the ratio of 64:16:20 for train:validation:test. |
| Hardware Specification | Yes | for testing, it takes about 0.5 sec to evaluate 500 instances (on GTX 1070). |
| Software Dependencies | No | The paper mentions 'Adam optimizer' but does not provide specific version numbers for it or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | Throughout all experiments, we use the Adam optimizer [28] with a 0.001 learning rate and train the models for a fixed number of iterations. We train Q, C and Enc for 10,000 iterations on 10,000 training instances. It is trained by 4-step Q-learning. Per iteration, 128 agents run in parallel for 4 steps. Instead of updating at once, we do a mini-batch update with the size of 128 for 1 epoch (4 updates). We assume a uniform acquisition cost of 0.05 and the final reward is the negative classification loss L based on C. Both C and Q have two hidden layers of 32-32 units, and Enc consists of an MLP with two hidden layers of 32-32 units, which maps features to 16-dimensional real-valued vectors, and an LSTM whose hidden size is 16. For ε-greedy exploration, ε linearly decreases from 1 to 0.1 over the first 5000 iterations. |
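The quoted exploration schedule (ε decaying linearly from 1 to 0.1 over the first 5000 iterations, then held constant) can be sketched as a small Python function. This is a minimal illustration of the stated schedule, not the authors' code; the function name and exact interpolation form are assumptions.

```python
def epsilon(iteration, start=1.0, end=0.1, decay_iters=5000):
    """Linear epsilon-greedy schedule: decays from `start` to `end`
    over the first `decay_iters` iterations, then stays at `end`.

    Matches the setup quoted above (1 -> 0.1 over 5000 iterations);
    the precise interpolation used by the authors is assumed linear
    in the iteration count.
    """
    if iteration >= decay_iters:
        return end
    return start + (end - start) * (iteration / decay_iters)
```

For example, under this sketch ε is 1.0 at iteration 0, 0.55 halfway through the decay at iteration 2500, and 0.1 from iteration 5000 onward.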