Learning Context-dependent Label Permutations for Multi-label Classification
Authors: Jinseok Nam, Young-Bum Kim, Eneldo Loza Mencia, Sunghyun Park, Ruhi Sarikaya, Johannes Fürnkranz
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on three public multi-label classification benchmarks show that our proposed dynamic label ordering approach based on reinforcement learning outperforms recurrent neural networks with fixed label ordering across both bipartition and ranking measures on all three datasets. We analyze both techniques empirically on datasets with different characteristics and in comparison to static baseline sequence ordering strategies. |
| Researcher Affiliation | Collaboration | 1Amazon, Seattle, Washington, USA 2Knowledge Engineering, TU Darmstadt, Darmstadt, Hessen, Germany. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to source code for the methodology described. |
| Open Datasets | Yes | We carried out our experiments on three multi-label datasets from the Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html, accessed 2019-01-12), and their statistics are given in Table 1. |
| Dataset Splits | Yes | We set aside 10% of the training data as the validation sets. |
| Hardware Specification | Yes | All RNN-based MLC models were trained on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software components such as Adam, layer normalization, variational dropout, and gated recurrent units (GRUs), but does not provide version numbers for any libraries, frameworks, or language runtimes (e.g., PyTorch, TensorFlow, Python). |
| Experiment Setup | Yes | The dimensionality of label embeddings and hidden activations of our proposed approach on all the datasets were 512 and 2048, respectively. For AC, the number of samples K in Eq. (8) was set to 1; the discount factor γ ∈ {0.1, 0.3, 0.6, 0.9, 0.99} and entropy regularization parameter β ∈ {0.01, 0.0001} were chosen based on the performance on the validation set for each dataset. For RNN training, layer normalization (Ba et al., 2016) and variational dropout (Gal & Ghahramani, 2016) were applied. In addition to the use of variational dropout for RNNs, we also applied plain dropout on input features with probability 0.2 or 0.5 when overfitting was observed. As optimization algorithm, we used Adam (Kingma & Ba, 2015) with a learning rate of 0.0001 and minibatches of size 128. |
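The hyperparameter selection quoted above (γ and β chosen per dataset by validation performance) amounts to a small grid search. A minimal sketch of that procedure, assuming a hypothetical `validate` callback that trains the actor-critic model with the given (γ, β) pair and returns a validation score (the paper does not specify its interface):

```python
import itertools

# Candidate grids quoted from the paper's experiment setup (K in Eq. (8) is fixed to 1).
GAMMAS = [0.1, 0.3, 0.6, 0.9, 0.99]   # discount factor candidates
BETAS = [0.01, 0.0001]                # entropy regularization candidates

def select_hyperparams(validate):
    """Grid-search (gamma, beta) and keep the pair with the best validation score.

    `validate(gamma, beta)` is a hypothetical callback standing in for
    training + evaluating the model on the held-out 10% validation split.
    """
    best_gamma, best_beta = max(
        itertools.product(GAMMAS, BETAS),
        key=lambda gb: validate(*gb),
    )
    return {"gamma": best_gamma, "beta": best_beta}

# Illustrative stand-in scores (not from the paper), just to exercise the search.
toy_scores = {(g, b): g * (1 - b) for g in GAMMAS for b in BETAS}
chosen = select_hyperparams(lambda g, b: toy_scores[(g, b)])
```

In practice each `validate` call would be a full training run per dataset, so the grid (5 × 2 = 10 configurations) is kept deliberately small.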