Learning Context-dependent Label Permutations for Multi-label Classification

Authors: Jinseok Nam, Young-Bum Kim, Eneldo Loza Mencía, Sunghyun Park, Ruhi Sarikaya, Johannes Fürnkranz

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on three public multi-label classification benchmarks show that our proposed dynamic label ordering approach based on reinforcement learning outperforms recurrent neural networks with fixed label orderings across both bipartition and ranking measures on all three datasets. We analyze both techniques empirically on datasets with different characteristics and in comparison to static baseline sequence ordering strategies.
Researcher Affiliation | Collaboration | 1 Amazon, Seattle, Washington, USA; 2 Knowledge Engineering, TU Darmstadt, Darmstadt, Hessen, Germany.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing code, or a link to source code, for the described methodology.
Open Datasets | Yes | We carried out our experiments on three multi-label datasets from the Extreme Classification (XML) Repository (http://manikvarma.org/downloads/XC/XMLRepository.html, accessed 2019-01-12), and their statistics are given in Table 1.
Dataset Splits | Yes | We set aside 10% of the training data as the validation sets.
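The 90/10 hold-out described above can be sketched in plain Python. The paper states only that 10% of the training data was set aside; the random shuffle and the fixed seed below are assumptions for illustration, not the authors' procedure.

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training data as a validation set.

    The shuffle and seed are illustrative assumptions; the paper only
    reports that 10% of the training data was used for validation.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    n_val = int(len(examples) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [x for i, x in enumerate(examples) if i not in val_idx]
    val = [x for i, x in enumerate(examples) if i in val_idx]
    return train, val

train, val = train_val_split(list(range(1000)))
# 900 training examples, 100 validation examples
```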
Hardware Specification | Yes | All RNN-based MLC models were trained on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software components such as Adam, layer normalization, variational dropout, and gated recurrent units (GRUs), but it does not provide version numbers for any libraries or frameworks used, such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The dimensionality of the label embeddings and of the hidden activations of our proposed approach was 512 and 2048, respectively, on all datasets. For AC, the number of samples K in Eq. (8) was set to 1; the discount factor γ ∈ {0.1, 0.3, 0.6, 0.9, 0.99} and the entropy regularization parameter β ∈ {0.01, 0.0001} were chosen based on validation-set performance for each dataset. For RNN training, layer normalization (Ba et al., 2016) and variational dropout (Gal & Ghahramani, 2016) were applied. In addition to variational dropout for the RNNs, we also applied plain dropout on the input features with probability 0.2 or 0.5 when overfitting was observed. As the optimization algorithm, we used Adam (Kingma & Ba, 2015) with a learning rate of 0.0001 and minibatches of size 128.
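The reported setup can be collected into a single configuration sketch. All numeric values come from the quote above; the dictionary layout and key names are my own, and the final (γ, β) pair was tuned per dataset on the validation set rather than fixed.

```python
# Hyperparameters reported in the paper; key names are illustrative only.
config = {
    "label_embedding_dim": 512,
    "hidden_dim": 2048,
    "ac_num_samples_K": 1,               # K in Eq. (8)
    "discount_factor_grid": [0.1, 0.3, 0.6, 0.9, 0.99],  # gamma candidates
    "entropy_reg_grid": [0.01, 0.0001],  # beta candidates
    "layer_norm": True,                  # Ba et al., 2016
    "variational_dropout": True,         # Gal & Ghahramani, 2016
    "input_dropout_choices": (0.2, 0.5), # applied only when overfitting observed
    "optimizer": "Adam",                 # Kingma & Ba, 2015
    "learning_rate": 1e-4,
    "batch_size": 128,
}

# Per-dataset tuning selects one (gamma, beta) pair from the cross product
# of the two grids, based on validation-set performance:
grid = [(g, b) for g in config["discount_factor_grid"]
               for b in config["entropy_reg_grid"]]
# 5 gamma values x 2 beta values = 10 candidate combinations
```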