Dual Perspective of Label-Specific Feature Learning for Multi-Label Classification

Authors: Jun-Yi Hang, Min-Ling Zhang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on 10 benchmark data sets show that our approach outperforms the state-of-the-art counterparts.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2 Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. Correspondence to: Min-Ling Zhang <zhangml@seu.edu.cn>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code package is publicly available at: http://palm.seu.edu.cn/zhangml/files/DELA.rar.
Open Datasets | Yes | For comprehensive performance evaluation, ten benchmark multi-label data sets with diversified multi-label properties are employed in this paper. Table 1 summarizes detailed properties of each data set D, including the number of examples (|D|), number of features (dim(D)), number of class labels (L(D)), feature type (F(D)), and label cardinality (LCard(D), i.e. the average number of labels per instance; a small worked sketch appears after this table). Dataset sources: http://mulan.sourceforge.net/datasets.html and http://lear.inrialpes.fr/people/guillaumin/data.php
Dataset Splits | Yes | We take out 10% of the examples in each data set as a hold-out validation set for hyperparameter searching and perform ten-fold cross validation on the remaining 90% of the examples to evaluate the above approaches. (A sketch of this split protocol appears after the table.)
Hardware Specification | No | The paper states: 'We thank the Big Data Center of Southeast University for providing the facility support on the numerical calculations in this paper.' However, no specific hardware details such as CPU/GPU models, memory, or processing units are provided.
Software Dependencies | No | The paper mentions software components such as the 'Adam' optimizer, 'ReLU' activations, 'cross entropy loss', and the 'Gumbel-Softmax trick', but does not provide specific version numbers for these or for the underlying programming languages/libraries (e.g. Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Specifically, the embedding function eφ is instantiated by a fully-connected neural network with ReLU activations, where the hidden dimensionalities are set to [256; 512; 256]. The standard deviation function σϑ to parametrize the noise distribution ... is a four-layer fully-connected neural network, which shares the first three layers with eφ. Classifiers fk, k ∈ [t], are implemented as three-layer fully-connected neural networks, where the hidden dimensionalities are set to [256; 512] and the first two layers are shared among all the class labels. ... we employ a two-layer fully-connected neural network to produce the distribution parameters {p1, ..., pt} and use τ = 2/3 as suggested by (Maddison et al., 2017). ... We consider cross entropy loss to instantiate the risk function L(·, ·). ... we conduct Monte Carlo sampling to estimate the expectations in terms of p(ck), p(zk|x, ck) with sampling number L = 1 and analytically calculate the KL-divergence term between two Gaussian distributions. For network optimization, Adam with a batch size of 128, weight decay of 10^-4, and momentums of 0.999 and 0.9 is employed. ... the trade-off parameter β is searched in {10^-5, 10^-4, ..., 10}. (A PyTorch sketch of this setup appears after the table.)
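
As a small worked illustration of the label cardinality statistic LCard(D) referenced in the Open Datasets row, the following Python snippet computes it from a binary label matrix. The function and variable names are illustrative only and are not taken from the paper or its code package.

```python
# Minimal sketch: label cardinality LCard(D) is the average number of relevant
# labels per instance, computed from a binary label matrix Y of shape
# (num_examples, num_labels). Names are illustrative, not from the paper.
import numpy as np

def label_cardinality(Y: np.ndarray) -> float:
    # Sum the relevant labels per example, then average over all examples.
    return float(Y.sum(axis=1).mean())

# Example: three instances with 2, 1 and 3 relevant labels -> LCard = 2.0
Y = np.array([[1, 1, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]])
print(label_cardinality(Y))  # 2.0
```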
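The evaluation protocol quoted in the Dataset Splits row (a 10% hold-out validation set for hyperparameter searching plus ten-fold cross validation on the remaining 90%) can be sketched as below. This is an assumed reconstruction using scikit-learn, not the authors' released code; all names are illustrative.

```python
# Sketch of the described split protocol, assuming X (features) and Y (binary
# label matrix) are numpy arrays. Not the authors' implementation.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def evaluation_protocol(X, Y, seed=0):
    # Hold out 10% of the examples as a validation set for hyperparameter searching.
    X_rest, X_val, Y_rest, Y_val = train_test_split(
        X, Y, test_size=0.1, random_state=seed)

    # Ten-fold cross validation on the remaining 90% of the examples.
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    for fold, (train_idx, test_idx) in enumerate(kf.split(X_rest)):
        X_train, Y_train = X_rest[train_idx], Y_rest[train_idx]
        X_test, Y_test = X_rest[test_idx], Y_rest[test_idx]
        # ... train on (X_train, Y_train), tune hyperparameters such as beta on
        # (X_val, Y_val), and evaluate on (X_test, Y_test) ...
```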
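To make the quoted Experiment Setup row concrete, here is a minimal PyTorch sketch that wires together the components it describes: a ReLU embedding network with hidden sizes [256, 512, 256], a standard-deviation head sharing the first three layers, classifiers with shared [256, 512] layers followed by per-label heads, a two-layer network producing relaxed-Bernoulli parameters used with the Gumbel-Softmax trick at τ = 2/3, and Monte Carlo sampling with L = 1. This is an assumed reconstruction for orientation only, not the released DELA code; the KL-divergence term, the cross entropy risk, and the trade-off parameter β are omitted, and all names are illustrative.

```python
# Rough wiring of the components listed in the Experiment Setup row.
# NOT the authors' implementation; dimensions follow the quoted text.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DELASketch(nn.Module):
    def __init__(self, d: int, t: int):
        super().__init__()
        # Embedding function e_phi: fully connected with ReLU, hidden dims [256, 512, 256].
        self.embed = nn.Sequential(
            nn.Linear(d, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        # Standard deviation function sigma: a four-layer network that shares
        # its first three layers with e_phi (one extra head here).
        self.sigma_head = nn.Linear(256, 256)
        # Classifiers f_k: three-layer networks with hidden dims [256, 512];
        # the first two layers are shared among all class labels, then one head per label.
        self.cls_shared = nn.Sequential(
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
        )
        self.cls_heads = nn.Linear(512, t)
        # Two-layer network producing the relaxed-Bernoulli parameters p_1..p_t.
        self.selector = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, t))

    def forward(self, x: torch.Tensor, tau: float = 2.0 / 3.0) -> torch.Tensor:
        mu = self.embed(x)
        sigma = F.softplus(self.sigma_head(mu))
        # Monte Carlo sampling with L = 1: a single reparameterized draw from N(mu, sigma^2).
        z = mu + sigma * torch.randn_like(sigma)
        # Per-label binary gates via the Gumbel-Softmax trick with temperature tau = 2/3
        # (a simplification of the paper's label-wise selection variables c_k).
        logits = self.selector(mu)                        # shape (batch, t)
        two_way = torch.stack([logits, -logits], dim=-1)  # shape (batch, t, 2)
        gate = F.gumbel_softmax(two_way, tau=tau, hard=False)[..., 0]
        return self.cls_heads(self.cls_shared(z)) * gate  # per-label scores

# Optimizer as reported: Adam, batch size 128, weight decay 1e-4, momentums 0.999 and 0.9.
# model = DELASketch(d=feature_dim, t=num_labels)
# optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-4, betas=(0.9, 0.999))
```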