Dual Perspective of Label-Specific Feature Learning for Multi-Label Classification
Authors: Jun-Yi Hang, Min-Ling Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on 10 benchmark data sets show that our approach outperforms the state-of-the-art counterparts. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. Correspondence to: Min-Ling Zhang <zhangml@seu.edu.cn>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code package is publicly available at: http://palm.seu.edu.cn/zhangml/files/DELA.rar |
| Open Datasets | Yes | For comprehensive performance evaluation, ten benchmark multi-label data sets with diversified multi-label properties are employed in this paper. Table 1 summarizes detailed properties of each data set D, including the number of examples (\|D\|), number of features (dim(D)), number of class labels (L(D)), feature type (F(D)), and label cardinality (LCard(D), i.e. the average number of labels per instance). Data sources: http://mulan.sourceforge.net/datasets.html and http://lear.inrialpes.fr/people/guillaumin/data.php |
| Dataset Splits | Yes | We take out 10% examples in each data set as hold-out validation set for hyperparameter searching and perform ten-fold cross validation on the remaining 90% examples to evaluate the above approaches. (This split protocol is sketched in code after the table.) |
| Hardware Specification | No | The paper states: 'We thank the Big Data Center of Southeast University for providing the facility support on the numerical calculations in this paper.' However, no specific hardware details such as CPU/GPU models, memory, or processing units are provided. |
| Software Dependencies | No | The paper mentions software components like the 'Adam' optimizer, 'ReLU activations', 'cross entropy loss', and the 'Gumbel-Softmax trick', but does not provide specific version numbers for these or for underlying programming languages/libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Specifically, the embedding function e_φ is instantiated by a fully-connected neural network with ReLU activations, where the hidden dimensionalities are set to [256, 512, 256]. The standard deviation function σ_ϑ to parametrize the noise distribution ... is a four-layer fully-connected neural network, which shares the first three layers with e_φ. Classifiers f_k, k ∈ [t] are implemented as three-layer fully-connected neural networks, where the hidden dimensionalities are set to [256, 512] and the first two layers are shared among all the class labels. ... we employ a two-layer fully-connected neural network to produce the distribution parameters {p_1, ..., p_t} and use τ = 2/3 as suggested by (Maddison et al., 2017). ... We consider cross entropy loss to instantiate the risk function L(·, ·). ... we conduct Monte Carlo sampling to estimate the expectations in terms of p(c_k), p(z_k\|x, c_k) with sampling number L = 1 and analytically calculate the KL-divergence term between two Gaussian distributions. For network optimization, Adam with a batch size of 128, weight decay of 10^-4, momentums of 0.999 and 0.9 is employed. ... the trade-off parameter β is searched in {10^-5, 10^-4, ..., 10}. |
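
Although the paper provides no pseudocode, the setup row above is concrete enough to sketch. Below is a minimal PyTorch sketch of the described components; PyTorch itself, the module names, the softplus parametrization of σ_ϑ, and the per-dimension relaxed-Bernoulli gating are our assumptions rather than details confirmed by the paper (the released DELA package is the authoritative implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DELASketch(nn.Module):
    """Illustrative sketch of the reported architecture, not the authors' code."""

    def __init__(self, in_dim, n_labels, emb_dim=256, tau=2 / 3):
        super().__init__()
        self.n_labels = n_labels
        self.tau = tau  # Gumbel-Softmax temperature, tau = 2/3 per the paper
        # Embedding function e_phi: fully connected, ReLU, hidden dims [256, 512, 256].
        self.embed = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, emb_dim), nn.ReLU(),
        )
        # sigma_vartheta: four-layer network sharing its first three layers with e_phi.
        self.sigma_head = nn.Linear(emb_dim, emb_dim)
        # Two-layer network producing the distribution parameters {p_1, ..., p_t};
        # the per-dimension gating granularity is an assumption.
        self.prob_head = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, n_labels * emb_dim),
        )
        # Classifiers f_k, k in [t]: three layers, hidden dims [256, 512],
        # with the first two layers shared among all class labels.
        self.clf_shared = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
        )
        self.clf_heads = nn.ModuleList([nn.Linear(512, 1) for _ in range(n_labels)])

    def forward(self, x):
        mu = self.embed(x)                        # embedding of the input
        sigma = F.softplus(self.sigma_head(mu))   # noise standard deviation
        z = mu + sigma * torch.randn_like(sigma)  # one-sample Monte Carlo (L = 1)
        # Relaxed-Bernoulli gates via the binary Gumbel-Softmax trick.
        logits = self.prob_head(x).view(-1, self.n_labels, z.size(-1))
        pair = torch.stack([logits, torch.zeros_like(logits)], dim=-1)
        gates = F.gumbel_softmax(pair, tau=self.tau)[..., 0]
        z_k = gates * z.unsqueeze(1)              # label-specific features, (batch, t, emb)
        h = self.clf_shared(z_k)                  # shared classifier layers, (batch, t, 512)
        out = torch.cat([f(h[:, k]) for k, f in enumerate(self.clf_heads)], dim=-1)
        return out, mu, sigma                     # per-label logits
```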
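The optimization recipe (Adam with batch size 128, weight decay 10^-4, an analytic Gaussian KL term, cross-entropy risk, and a β grid of {10^-5, ..., 10}) can be sketched the same way. The standard-normal prior and the binary cross-entropy instantiation of the per-label risk are assumptions here; the paper only states that the KL between two Gaussians is computed analytically and that cross entropy instantiates L(·, ·).

```python
import torch
import torch.nn.functional as F

def gaussian_kl(mu, sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ); the standard-normal
    # prior is an assumption, not stated in the paper.
    return 0.5 * (sigma.pow(2) + mu.pow(2) - 1.0 - 2.0 * sigma.log()).sum(dim=-1)

model = DELASketch(in_dim=294, n_labels=6)       # placeholder dimensions
optimizer = torch.optim.Adam(
    model.parameters(),
    betas=(0.9, 0.999),                          # the reported momentums 0.9 / 0.999
    weight_decay=1e-4,
)
beta_grid = [10.0 ** e for e in range(-5, 2)]    # beta searched in {1e-5, ..., 10}
beta = beta_grid[3]                              # chosen on the hold-out split

x = torch.randn(128, 294)                        # one batch of size 128
y = torch.randint(0, 2, (128, 6)).float()        # multi-label targets
logits, mu, sigma = model(x)
risk = F.binary_cross_entropy_with_logits(logits, y)  # cross-entropy risk
loss = risk + beta * gaussian_kl(mu, sigma).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```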
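The evaluation protocol from the Dataset Splits row (10% hold-out for hyperparameter search, ten-fold cross validation on the remaining 90%) could be reproduced along these lines, assuming scikit-learn; the data set size and random seeds are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

n_examples = 2407                                # placeholder data set size
indices = np.arange(n_examples)
# 10% hold-out validation set for hyperparameter searching.
rest, val = train_test_split(indices, test_size=0.1, random_state=0)
# Ten-fold cross validation on the remaining 90%.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (tr, te) in enumerate(kfold.split(rest)):
    train_idx, test_idx = rest[tr], rest[te]
    # ... fit on train_idx, tune beta on val, evaluate on test_idx ...
```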