Partial Multi-Label Learning with Probabilistic Graphical Disambiguation

Authors: Jun-Yi Hang, Min-Ling Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments on multiple synthetic and real-world data sets show that our approach outperforms the state-of-the-art counterparts."
Researcher Affiliation | Academia | "Jun-Yi Hang, Min-Ling Zhang. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China. Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. {hangjy, zhangml}@seu.edu.cn"
Pseudocode | Yes | "Algorithm 1: Pseudocode of the Optimization Procedure for PARD"
Open Source Code | Yes | "Code package of PARD is publicly available at http://palm.seu.edu.cn/zhangml/files/PARD.rar."
Open Datasets | Yes | "For comprehensive performance evaluation, five real-world and a number of synthetic PML data sets are employed in this paper. Table 1 summarizes detailed characteristics of each data set. Specifically, the first five data sets are real-world PML data sets... while the last five data sets, including corel5k, rcv1-s1, Corel16k-s1, iaprtc12 and espgame, are multi-label data sets." Dataset sources: [1] http://palm.seu.edu.cn/zhangml/ [2] http://mulan.sourceforge.net/datasets.html [3] http://lear.inrialpes.fr/people/guillaumin/data.php
Dataset Splits | Yes | "Following [37], we take out 10% examples in each data set as hold-out validation set... The remaining 90% examples are randomly split into training set and test set with a ratio of 9:1 for training and evaluation respectively." (a minimal split sketch follows the table)
Hardware Specification | Yes | "In this paper, all experiments are conducted on one V100 GPU."
Software Dependencies | No | "All the distributions involved in Eq. (3) are instantiated as multivariate Bernoulli distributions and are parameterized by neural networks. ... For network optimization, Adam with a batch size of 128, weight decay of 10^-4, momentums of 0.999 and 0.9 is employed." (no library names or versions are given)
Experiment Setup | Yes | "The hidden dimensionalities are set to [256; 512; 256] and [256; 512] respectively. For fair comparison with existing PML approaches, the prediction model is implemented as a linear model. To compute the objective function in Eq. (3), a trade-off parameter α is introduced for the KL-divergence term and Monte Carlo sampling with sampling number L = 1 is conducted to estimate the first expectation term, where the temperature parameter τ = 2/3 as suggested by [27]. In the following experiments, we set α ≥ 1 so that the objective function is still a valid lower bound of the data log-likelihood. For network optimization, Adam with a batch size of 128, weight decay of 10^-4, momentums of 0.999 and 0.9 is employed. In this paper, all experiments are conducted on one V100 GPU." (a hedged training-setup sketch follows the table)
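
A minimal sketch of the split protocol quoted in the Dataset Splits row, assuming scikit-learn; the array names and sizes are synthetic stand-ins, not one of the paper's data sets:

```python
"""Sketch (not the authors' code) of the quoted split protocol:
hold out 10% of each data set for validation, then split the
remaining 90% into training and test sets at a 9:1 ratio."""
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # stand-in feature matrix
Y = rng.integers(0, 2, size=(1000, 5))    # stand-in candidate label matrix

# 10% hold-out validation set
X_rest, X_val, Y_rest, Y_val = train_test_split(X, Y, test_size=0.1, random_state=0)
# remaining 90% split 9:1 into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X_rest, Y_rest, test_size=0.1, random_state=0)
```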
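And a hedged training-setup sketch in PyTorch, assembled from the Software Dependencies and Experiment Setup quotes. It is not the released PARD package: the hidden widths [256, 512, 256] and [256, 512], the linear prediction model, the single (L = 1) relaxed sample at temperature τ = 2/3, the α-weighted KL term, and the Adam settings come from the paper, while every module name, the wiring of the objective, and the learning rate are illustrative assumptions:

```python
"""Hedged sketch of the quoted PARD training setup; the objective below is a
schematic stand-in for Eq. (3), whose exact factorization we do not reproduce."""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Bernoulli, RelaxedBernoulli, kl_divergence

def mlp(dims):
    # stack Linear+ReLU pairs, leaving the final layer as raw logits
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU()]
    return nn.Sequential(*layers[:-1])

n_features, n_labels = 100, 20        # stand-in dimensionalities
alpha = 1.0                           # KL weight; alpha >= 1 per the quote
tau = torch.tensor(2.0 / 3.0)         # Concrete temperature suggested by [27]

encoder = mlp([n_features + n_labels, 256, 512, 256, n_labels])  # hidden dims [256, 512, 256]
prior_net = mlp([n_features, 256, 512, n_labels])                # hidden dims [256, 512]
predictor = nn.Linear(n_features, n_labels)                      # linear prediction model

params = [p for m in (encoder, prior_net, predictor) for p in m.parameters()]
opt = torch.optim.Adam(params, betas=(0.9, 0.999), weight_decay=1e-4)  # lr not quoted; default kept

def negative_lower_bound(x, y_candidate):
    """One alpha-weighted ELBO-style estimate for a mini-batch."""
    q_logits = encoder(torch.cat([x, y_candidate], dim=1))  # variational posterior over latent labels
    p_logits = prior_net(x)                                 # conditional prior
    # L = 1 Monte Carlo sample via the binary Concrete (RelaxedBernoulli) relaxation
    z = RelaxedBernoulli(tau, logits=q_logits).rsample()
    # schematic first expectation term: fit the linear prediction model to z
    fit = -F.binary_cross_entropy_with_logits(predictor(x), z, reduction="none").sum(dim=1)
    kl = kl_divergence(Bernoulli(logits=q_logits), Bernoulli(logits=p_logits)).sum(dim=1)
    return -(fit - alpha * kl).mean()

# one illustrative optimization step on a batch of 128 stand-in examples
x = torch.randn(128, n_features)
y_candidate = torch.randint(0, 2, (128, n_labels)).float()
opt.zero_grad()
loss = negative_lower_bound(x, y_candidate)
loss.backward()
opt.step()
```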
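A note on the reconstructed "α ≥ 1" in the Experiment Setup row: assuming Eq. (3) has the usual evidence-lower-bound shape (an assumption on our part), the claim follows because KL(q‖p) ≥ 0, so for any α ≥ 1

    E_q[log p(·|z)] − α·KL(q‖p) ≤ E_q[log p(·|z)] − KL(q‖p) = ELBO ≤ log-likelihood.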