Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Conformal Prediction for Partial Label Learning
Authors: Xiuwen Gong, Nitin Bisht, Guandong Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on SOTA PLL methods and benchmark datasets to verify the effectiveness of the proposed framework. In this section, we empirically test the validity of the proposed framework CP-PLL in quantifying the uncertainty (i.e., predictive confidence) of partial label learning models by implementing it on top of the state-of-the-art PLL models and various datasets in terms of average set size (the smaller the better). |
| Researcher Affiliation | Academia | 1 University of Technology Sydney 2 The Education University of Hong Kong |
| Pseudocode | Yes | Algorithm 1: CP-PLL Algorithm. Goal: constructing the PLL set predictor C^α_PLL(X). Input: PLL calibration dataset {(x_i, ŷ_i)}_{i=1}^{n}, pre-trained model f, a testing instance x_t; Output: the prediction set C^α_PLL(x_t). 1: Compute the partial label score function S_PLL(x_i, ŷ_i) given Eq. (4); 2: Compute the PLL quantile function Q_PLL given Eq. (5); 3: Generate the prediction set C^α_PLL(x_t) given Eq. (6). |
| Open Source Code | Yes | Code is publicly available at https://github.com/kalpiree/CP-PLL. |
| Open Datasets | Yes | We evaluate CP-PLL on various benchmark datasets, including CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009), and their corresponding long-tailed versions, i.e., CIFAR-10-LT, CIFAR-100-LT. |
| Dataset Splits | Yes | We split the held-out training data with 60% as the calibration data and 40% as the testing data on all datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned in the paper for running experiments. |
| Software Dependencies | No | The paper mentions using an '18-layer ResNet as the backbone' and 'SGD with momentum of 0.9 and weight decay of 0.001 as the optimizer', but does not provide specific version numbers for any software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We use an 18-layer ResNet as the backbone. The mini-batch size is set to 256 and all the methods are trained using SGD with momentum of 0.9 and weight decay of 0.001 as the optimizer. The initial learning rate is set to 0.01. We train the model for 800 epochs. |
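The three steps in Algorithm 1 follow the standard split-conformal recipe: score the calibration set, take a finite-sample-corrected quantile, then keep every label whose score clears it. The sketch below illustrates that recipe only; it substitutes a generic `1 - softmax` nonconformity score for the paper's PLL-specific score S_PLL (Eq. (4)), so the function and variable names here are illustrative, not the authors' implementation.

```python
import numpy as np

def conformal_prediction_sets(cal_scores, test_probs, alpha=0.1):
    """Generic split-conformal sketch (NOT the paper's Eq. (4)-(6) score).

    cal_scores : nonconformity scores of the calibration labels, shape (n,)
    test_probs : predicted class probabilities for test points, shape (m, K)
    alpha      : target miscoverage level
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level, clipped to 1 for small n
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    # Prediction set: all labels whose 1 - p score falls below the quantile
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]

# Toy usage with random probabilities (hypothetical data, 10 classes)
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
cal_scores = 1.0 - cal_probs[np.arange(500), cal_labels]
test_probs = rng.dirichlet(np.ones(10), size=5)
sets = conformal_prediction_sets(cal_scores, test_probs, alpha=0.1)
```

In the paper's setting the calibration labels ŷ_i are candidate sets rather than single ground-truth labels, which is exactly what the PLL-specific score and quantile functions (Eqs. (4)-(5)) handle; the "average set size" metric reported in the table is then just the mean length of the returned sets.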