Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Conformal Prediction for Partial Label Learning
Authors: Xiuwen Gong, Nitin Bisht, Guandong Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on SOTA PLL methods and benchmark datasets to verify the effectiveness of the proposed framework. In this section, we empirically test the validity of the proposed framework CP-PLL in quantifying the uncertainty (i.e., predictive confidence) of partial label learning models by implementing it on top of the state-of-the-art PLL models and various datasets in terms of average set size (the smaller the better). |
| Researcher Affiliation | Academia | 1 University of Technology Sydney 2 The Education University of Hong Kong |
| Pseudocode | Yes | Algorithm 1: CP-PLL Algorithm. Goal: constructing the PLL set predictor C^α_PLL(X). Input: PLL calibration dataset {(x_i, ŷ_i)}_{i=1}^{n}, pre-trained model f, a testing instance x_t; Output: the prediction set C^α_PLL(x_t). 1: Compute the partial label score function S_PLL(x_i, ŷ_i) given Eq. (4); 2: Compute the PLL quantile function Q_PLL given Eq. (5); 3: Generate the prediction set C^α_PLL(x_t) given Eq. (6). |
| Open Source Code | Yes | Code is publicly available at https://github.com/kalpiree/CP-PLL. |
| Open Datasets | Yes | We evaluate CP-PLL on various benchmark datasets, including CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009), and their corresponding long-tailed versions, i.e., CIFAR-10-LT, CIFAR-100-LT. |
| Dataset Splits | Yes | We split the held-out training data with 60% as the calibration data and 40% as the testing data on all datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned in the paper for running experiments. |
| Software Dependencies | No | The paper mentions using an '18-layer ResNet as the backbone' and 'SGD with momentum of 0.9 and weight decay of 0.001 as the optimizer', but does not provide specific version numbers for any software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We use an 18-layer ResNet as the backbone. The mini-batch size is set to 256 and all the methods are trained using SGD with momentum of 0.9 and weight decay of 0.001 as the optimizer. The initial learning rate is set to 0.01. We train the model for 800 epochs. |
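The three steps in Algorithm 1 follow the standard split-conformal recipe: score the calibration set, take a finite-sample-corrected quantile, then keep every label whose score clears it. The sketch below illustrates that recipe only; it substitutes a generic `1 - softmax` nonconformity score for the paper's PLL-specific score S_PLL (Eq. (4)), so the function and variable names here are illustrative, not the authors' implementation.

```python
import numpy as np

def conformal_prediction_sets(cal_scores, test_probs, alpha=0.1):
    """Generic split-conformal sketch (NOT the paper's Eq. (4)-(6) score).

    cal_scores : nonconformity scores of the calibration labels, shape (n,)
    test_probs : predicted class probabilities for test points, shape (m, K)
    alpha      : target miscoverage level
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level, clipped to 1 for small n
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    # Prediction set: all labels whose 1 - p score falls below the quantile
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]

# Toy usage with random probabilities (hypothetical data, 10 classes)
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
cal_scores = 1.0 - cal_probs[np.arange(500), cal_labels]
test_probs = rng.dirichlet(np.ones(10), size=5)
sets = conformal_prediction_sets(cal_scores, test_probs, alpha=0.1)
```

In the paper's setting the calibration labels ŷ_i are candidate sets rather than single ground-truth labels, which is exactly what the PLL-specific score and quantile functions (Eqs. (4)-(5)) handle; the "average set size" metric reported in the table is then just the mean length of the returned sets.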