PiCO: Contrastive Label Disambiguation for Partial Label Learning

Authors: Haobo Wang, Ruixuan Xiao, Yixuan Li, Lei Feng, Gang Niu, Gang Chen, Junbo Zhao

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that PiCO significantly outperforms the current state-of-the-art approaches in PLL and even achieves comparable results to fully supervised learning.
Researcher Affiliation | Academia | 1 Zhejiang University, 2 University of Wisconsin-Madison, 3 Chongqing University, 4 RIKEN
Pseudocode | Yes | Appendix C: PSEUDO-CODE OF PICO
Open Source Code | Yes | Code and data available: https://github.com/hbzju/PiCO.
Open Datasets | Yes | First, we evaluate PiCO on two commonly used benchmarks CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009).
Dataset Splits | Yes | Following the standard experimental setup in PLL (Feng et al., 2020b; Wen et al., 2021), we split a clean validation set (10% of training data) from the training set to select the hyperparameters.
Hardware Specification | Yes | We train the models using one Quadro P5000 GPU respectively and evaluate the average training time per epoch.
Software Dependencies | No | The paper mentions an '18-layer ResNet' backbone and 'SimAugment' and 'RandAugment' for data augmentation, but it does not specify software versions for frameworks (like PyTorch or TensorFlow), libraries, or specific ResNet implementations.
Experiment Setup | Yes | The projection head of the contrastive network is a 2-layer MLP that outputs 128-dimensional embeddings. We use two data augmentation modules SimAugment (Khosla et al., 2020) and RandAugment (Cubuk et al., 2019) for query and key data augmentation respectively. [...] The size of the queue that stores key embeddings is fixed to be 8192. The momentum coefficients are set as 0.999 for contrastive network updating and γ = 0.99 for prototype calculation. For pseudo target updating, we linearly ramp down φ from 0.95 to 0.8. The temperature parameter is set as τ = 0.07. The loss weighting factor is set as λ = 0.5. The model is trained by a standard SGD optimizer with a momentum of 0.9 and the batch size is 256. We train the model for 800 epochs with cosine learning rate scheduling. We also empirically find that classifier warm-up leads to better performance when there are many candidates. Hence, we disable contrastive learning in the first 100 epochs for CIFAR-100 with q = 0.1 and 1 epoch for the remaining experiments.
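
The quoted experiment setup translates fairly directly into a training configuration. Below is a minimal PyTorch-style sketch of those hyperparameters; the names PiCOConfig, build_projection_head, phi_schedule, and build_optimizer, as well as the base learning rate, are illustrative assumptions and are not taken from the paper or the released code.

    # Hypothetical sketch of the quoted training setup; not the authors' implementation.
    from dataclasses import dataclass
    import torch
    import torch.nn as nn

    @dataclass
    class PiCOConfig:
        feat_dim: int = 128           # output dimension of the projection head
        queue_size: int = 8192        # key-embedding queue size
        moco_momentum: float = 0.999  # momentum for the contrastive (key) network
        proto_momentum: float = 0.99  # gamma, momentum for prototype calculation
        phi_start: float = 0.95       # pseudo-target coefficient, ramped down ...
        phi_end: float = 0.80         # ... linearly to this value
        temperature: float = 0.07     # tau in the contrastive loss
        loss_weight: float = 0.5      # lambda, weight of the contrastive term
        epochs: int = 800
        batch_size: int = 256
        lr: float = 0.01              # assumption: base LR not quoted in this excerpt

    def build_projection_head(in_dim: int, cfg: PiCOConfig) -> nn.Module:
        # 2-layer MLP mapping encoder features to 128-d contrastive embeddings.
        return nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(inplace=True),
            nn.Linear(in_dim, cfg.feat_dim),
        )

    def phi_schedule(epoch: int, cfg: PiCOConfig) -> float:
        # Linear ramp-down of the pseudo-target coefficient phi over training.
        t = epoch / max(cfg.epochs - 1, 1)
        return cfg.phi_start + t * (cfg.phi_end - cfg.phi_start)

    def build_optimizer(model: nn.Module, cfg: PiCOConfig):
        # Standard SGD with momentum 0.9 and cosine learning-rate scheduling.
        opt = torch.optim.SGD(model.parameters(), lr=cfg.lr, momentum=0.9)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=cfg.epochs)
        return opt, sched

Under this sketch, phi_schedule starts at 0.95 and reaches 0.8 at the final epoch, matching the linear ramp-down described in the quoted setup.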