PiCO: Contrastive Label Disambiguation for Partial Label Learning
Authors: Haobo Wang, Ruixuan Xiao, Yixuan Li, Lei Feng, Gang Niu, Gang Chen, Junbo Zhao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that PiCO significantly outperforms the current state-of-the-art approaches in PLL and even achieves comparable results to fully supervised learning. |
| Researcher Affiliation | Academia | 1Zhejiang University 2University of Wisconsin-Madison 3Chongqing University 4RIKEN |
| Pseudocode | Yes | C PSEUDO-CODE OF PICO |
| Open Source Code | Yes | Code and data available: https://github.com/hbzju/PiCO. |
| Open Datasets | Yes | First, we evaluate PiCO on two commonly used benchmarks CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Following the standard experimental setup in PLL (Feng et al., 2020b; Wen et al., 2021), we split a clean validation set (10% of training data) from the training set to select the hyperparameters. (See the split sketch below the table.) |
| Hardware Specification | Yes | We train the models using one Quadro P5000 GPU and evaluate the average training time per epoch. |
| Software Dependencies | No | The paper mentions using an '18-layer ResNet' and 'SimAugment' and 'RandAugment' for data augmentation, but it does not specify software versions for frameworks (like PyTorch or TensorFlow), libraries, or specific ResNet implementations. |
| Experiment Setup | Yes | The projection head of the contrastive network is a 2-layer MLP that outputs 128-dimensional embeddings. We use two data augmentation modules SimAugment (Khosla et al., 2020) and RandAugment (Cubuk et al., 2019) for query and key data augmentation respectively. [...] The size of the queue that stores key embeddings is fixed to be 8192. The momentum coefficients are set as 0.999 for contrastive network updating and γ = 0.99 for prototype calculation. For pseudo target updating, we linearly ramp down φ from 0.95 to 0.8. The temperature parameter is set as τ = 0.07. The loss weighting factor is set as λ = 0.5. The model is trained by a standard SGD optimizer with a momentum of 0.9 and the batch size is 256. We train the model for 800 epochs with cosine learning rate scheduling. We also empirically find that classifier warm-up leads to better performance when there are many candidates. Hence, we disable contrastive learning in the first 100 epochs for CIFAR-100 with q = 0.1 and 1 epoch for the remaining experiments. (A hedged configuration sketch follows below the table.) |
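
The 10% clean validation split quoted in the Dataset Splits row can be reproduced along these lines. This is a minimal sketch assuming torchvision's CIFAR-10 loader, a plain tensor transform, and a fixed seed; the authors' exact splitting code may differ.

```python
# Minimal sketch of the 10% clean validation split described in the table above.
# The torchvision loader, transform, and seed are illustrative assumptions,
# not the authors' exact splitting code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

val_size = int(0.1 * len(train_full))            # 10% of training data held out
train_size = len(train_full) - val_size
split_gen = torch.Generator().manual_seed(0)     # assumed seed for reproducibility
train_set, val_set = random_split(train_full, [train_size, val_size],
                                  generator=split_gen)
```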
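
To make the quoted Experiment Setup row easier to map onto code, here is a hedged sketch of the optimizer, learning-rate schedule, and the listed hyperparameters. The backbone choice (torchvision's `resnet18` as a stand-in for the 18-layer ResNet), the base learning rate, and the exact φ ramp implementation are assumptions not stated in the quote.

```python
# Hedged sketch of the reported training configuration. Values quoted in the
# Experiment Setup row appear as constants; the backbone, base learning rate,
# and ramp implementation are assumptions, not the authors' code.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)      # stand-in for the paper's 18-layer ResNet

epochs      = 800                     # total training epochs
batch_size  = 256                     # SGD mini-batch size
base_lr     = 0.01                    # assumed; not given in the quoted setup
tau         = 0.07                    # contrastive temperature
lam         = 0.5                     # loss weighting factor lambda
queue_size  = 8192                    # key-embedding queue length
m_contrast  = 0.999                   # momentum for the contrastive (key) network
gamma_proto = 0.99                    # momentum gamma for prototype updates

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

def phi_at(epoch: int, start: float = 0.95, end: float = 0.80) -> float:
    """Linearly ramp the pseudo-target coefficient phi from 0.95 down to 0.8."""
    return start + (end - start) * epoch / max(epochs - 1, 1)
```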