Complementary-Label Learning for Arbitrary Losses and Models

Authors: Takashi Ishida, Gang Niu, Aditya Menon, Masashi Sugiyama

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare the 3 methods that we have proposed in Section 3, which are Free (Unbiased risk estimator that is loss assumption free, based on Eq. (13)), Max Operator (based on Eq. (17)), and Gradient Ascent (based on Alg. 1). For Gradient Ascent, we used β = 0 and γ = 1 for simplicity. Mini-batch size was set to 256. We also compare with two baseline methods: Pairwise comparison (PC) with ramp loss from Ishida et al. (2017) and Forward correction from Yu et al. (2018). For training, we used only complementarily labeled data, which was generated so that the assumption of (5) is satisfied. This is straightforward when the dataset has a uniform (ordinarily-labeled) class prior, because it reduces to just choosing a class randomly other than the true class. In Appendix C, we explain the details of the datasets used in the experiments: MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR-10. The implementation is based on PyTorch and our demo code is available online. We show the accuracy for all 300 epochs on test data to demonstrate how the issues discussed in Section 3.2 appear and how different implementations in Section 3.4 are effective. In Figure 2, we show the mean and standard deviation of test accuracy for 4 trials on test data evaluated with ordinary labels.
Researcher Affiliation | Collaboration | The University of Tokyo, RIKEN, and Google Research. Correspondence to: Takashi Ishida <ishida@ms.k.u-tokyo.ac.jp>.
Pseudocode | Yes | Algorithm 1: Complementary-label learning with gradient ascent.
Open Source Code | Yes | The implementation is based on PyTorch and our demo code is available online (footnote 3: https://github.com/takashiishida/comp).
Open Datasets | Yes | In Appendix C, we explain the details of the datasets used in the experiments: MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR-10.
Dataset Splits | Yes | Next, we perform experiments with a train, validation, and test split. The dataset is constructed by splitting the original training data used in the previous experiments into train/validation with a 9:1 ratio.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments.
Software Dependencies | No | The paper states that the implementation is based on PyTorch, but it does not give the specific version number that would be needed for reproducibility.
Experiment Setup | Yes | For MNIST, Fashion-MNIST, and Kuzushiji-MNIST, a linear-in-input model with a bias term and an MLP model (d-500-1) was trained with the softmax cross-entropy loss function (except PC) for 300 epochs. Weight decay of 1e-4 for weight parameters and a learning rate of 5e-5 for Adam (Kingma & Ba, 2015) were used. For CIFAR-10, DenseNet (Huang et al., 2017) and ResNet-34 (He et al., 2016) were used with weight decay of 5e-4 and an initial learning rate of 1e-2. For optimization, stochastic gradient descent was used with the momentum set to 0.9. The learning rate was halved every 30 epochs. Mini-batch size was set to 256. Weight decay was fixed to 1e-4 and learning rate candidates were {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2} for CIFAR-10 and {5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2} for other datasets.
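
The "Research Type" row quotes the paper's complementary-label generation: when the ordinary class prior is uniform, it reduces to choosing a class uniformly at random other than the true class. A minimal sketch of that step, assuming a uniform prior; the function name and shift trick are ours, not taken from the authors' released code:

```python
import torch

def sample_complementary_labels(true_labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Uniformly pick, for each example, one of the K-1 classes that is NOT the true class."""
    # An offset in {1, ..., K-1} added modulo K never returns the true class
    # and is uniform over the remaining classes.
    offsets = torch.randint(1, num_classes, size=true_labels.shape)
    return (true_labels + offsets) % num_classes

# Example with 10 classes (MNIST-style labels).
y = torch.tensor([0, 3, 9, 5])
y_bar = sample_complementary_labels(y, num_classes=10)
assert not torch.any(y_bar == y)  # a complementary label never equals the true label
```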
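
The "Pseudocode" row points to Algorithm 1 (complementary-label learning with gradient ascent), run with β = 0 and γ = 1 in the quoted experiments. The paper's exact class-wise risk decomposition is not reproduced here; the sketch below only illustrates the general non-negative-correction pattern it builds on (as in nnPU-style training, Kiryo et al., 2017): when a class-wise partial risk in a mini-batch falls below -β, the update switches from descending the total risk to ascending the offending parts, scaled by γ. The helper `classwise_risks` is a hypothetical placeholder for the paper's per-class estimator.

```python
import torch

def gradient_ascent_step(model, optimizer, x, y_bar, classwise_risks,
                         beta: float = 0.0, gamma: float = 1.0):
    """One mini-batch update following the non-negative correction pattern.

    `classwise_risks(outputs, y_bar)` is assumed to return a length-K tensor of
    per-class partial risks whose sum is the (possibly negative) unbiased risk.
    """
    optimizer.zero_grad()
    risks = classwise_risks(model(x), y_bar)   # shape: (K,)
    negative_part = risks[risks < -beta]
    if negative_part.numel() == 0:
        # All partial risks are acceptable: ordinary gradient descent on the total risk.
        risks.sum().backward()
    else:
        # Some partial risks dropped below -beta: descend their negation (i.e. ascend them)
        # to push the estimator back toward the non-negative region.
        (-gamma * negative_part.sum()).backward()
    optimizer.step()
    return risks.detach()
```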
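
The "Dataset Splits" row describes a 9:1 train/validation split of the original training data. A small sketch of that split; the seeded generator is our addition for determinism, not something the paper specifies:

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.MNIST(root="./data", train=True, download=True,
                            transform=transforms.ToTensor())

# 9:1 train/validation split of the original training set.
n_train = int(0.9 * len(train_full))
train_set, val_set = random_split(
    train_full, [n_train, len(train_full) - n_train],
    generator=torch.Generator().manual_seed(0))
```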
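
The "Experiment Setup" row lists the optimization settings. The sketch below wires a few of them together: an MLP with one 500-unit hidden layer trained with Adam (learning rate 5e-5, weight decay 1e-4) for the MNIST-style datasets, and SGD with momentum 0.9, initial learning rate 1e-2, weight decay 5e-4, and the learning rate halved every 30 epochs for CIFAR-10. The 10-way output layer and the module layout are our reading of the quoted architecture string, and the CIFAR-10 model is a placeholder, not the DenseNet/ResNet-34 used in the paper.

```python
import torch.nn as nn
import torch.optim as optim

num_classes = 10  # MNIST, Fashion-MNIST, Kuzushiji-MNIST, and CIFAR-10 are all 10-class.

# MLP for the MNIST-style datasets: input -> 500 hidden units -> class scores.
mlp = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 500), nn.ReLU(),
                    nn.Linear(500, num_classes))
mlp_optimizer = optim.Adam(mlp.parameters(), lr=5e-5, weight_decay=1e-4)

# CIFAR-10 setting: SGD with momentum 0.9, initial learning rate 1e-2,
# weight decay 5e-4, learning rate halved every 30 epochs.
cifar_model = nn.Linear(3 * 32 * 32, num_classes)  # placeholder for DenseNet / ResNet-34
cifar_optimizer = optim.SGD(cifar_model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.StepLR(cifar_optimizer, step_size=30, gamma=0.5)
```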