Does Label Smoothing Help Deep Partial Label Learning?

Authors: Xiuwen Gong, Nitin Bisht, Guandong Xu

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on benchmark PLL datasets and various deep architectures validate that label smoothing does help deep PLL in improving classification performance and learning distinguishable representations, and the best results can be achieved when the empirical smoothing rate approaches the optimal smoothing rate given by the theoretical findings. |
| Researcher Affiliation | Academia | (1) Faculty of Engineering and Information Technology, University of Technology Sydney, NSW, Australia; (2) Department of Computing, The Hong Kong Polytechnic University (PolyU), Kowloon, Hong Kong. Correspondence to: Guandong Xu <Gdxu@eduhk.hk>. |
| Pseudocode | Yes (illustrative sketch below) | Algorithm 1 LS-PLL Algorithm |
| Open Source Code | Yes | Code is publicly available at https://github.com/kalpiree/LS-PLL. |
| Open Datasets | Yes (loading sketch below) | We conduct experiments on four commonly used benchmark datasets, i.e., Fashion-MNIST (Xiao et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper discusses training and testing as well as hyperparameter tuning (e.g., the empirical smoothing rate r and the weighting parameter η), but it does not explicitly specify a separate validation split (e.g., percentages or counts) for model tuning or early stopping. |
| Hardware Specification | No | The paper lists the neural network architectures used (LeNet-5, ResNet-18, ResNet-56) but gives no details about the hardware (e.g., CPU or GPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper states that stochastic gradient descent (SGD) is used as the optimizer, but it does not name any software libraries or version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) used for the implementation. |
| Experiment Setup | Yes (configuration sketch below) | The optimizer is stochastic gradient descent (SGD) (Robbins et al., 1951) with momentum 0.9 and a weight decay of 1e-3 for model training. The mini-batch size, learning rate and total training epochs are set to 128, 0.01, and 200, respectively. Moreover, the empirical smoothing rate r is chosen from {0.1, 0.3, 0.5, 0.7, 0.9}. The weighting parameter η is set to 0.9. |