Does Label Smoothing Help Deep Partial Label Learning?
Authors: Xiuwen Gong, Nitin Bisht, Guandong Xu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on benchmark PLL datasets and various deep architectures validate that label smoothing does help deep PLL in improving classification performance and learning distinguishable representations, and the best results are achieved when the empirical smoothing rate approaches the optimal smoothing rate from the theoretical findings. (The generic label-smoothing formulation behind the smoothing rate is given after the table for reference.) |
| Researcher Affiliation | Academia | ¹Faculty of Engineering and Information Technology, University of Technology Sydney, NSW, Australia; ²Department of Computing, The Hong Kong Polytechnic University (PolyU), Kowloon, Hong Kong. Correspondence to: Guandong Xu <Gdxu@eduhk.hk>. |
| Pseudocode | Yes | Algorithm 1 LS-PLL Algorithm |
| Open Source Code | Yes | Code is publicly available at https://github.com/kalpiree/LS-PLL. |
| Open Datasets | Yes | We conduct experiments on four commonly used benchmark datasets, i.e., Fashion-MNIST (Xiao et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). A hedged loading sketch for these datasets appears after the table. |
| Dataset Splits | No | The paper discusses training and testing, and hyperparameter tuning (e.g., empirical smoothing rate 'r' and weighting parameter 'η'), but it does not explicitly specify a separate validation dataset split (e.g., percentages or counts) for model tuning or early stopping. |
| Hardware Specification | No | The paper mentions the neural network architectures used (LeNet-5, ResNet-18, ResNet-56) but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient descent (SGD)' as the optimizer, but it does not specify any software libraries or their version numbers (e.g., TensorFlow, PyTorch, or Scikit-learn versions) that were used for implementation. |
| Experiment Setup | Yes | The optimizer is stochastic gradient descent (SGD) (Robbins et al., 1951) with momentum 0.9 and a weight decay of 1e-3 for model training. The mini-batch size, learning rate and total training epochs are set to 128, 0.01, and 200 respectively. Moreover, the empirical smoothing rate r is chosen from {0.1, 0.3, 0.5, 0.7, 0.9}. The weighting parameter η is set to 0.9. A hedged sketch wiring these values into a training loop appears after the table. |
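
For reference, the smoothing rate r quoted in the Research Type and Experiment Setup rows corresponds to the standard label-smoothing formulation below, where C is the number of classes and y is the one-hot target. This is only the generic form; the paper's LS-PLL variant applies smoothing to candidate label sets, which is not reproduced here.

```latex
% Generic label smoothing with smoothing rate r over C classes
% (the paper's LS-PLL applies smoothing within candidate label sets instead).
\tilde{y}_c = (1 - r)\, y_c + \frac{r}{C}, \qquad c = 1, \dots, C
```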
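
All four benchmark datasets named in the Open Datasets row are obtainable through torchvision. The snippet below is a minimal loading sketch, assuming a standard torchvision installation; it is not the authors' pipeline, and the candidate-label (partial-label) generation step used in LS-PLL is not shown.

```python
# Minimal sketch for fetching the four benchmark datasets via torchvision.
# This is not the authors' data pipeline; the candidate-label (partial-label)
# generation step used in LS-PLL is not shown.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

fmnist = datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor)
kmnist = datasets.KMNIST("data", train=True, download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)

# Mini-batch size of 128 follows the quoted experiment setup.
loader = DataLoader(cifar10, batch_size=128, shuffle=True)
```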
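
The Experiment Setup row quotes the optimizer and hyperparameters; the sketch below wires those quoted values into a generic training loop. It is an assumption-laden illustration rather than the released LS-PLL code: PyTorch is not confirmed by the paper as the framework, ResNet-18 and the built-in label_smoothing argument of nn.CrossEntropyLoss stand in for the paper's PLL-specific smoothed targets, and the weighting parameter η is omitted.

```python
# Hedged sketch of the quoted training configuration (SGD, momentum 0.9,
# weight decay 1e-3, batch size 128, lr 0.01, 200 epochs, smoothing rate r).
# PyTorch and ResNet-18 are assumptions made for illustration; the actual
# LS-PLL loss smooths over candidate label sets and uses a weighting
# parameter eta = 0.9, neither of which is reproduced here.
import torch
from torch import nn
from torchvision.models import resnet18

def train(train_loader, num_classes=10, r=0.1):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = resnet18(num_classes=num_classes).to(device)
    # Built-in label smoothing as a stand-in for LS-PLL's smoothed targets.
    criterion = nn.CrossEntropyLoss(label_smoothing=r)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-3
    )
    for epoch in range(200):
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
    return model
```

Sweeping r over {0.1, 0.3, 0.5, 0.7, 0.9}, as in the quoted setup, would amount to calling this sketch once per value and comparing test accuracy.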