Does Label Smoothing Help Deep Partial Label Learning?

Authors: Xiuwen Gong, Nitin Bisht, Guandong Xu

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on benchmark PLL datasets and various deep architectures validate that label smoothing does help deep PLL in improving classification performance and learning distinguishable representations, and the best results can be achieved when the empirical smoothing rate approaches the optimal smoothing rate given by the theoretical findings. |
| Researcher Affiliation | Academia | (1) Faculty of Engineering and Information Technology, University of Technology Sydney, NSW, Australia; (2) Department of Computing, The Hong Kong Polytechnic University (PolyU), Kowloon, Hong Kong. Correspondence to: Guandong Xu <Gdxu@eduhk.hk>. |
| Pseudocode | Yes (illustrative sketch below) | Algorithm 1 LS-PLL Algorithm |
| Open Source Code | Yes | Code is publicly available at https://github.com/kalpiree/LS-PLL. |
| Open Datasets | Yes (loading sketch below) | We conduct experiments on four commonly used benchmark datasets, i.e., Fashion-MNIST (Xiao et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), CIFAR-10 and CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper discusses training and testing as well as hyperparameter tuning (e.g., the empirical smoothing rate r and the weighting parameter η), but it does not explicitly specify a separate validation split (e.g., percentages or counts) for model tuning or early stopping. |
| Hardware Specification | No | The paper lists the neural network architectures used (LeNet-5, ResNet-18, ResNet-56) but gives no details about the hardware (e.g., CPU or GPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper states that stochastic gradient descent (SGD) is used as the optimizer, but it does not name any software libraries or version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) used for the implementation. |
| Experiment Setup | Yes (configuration sketch below) | The optimizer is stochastic gradient descent (SGD) (Robbins et al., 1951) with momentum 0.9 and a weight decay of 1e-3 for model training. The mini-batch size, learning rate and total training epochs are set to 128, 0.01, and 200, respectively. Moreover, the empirical smoothing rate r is chosen from {0.1, 0.3, 0.5, 0.7, 0.9}. The weighting parameter η is set to 0.9. |