From Label Smoothing to Label Relaxation

Authors: Julian Lienen, Eyke Hüllermeier

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The effectiveness of the approach is demonstrated in an empirical study on image data." and "To demonstrate the effectiveness of label relaxation, an empirical evaluation on image classification datasets assessing the classification performance and calibration is conducted."
Researcher Affiliation | Academia | Julian Lienen, Eyke Hüllermeier, Heinz Nixdorf Institute and Department of Computer Science, Paderborn University, 33098 Paderborn, Germany, {julian.lienen,eyke}@upb.de
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper, such as a specific repository link or an explicit code release statement.
Open Datasets | Yes | "Within our study, we consider MNIST (LeCun et al. 1998), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) as image datasets."
Dataset Splits | Yes | "Since the parameters α for LR and LS, β for CP, and γ for FL are of critical importance, they have been optimized separately on a separate hold-out validation set consisting of 1/6 of the original training data." (See the data-loading sketch after the table.)
Hardware Specification | Yes | "The runs were conducted on 20 Nvidia RTX 2080 Ti and 10 Nvidia GTX 1080 Ti GPUs."
Software Dependencies | No | The paper discusses various models and optimizers but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or specific library versions).
Experiment Setup | Yes | "To optimize the models, SGD with a Nesterov momentum of 0.9 has been used as optimizer. In all experiments, the batch size has been fixed to 64. Depending on the model, we set the initial learning rates to 0.01 (VGG), 0.05 (simple dense), and 0.1 (ResNet and DenseNet). For each model, we optimized the learning rate schedule for generalization performance by dividing the learning rate by a constant factor (ranging from 0.1 to 0.1). We trained for either 25 (MNIST), 50 (Fashion-MNIST), 200 (CIFAR-10), or 300 (CIFAR-100) epochs." and "For every combination of model and dataset, we empirically determined hyperparameters (such as the learning rate schedule and additional regularization) that work reasonably well for all losses." and "we assessed values α ∈ {0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4}, β ∈ {0.1, 0.3, 0.5, 1, 2, 4, 8}, and γ ∈ {0.1, 0.2, 0.5, 1, 2, 3.5, 5} as suggested as reasonable parameters in the corresponding publications." (A sketch of this configuration follows the table.)
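The Open Datasets and Dataset Splits rows can be made concrete with a short sketch. The paper does not name its software stack, so the following assumes PyTorch/torchvision purely for illustration; only the datasets and the 1/6 hold-out validation split are taken from the quotes above.

```python
# Minimal sketch (not the authors' code): load one of the reported image datasets
# with torchvision and carve out 1/6 of the training data as a hold-out
# validation set, as described in the Dataset Splits row.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

DATASETS = {
    "mnist": datasets.MNIST,
    "fashion_mnist": datasets.FashionMNIST,
    "cifar10": datasets.CIFAR10,
    "cifar100": datasets.CIFAR100,
}

def load_splits(name: str, root: str = "./data", seed: int = 0):
    """Return (train, val, test) where val is 1/6 of the original training data."""
    tf = transforms.ToTensor()  # augmentation/normalization omitted for brevity
    full_train = DATASETS[name](root, train=True, download=True, transform=tf)
    test_set = DATASETS[name](root, train=False, download=True, transform=tf)
    val_size = len(full_train) // 6
    train_set, val_set = random_split(
        full_train,
        [len(full_train) - val_size, val_size],
        generator=torch.Generator().manual_seed(seed),
    )
    return train_set, val_set, test_set
```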
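Likewise, the Experiment Setup row translates into a small configuration sketch. The framework choice (PyTorch) and the `build_model` placeholder are assumptions; the numeric values (optimizer, batch size, initial learning rates, epoch budgets, and hyperparameter grids) are the ones quoted above.

```python
# Sketch of the reported training configuration, assuming PyTorch as the framework.
# `build_model` is a hypothetical placeholder; the tuned learning rate schedule and
# any additional regularization mentioned in the paper are omitted here.
import torch

BATCH_SIZE = 64
INITIAL_LR = {"vgg": 0.01, "simple_dense": 0.05, "resnet": 0.1, "densenet": 0.1}
EPOCHS = {"mnist": 25, "fashion_mnist": 50, "cifar10": 200, "cifar100": 300}

# Hyperparameter grids quoted in the paper (alpha for LR/LS, beta for CP, gamma for FL),
# tuned on the hold-out validation split.
ALPHA_GRID = [0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4]
BETA_GRID = [0.1, 0.3, 0.5, 1, 2, 4, 8]
GAMMA_GRID = [0.1, 0.2, 0.5, 1, 2, 3.5, 5]

def make_optimizer(model: torch.nn.Module, model_name: str) -> torch.optim.SGD:
    """SGD with Nesterov momentum 0.9 and the reported per-model initial learning rate."""
    return torch.optim.SGD(
        model.parameters(),
        lr=INITIAL_LR[model_name],
        momentum=0.9,
        nesterov=True,
    )

# Example wiring (build_model is not defined in the paper or in this sketch):
# model = build_model("resnet", num_classes=10)
# optimizer = make_optimizer(model, "resnet")
# loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
```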