From Label Smoothing to Label Relaxation
Authors: Julian Lienen, Eyke Hüllermeier
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of the approach is demonstrated in an empirical study on image data. and To demonstrate the effectiveness of label relaxation, an empirical evaluation on image classification datasets assessing the classification performance and calibration is conducted. |
| Researcher Affiliation | Academia | Julian Lienen, Eyke Hüllermeier Heinz Nixdorf Institute and Department of Computer Science Paderborn University 33098 Paderborn, Germany {julian.lienen,eyke}@upb.de |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper, such as a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | Within our study, we consider MNIST (LeCun et al. 1998), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) as image datasets. |
| Dataset Splits | Yes | Since the parameters α for LR and LS, β for CP, and γ for FL are of critical importance, they have been optimized separately on a separate hold-out validation set consisting of 1/6 of the original training data. (A hedged split sketch follows the table.) |
| Hardware Specification | Yes | The runs were conducted on 20 Nvidia RTX 2080 Ti and 10 Nvidia GTX 1080 Ti GPUs. |
| Software Dependencies | No | The paper discusses various models and optimizers but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | To optimize the models, SGD with a Nesterov momentum of 0.9 has been used as optimizer. In all experiments, the batch size has been fixed to 64. Depending on the model, we set the initial learning rates to 0.01 (VGG), 0.05 (simple dense), and 0.1 (ResNet and DenseNet). For each model, we optimized the learning rate schedule for generalization performance by dividing the learning rate by a constant factor (ranging from 0.1 to 0.1). We trained for either 25 (MNIST), 50 (Fashion-MNIST), 200 (CIFAR-10), or 300 (CIFAR-100) epochs. and For every combination of model and dataset, we empirically determined hyperparameters (such as the learning rate schedule and additional regularization) that work reasonably well for all losses. and we assessed values α ∈ {0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4}, β ∈ {0.1, 0.3, 0.5, 1, 2, 4, 8}, and γ ∈ {0.1, 0.2, 0.5, 1, 2, 3.5, 5} as suggested as reasonable parameters in the corresponding publications. (A hedged training-configuration sketch follows the table.) |
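The paper reports that the loss-specific hyperparameters were tuned on a hold-out validation set of 1/6 of the original training data, but it does not name its software stack. The sketch below is a minimal illustration of that split, assuming PyTorch/torchvision and an arbitrary seed; the dataset choice, transform, and seed are assumptions, not taken from the paper.

```python
# Hedged sketch of the 1/6 hold-out validation split quoted above.
# PyTorch/torchvision and the fixed seed are illustrative assumptions.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # the paper's preprocessing is not specified here

# Any of the four benchmark datasets used in the study (MNIST, Fashion-MNIST,
# CIFAR-10, CIFAR-100) can be loaded analogously via torchvision.
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 1/6 of the original training data for tuning alpha (LR/LS),
# beta (CP), and gamma (FL), as stated in the quoted passage.
n_val = len(full_train) // 6
n_train = len(full_train) - n_val
train_set, val_set = random_split(
    full_train, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)
```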
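The training-configuration sketch below collects the reported settings (SGD with Nesterov momentum 0.9, batch size 64, per-model initial learning rates, per-dataset epoch counts, and the hyperparameter grids) in one place. The PyTorch API and the model/dataset identifiers used as dictionary keys are assumptions made for illustration; only the numeric values come from the quoted table row.

```python
# Hedged sketch of the reported training configuration; values from the paper,
# framework (PyTorch) and naming conventions assumed for illustration.
import torch

BATCH_SIZE = 64

INITIAL_LR = {  # initial learning rate per model family, as reported
    "vgg": 0.01,
    "simple_dense": 0.05,
    "resnet": 0.1,
    "densenet": 0.1,
}

EPOCHS = {  # training epochs per dataset, as reported
    "mnist": 25,
    "fashion_mnist": 50,
    "cifar10": 200,
    "cifar100": 300,
}

# Hyperparameter grids evaluated on the hold-out validation set.
ALPHA_GRID = [0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4]  # label relaxation / label smoothing
BETA_GRID = [0.1, 0.3, 0.5, 1, 2, 4, 8]               # confidence penalty
GAMMA_GRID = [0.1, 0.2, 0.5, 1, 2, 3.5, 5]            # focal loss

def make_optimizer(model: torch.nn.Module, model_name: str) -> torch.optim.SGD:
    """SGD with Nesterov momentum of 0.9, as described in the quoted setup."""
    return torch.optim.SGD(
        model.parameters(),
        lr=INITIAL_LR[model_name],
        momentum=0.9,
        nesterov=True,
    )
```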