Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels
Authors: Hyeonsu Jeong, Hye Won Chung
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are supported by experiments on synthetic and real datasets (Figure 3 and Section 6). ... In this section, we validate our theoretical findings through experiments on real datasets. |
| Researcher Affiliation | Academia | Hyeonsu Jeong & Hye Won Chung School of Electrical Engineering Korea Advanced Institute of Science and Technology (KAIST) Daejeon, South Korea EMAIL |
| Pseudocode | Yes | Algorithm 1: Optimal Output Calculation by Numerical Method |
| Open Source Code | Yes | Our code is publicly available at https://github.com/Hyeonsu-Jeong/Self-PLL. |
| Open Datasets | Yes | We conduct experiments on six multi-class image classification benchmarks: CIFAR-100 (Krizhevsky et al., 2009), Caltech-101/256 (Griffin et al., 2007), Flowers-102 (Nilsback & Zisserman, 2008), Food-101 (Bossard et al., 2014), and Stanford Cars (Krause et al., 2013), utilizing the PyTorch torchvision library. |
| Dataset Splits | Yes | For each category, there are about 40 to 800 samples, with an average of 50 samples per category. We removed the background category and divided the dataset into training and validation sets using an 8:2 ratio. ... Similar to the Caltech-101 dataset, we removed the clutter category and divided the dataset into training and validation sets using an 8:2 ratio. ... Since the test set of the Flowers-102 dataset is larger than the training set, we swapped the training and test sets for use. |
| Hardware Specification | Yes | Our neural networks are trained using multiple NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using the PyTorch torchvision library and the SGD optimizer, but does not specify their version numbers. It also mentions using Generalized Cross Entropy (GCE) loss with hyperparameter q = 0.7, but this is a method, not a software dependency. |
| Experiment Setup | Yes | We perform a grid search over learning rates in the set {0.1, 0.05, 0.01, 0.005, 0.001}. Each model is trained for 200 epochs, employing the SGD optimizer with a momentum value of 0.9. In our experiments, we observe that using CE loss with the PLL student model often leads to instability during training. The PLL student model trains with a set of candidate labels for each sample with equal weights: in our case, the top two labels with weights of 1/2 each. Using CE loss with equally weighted candidate labels can cause instability since the model may converge incorrectly when the candidate set includes incorrect labels. Hence, for the stable convergence of the PLL student model, we use Generalized Cross Entropy (GCE) (Zhang & Sabuncu, 2018) loss with the hyperparameter q = 0.7. |
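The 8:2 train/validation split quoted under Dataset Splits (used for Caltech-101/256 after removing the background/clutter category) can be sketched as a simple shuffled partition. This is an illustrative reconstruction, not the authors' code; the function name, seed, and 0.8 cut point are assumptions.

```python
import random

def split_8_2(samples, seed=0):
    """Shuffle indices and partition into 80% train / 20% validation,
    mirroring the 8:2 ratio described in the reproducibility report."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(0.8 * len(samples))  # first 80% of shuffled indices -> train
    train = [samples[i] for i in idx[:cut]]
    val = [samples[i] for i in idx[cut:]]
    return train, val

train, val = split_8_2(list(range(1000)))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing learning rates in the grid search described above.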
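The Generalized Cross Entropy loss cited in the Experiment Setup row has a simple closed form, L_q(p, y) = (1 - p_y^q) / q (Zhang & Sabuncu, 2018), recovering cross-entropy as q → 0 and a MAE-like loss at q = 1. Below is a minimal sketch of this loss and its equal-weight extension over a candidate label set, as the PLL student uses (top-2 labels with weight 1/2 each, q = 0.7); the function names and the plain-list probability input are assumptions for illustration, not the paper's implementation.

```python
def gce_loss(probs, target, q=0.7):
    """Generalized Cross Entropy: L_q = (1 - p_y^q) / q.
    probs: softmax output (list of per-class probabilities)."""
    return (1.0 - probs[target] ** q) / q

def pll_gce_loss(probs, candidates, q=0.7):
    """GCE averaged over a candidate label set with equal weights,
    e.g. the top-2 labels with weight 1/2 each for the PLL student."""
    w = 1.0 / len(candidates)
    return sum(w * gce_loss(probs, c, q) for c in candidates)

probs = [0.7, 0.2, 0.1]            # softmax outputs for 3 classes
single = gce_loss(probs, 0)        # supervised GCE on the true label
partial = pll_gce_loss(probs, [0, 1])  # top-2 candidate set
```

Because (1 - p^q)/q grows more slowly than -log p as p → 0, GCE down-weights samples whose candidate set contains a wrong label, which is the stability argument quoted above.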