Self-Distillation as Instance-Specific Label Smoothing
Authors: Zhilu Zhang, Mert Sabuncu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results using multiple datasets and neural network architectures that, overall, demonstrate the utility of predictive diversity. |
| Researcher Affiliation | Academia | Zhilu Zhang Cornell University zz452@cornell.edu Mert R. Sabuncu Cornell University msabuncu@cornell.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments on CIFAR-100 [20], CUB-200 [37] and Tiny-ImageNet [9] using ResNet [13] and DenseNet [16]. |
| Dataset Splits | Yes | 10% of the training data is split as the validation set. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software components or libraries. |
| Experiment Setup | Yes | We follow the original optimization configurations and train the ResNet models for 150 epochs and DenseNet models for 200 epochs. ... We fix the label smoothing factor at 0.15 for all our experiments ... The hyper-parameter in Eq. 3 is taken to be 0.6 for self-distillation. (See the sketch after the table.) |
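To make the reported hyper-parameters concrete, below is a minimal PyTorch sketch of the two objectives mentioned in the setup row: a standard label smoothing baseline with smoothing factor 0.15, and a self-distillation loss that interpolates between hard-label cross-entropy and the teacher's soft predictions with weight 0.6, which is the sense in which the teacher acts as instance-specific label smoothing. The function names, the orientation of the interpolation weight (which term gets 0.6 vs. 0.4), and the absence of a temperature are assumptions, not the paper's exact Eq. 3.

```python
import torch
import torch.nn.functional as F


def label_smoothing_loss(logits, targets, eps=0.15):
    """Uniform label smoothing baseline with smoothing factor 0.15 (as reported).
    The true class gets probability 1 - eps; the rest is spread uniformly."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smooth_targets = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth_targets.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    return -(smooth_targets * log_probs).sum(dim=1).mean()


def self_distillation_loss(student_logits, teacher_logits, targets, alpha=0.6):
    """Hypothetical rendering of an Eq.-3-style objective: alpha weights the
    cross-entropy on the hard labels, (1 - alpha) weights a KL term pulling
    the student toward the (frozen) teacher's softmax, i.e. an instance-specific
    smoothing distribution. Which term alpha = 0.6 multiplies is an assumption."""
    ce = F.cross_entropy(student_logits, targets)
    teacher_probs = F.softmax(teacher_logits.detach(), dim=1)
    kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                  teacher_probs, reduction="batchmean")
    return alpha * ce + (1.0 - alpha) * kl
```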