Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi (pp. 3779-3787)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on various aspects of knowledge transfer, the paper verifies that the proposed method outperforms the current state of the art. |
| Researcher Affiliation | Collaboration | Byeongho Heo (1), Minsik Lee (2), Sangdoo Yun (3), Jin Young Choi (1). Affiliations: 1 Department of ECE, ASRI, Seoul National University, Korea; 2 Division of EE, Hanyang University, Korea; 3 Clova AI Research, NAVER Corp, Korea. Emails: {bhheo, jychoi}@snu.ac.kr, mleepaper@hanyang.ac.kr, sangdoo.yun@navercorp.com |
| Pseudocode | No | The paper presents mathematical formulations and descriptive text for its methods but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | Experiments were performed on the CIFAR10 (Krizhevsky 2009) dataset. ... Using a teacher pre-trained on ImageNet (Russakovsky et al. 2015), a randomly initialized student was trained on a target dataset other than ImageNet. ... The MIT scenes dataset (Quattoni and Torralba 2009)...and the CUB 2011 dataset (Wah et al. 2011)...were used as target datasets of transfer learning. |
| Dataset Splits | No | The paper refers to "training data" and a "test set" throughout (e.g., "Training epochs (initialize + training)" and "error rate(%) on test set" in Table 1), but it does not describe a separate validation split or how one was used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions components such as the Rectified Linear Unit (ReLU), and PyTorch is implied only as a commonly used framework for these models, but it does not name any specific software dependencies or version numbers. |
| Experiment Setup | Yes | The learning rate scheme for CIFAR-10 started at 0.1 and was divided by 5 at 30%, 60%, and 80% of the total training epochs. In transfer learning, the learning rate started at 0.01 and was divided by 10 once training passed half of the total epochs. In all cases, Nesterov momentum of 0.9 and a weight decay of 5×10⁻⁴ were used (see the sketch below the table). |
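
To make the reported optimizer settings concrete, here is a minimal PyTorch sketch of the CIFAR-10 schedule described in the Experiment Setup row. The paper releases no code, so `model`, `total_epochs`, and the training loop are placeholder assumptions; only the initial learning rates, the divide-by-5 milestones at 30%, 60%, and 80%, the Nesterov momentum of 0.9, and the weight decay of 5×10⁻⁴ come from the paper.

```python
import torch
import torch.nn as nn

# Placeholder student network and epoch budget; the paper specifies the
# schedule only relative to the total number of training epochs.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
total_epochs = 80  # assumption for illustration

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # CIFAR-10 initial LR (transfer learning uses 0.01)
    momentum=0.9,       # Nesterov momentum of 0.9
    nesterov=True,
    weight_decay=5e-4,  # weight decay of 5x10^-4
)

# Divide the learning rate by 5 at 30%, 60%, and 80% of total epochs.
# (For transfer learning, the paper instead divides by 10 at the halfway
# point: milestones=[total_epochs // 2], gamma=0.1.)
milestones = [int(total_epochs * p) for p in (0.3, 0.6, 0.8)]
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=milestones, gamma=1 / 5
)

for epoch in range(total_epochs):
    # ... per-batch forward/backward passes would go here ...
    optimizer.step()   # stand-in for the per-batch parameter updates
    scheduler.step()
```

Using `MultiStepLR` keeps the percentage-based milestones explicit, which matches how the paper states the schedule rather than giving absolute epoch numbers.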