Knowledge Distillation with Adversarial Samples Supporting Decision Boundary
Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed method indeed improves knowledge distillation and achieves the state-of-the-art performance. |
| Researcher Affiliation | Collaboration | (1) Department of ECE, ASRI, Seoul National University, Korea; (2) Division of EE, Hanyang University, Korea; (3) Clova AI Research, NAVER Corp, Korea |
| Pseudocode | No | The paper describes an 'Iterative Scheme to find a BSS' with mathematical equations but does not present a structured pseudocode or algorithm block (a hedged sketch of such an iterative search appears below this table). |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code. |
| Open Datasets | Yes | Experiments were performed on the CIFAR-10 (Krizhevsky 2009), ImageNet 32×32 (Chrabaszcz, Loshchilov, and Hutter 2017) and Tiny ImageNet datasets |
| Dataset Splits | Yes | The CIFAR-10...consisting of 50k training images and 10k test images. ImageNet 32×32...consisting of 1,281k training images and 50k validation images. Tiny ImageNet...It contains 100k training images and 10k test images in 200 classes. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The temperatures of the KD loss and the adversarial loss were fixed to 3 in all experiments. The parameter α in (4) was initialized to 4 and linearly decreased to 1 at the end of training. The β in (4) was set to 2 initially and linearly decreased to 0 at 75% of the whole training procedure...The learning process was performed with 256 batch size, with a learning rate which started at 0.1 and decreased to 0.01 at half of the maximum epoch and to 0.001 at 3/4 of the maximum epoch. The momentum used in the study was 0.9 and the weight decay was 0.0001. η = 0.3 was used for the adversarial attack in the proposed method and the maximum number of iterations was set to 10 for knowledge distillation. For the boundary supporting loss, N_adv = 64 was selected among the 256 batch samples. (The α, β, and learning-rate schedules are sketched below this table.) |
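The paper's 'Iterative Scheme to find a BSS' is given only as equations. Below is a minimal sketch of such a boundary-supporting-sample search, assuming the attack repeatedly perturbs an input along the gradient of the gap between the teacher's base-class and target-class logits until the target score dominates. The function name `find_bss`, the scalar class indices, the step rule, and the stopping test are illustrative assumptions rather than the paper's exact equations; the reported η = 0.3 and 10-iteration limit are reused as defaults.

```python
import torch

def find_bss(teacher, x, base_cls, target_cls, eta=0.3, max_iter=10):
    """Hypothetical sketch: push x toward the teacher's decision boundary
    between base_cls and target_cls (scalar class indices assumed)."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(max_iter):
        logits = teacher(x_adv)
        # Gap between the base-class score and the target-class score.
        gap = logits[:, base_cls] - logits[:, target_cls]
        if (gap <= 0).all():  # target class now dominates: boundary crossed
            break
        grad, = torch.autograd.grad(gap.sum(), x_adv)
        flat_norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        # Gradient step scaled by eta and the remaining gap (illustrative update rule).
        step = eta * gap.view(-1, 1, 1, 1) * grad / (flat_norm ** 2 + 1e-12)
        x_adv = (x_adv - step).detach().requires_grad_(True)
    return x_adv.detach()
```

In this reading, the loop stops early once the target-class logit exceeds the base-class logit, so the returned sample lies near the teacher's decision boundary rather than deep inside the target class.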
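The hyperparameter schedules quoted in the experiment-setup row (α from 4 to 1 over training, β from 2 to 0 by 75% of training, and the step-wise learning rate 0.1 → 0.01 → 0.001) can be expressed compactly. The helper below is an illustrative reading of those numbers; the name `schedules` and the epoch-fraction formulation are assumptions.

```python
def schedules(epoch, max_epoch):
    """Illustrative alpha/beta/learning-rate schedules per the reported setup."""
    progress = epoch / max_epoch
    alpha = 4.0 - 3.0 * progress                     # 4 -> 1, linear over whole training
    beta = max(0.0, 2.0 * (1.0 - progress / 0.75))   # 2 -> 0, linear, reaching 0 at 75%
    if progress < 0.5:
        lr = 0.1                                     # first half of training
    elif progress < 0.75:
        lr = 0.01                                    # from half to 3/4 of training
    else:
        lr = 0.001                                   # last quarter of training
    return alpha, beta, lr
```

For example, at epoch 60 of an 80-epoch run (progress = 0.75) this gives α = 1.75, β = 0.0, and a learning rate of 0.001.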