Knowledge Distillation with Adversarial Samples Supporting Decision Boundary

Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi (pp. 3771-3778)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed method indeed improves knowledge distillation and achieves state-of-the-art performance.
Researcher Affiliation | Collaboration | (1) Department of ECE, ASRI, Seoul National University, Korea; (2) Division of EE, Hanyang University, Korea; (3) Clova AI Research, NAVER Corp., Korea
Pseudocode | No | The paper describes an 'Iterative Scheme to find a BSS' with mathematical equations but does not present a structured pseudocode or algorithm block (an illustrative sketch of such an iteration is given after the table).
Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code.
Open Datasets | Yes | Experiments were performed on the CIFAR-10 (Krizhevsky 2009), ImageNet 32x32 (Chrabaszcz, Loshchilov, and Hutter 2017), and Tiny ImageNet datasets.
Dataset Splits | Yes | CIFAR-10 consists of 50k training images and 10k test images; ImageNet 32x32 consists of 1,281k training images and 50k validation images; Tiny ImageNet contains 100k training images and 10k test images in 200 classes.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The temperatures of the KD loss and the adversarial loss were fixed to 3 in all experiments. The parameter α in Eq. (4) was initialized to 4 and linearly decreased to 1 by the end of training; β in Eq. (4) was set to 2 initially and linearly decreased to 0 at 75% of the training procedure. Training used a batch size of 256, with a learning rate starting at 0.1 and decreased to 0.01 at half of the maximum epoch and to 0.001 at 3/4 of the maximum epoch. The momentum was 0.9 and the weight decay was 0.0001. For the adversarial attack, η = 0.3 was used with a maximum of 10 iterations, and N_adv = 64 samples were selected among the 256 batch samples for the boundary supporting loss (a training-schedule sketch follows the table).
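
The reported schedule translates directly into a short training-loop configuration. Below is a minimal sketch assuming PyTorch (the paper does not name its framework); `max_epoch` and the stand-in `student` model are placeholders, not values taken from the paper.

```python
import torch

max_epoch = 80                        # placeholder epoch budget, not from the paper
student = torch.nn.Linear(10, 10)     # stand-in for the student network

optimizer = torch.optim.SGD(student.parameters(),
                            lr=0.1,             # learning rate starts at 0.1
                            momentum=0.9,       # momentum 0.9
                            weight_decay=1e-4)  # weight decay 0.0001

# Learning rate drops to 0.01 at half and to 0.001 at 3/4 of the maximum epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[max_epoch // 2, 3 * max_epoch // 4], gamma=0.1)

for epoch in range(max_epoch):
    progress = epoch / max_epoch
    # alpha: KD-loss weight, linearly decreased from 4 to 1 over training.
    alpha = 4.0 - 3.0 * progress
    # beta: boundary-supporting-loss weight, linearly decreased from 2 to 0
    # by 75% of training, then held at 0.
    beta = max(0.0, 2.0 * (1.0 - progress / 0.75))
    # ... one epoch of distillation with batch size 256, temperature 3 for
    #     both the KD and adversarial losses, and N_adv = 64 would run here ...
    scheduler.step()
```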
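For the 'Iterative Scheme to find a BSS' noted in the Pseudocode row, the paper gives equations rather than an algorithm block. The sketch below illustrates one plausible iteration, assuming a PyTorch teacher model and a generic normalized gradient step; the function name `find_bss` and the exact update rule are assumptions rather than the paper's formulation, although η = 0.3 and the 10-iteration budget match the reported setup.

```python
import torch

def find_bss(teacher, x, base_class, target_class, eta=0.3, max_iter=10):
    """Perturb a single sample `x` (shape [1, ...]) toward `target_class`
    until the teacher scores it above `base_class`, or `max_iter` is hit."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(max_iter):
        logits = teacher(x_adv)
        # Margin of the target-class score over the base-class score.
        margin = logits[0, target_class] - logits[0, base_class]
        if margin.item() > 0:
            break  # the sample has crossed the decision boundary
        grad, = torch.autograd.grad(margin, x_adv)
        # Normalized gradient ascent step of size eta (illustrative update,
        # not the exact rule defined by the paper's equations).
        x_adv = (x_adv + eta * grad / (grad.norm() + 1e-12)).detach().requires_grad_(True)
    return x_adv.detach()
```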