NEO-KD: Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks
Authors: Seokil Ham, Jungwuk Park, Dong-Jun Han, Jaekyun Moon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks. |
| Researcher Affiliation | Academia | Seokil Ham¹, Jungwuk Park¹, Dong-Jun Han², Jaekyun Moon¹ (¹KAIST, ²Purdue University); {gkatjrdlf, savertm}@kaist.ac.kr, han762@purdue.edu, jmoon@kaist.edu |
| Pseudocode | No | The paper describes the algorithm using text and mathematical equations but does not include a formally labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper cites external GitHub repositories for the baseline models used in its experiments, but provides no link to, or statement about, an open-source release of its own NEO-KD code. |
| Open Datasets | Yes | In this section, we evaluate our method on five datasets commonly adopted in multi-exit networks: MNIST [18], CIFAR-10, CIFAR-100 [16], Tiny-ImageNet [17], and ImageNet [25]. |
| Dataset Splits | Yes | We provide a detailed explanation of how the confidence threshold for each exit is determined using the validation set before the testing phase. First, to obtain confidence thresholds for various budget scenarios, we allocate the number of validation samples for each exit. (A minimal calibration sketch appears after the table.) |
| Hardware Specification | Yes | All experiments are implemented with two RTX3090 GPUs. |
| Software Dependencies | No | The paper mentions `PyTorch` implicitly through a linked GitHub repository, but does not provide specific version numbers for PyTorch, Python, or any other software dependencies. |
| Experiment Setup | Yes | We train a Small CNN [12] for 150 epochs with batch size 128 on MNIST [18], and an MSDNet [13] for 150 epochs with batch size 128 on CIFAR-10/100 [16] and Tiny-ImageNet [17]. For MNIST, we use the Small CNN [12] with 3 exits. We trained the MSDNet [13] with 3 and 7 exits on CIFAR-10 and CIFAR-100, respectively. For Tiny-ImageNet and ImageNet, we trained the MSDNet with 5 exits. For the optimizer, SGD is used with a momentum of 0.9 and a weight decay of 5×10⁻⁴. For MNIST, the initial learning rate is 0.01, decayed 10-fold at the 50th epoch. For CIFAR-10/100, the initial learning rate is 0.1, decayed 10-fold at the 75th and 115th epochs. For Tiny-ImageNet, the initial learning rate is 0.1, decayed 10-fold at the 50th and 100th epochs. For ImageNet, the learning rate is constant at 0.001. During adversarial training, we use the max-average attack and average attack [12] to generate adversarial examples via the PGD algorithm [21] with 7 steps, while a 50-step PGD attack is used to measure robustness against a stronger attack at test time. ... the perturbation degree ε is 0.3 for MNIST, 8/255 for CIFAR-10/100, and 2/255 for Tiny-ImageNet/ImageNet, both during adversarial training and when measuring adversarial test accuracy. ... The step size δ is set to 20/255 for MNIST, 2/255 for CIFAR-10/100, and (2/3)ε (≈0.0052) for Tiny-ImageNet/ImageNet. The number of iterations is commonly 50 steps. Additionally, the hyperparameter α for NKD is set to 3 and β for EOKD is set to 1 across all experiments, while the exit-balancing parameter γ is set to [1, 1, 1] for MNIST and CIFAR-10, [1, 1, 1, 1.5, 1.5] for Tiny-ImageNet/ImageNet, and [1, 1, 1, 1.5, 1.5, 1.5, 1.5] for CIFAR-100. (A hedged PGD training sketch appears after the table.) |
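The Dataset Splits row describes calibrating one confidence threshold per exit on a validation set so that a target fraction of samples leaves at each exit under a given computation budget. Below is a minimal sketch of that procedure; the function name `calibrate_thresholds`, the max-softmax confidence measure, and the greedy top-k assignment are assumptions modeled on standard MSDNet-style anytime inference, not the authors' released code.

```python
import torch

def calibrate_thresholds(confidences, exit_fractions):
    """Pick one confidence threshold per exit so that, on the validation
    set, roughly exit_fractions[k] of all samples leave at exit k.

    confidences: list of 1-D tensors; confidences[k][i] is the max
        softmax probability of validation sample i at exit k.
    exit_fractions: per-exit fractions (summing to 1) chosen to meet
        a target computation budget.
    """
    n = confidences[0].numel()
    remaining = torch.ones(n, dtype=torch.bool)    # samples not yet exited
    thresholds = []
    for k in range(len(exit_fractions) - 1):
        n_exit = int(exit_fractions[k] * n)
        if n_exit == 0:
            thresholds.append(float("inf"))        # nothing exits here
            continue
        conf_k = confidences[k].clone()
        conf_k[~remaining] = -1.0                  # skip already-exited samples
        top_vals, top_idx = conf_k.topk(n_exit)
        thresholds.append(top_vals[-1].item())     # lowest confidence that still exits
        remaining[top_idx] = False
    thresholds.append(0.0)                         # final exit takes all leftovers
    return thresholds
```

At test time, a sample would leave at the first exit whose confidence clears its threshold; this is the mechanism behind the "reduced computation budgets" evaluation the report cites.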
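The Experiment Setup row pins down the attack and optimizer configuration. The sketch below assembles those numbers (7-step PGD with ε = 8/255 and step size 2/255 for CIFAR-10/100; SGD with momentum 0.9, weight decay 5×10⁻⁴, 10-fold decay at epochs 75 and 115) into runnable PyTorch. The "average attack" here, mean cross-entropy over all exits, follows the cited formulation [12] only loosely, and the toy multi-exit model is a placeholder; treat this as an illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiExitNet(nn.Module):
    """Tiny stand-in for MSDNet: two conv blocks with an exit after each."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, num_classes))
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return [self.exit1(h1), self.exit2(h2)]   # one logits tensor per exit

def average_attack(model, x, y, eps=8/255, step_size=2/255, n_steps=7):
    """PGD on the mean cross-entropy over all exits ("average attack").

    Assumes inputs live in [0, 1]. Use n_steps=50 for the stronger
    test-time attack reported in the paper.
    """
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        exit_logits = model(x_adv)
        loss = torch.stack([F.cross_entropy(z, y) for z in exit_logits]).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the L-inf ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Optimizer and CIFAR-10/100 schedule as reported in the table.
model = ToyMultiExitNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[75, 115], gamma=0.1)  # 10-fold decay

# Dummy CIFAR-shaped batch to exercise the attack.
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_adv = average_attack(model, x, y)              # 7-step training-time attack
```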