Decoupled Kullback-Leibler Divergence Loss
Authors: Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed approach achieves new state-of-the-art adversarial robustness on the public leaderboard RobustBench and competitive performance on knowledge distillation, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL. |
| Researcher Affiliation | Collaboration | Jiequan Cui (1), Zhuotao Tian (4), Zhisheng Zhong (2), Xiaojuan Qi (3), Bei Yu (2), Hanwang Zhang (1); 1: Nanyang Technological University, 2: The Chinese University of Hong Kong, 3: The University of Hong Kong, 4: HIT(SZ) |
| Pseudocode | Yes | Algorithm 1: Pseudo code for DKL/IKL loss in PyTorch style. Algorithm 2: Memory-efficient implementation for wMSE loss in PyTorch style. (A hedged baseline sketch follows the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/jiequancui/DKL. |
| Open Datasets | Yes | We evaluate its effectiveness by conducting experiments on CIFAR-10/100 and ImageNet datasets... All the datasets we considered are publicly available; we list their licenses and URLs as follows: CIFAR-10 [41]: MIT License, https://www.cs.toronto.edu/~kriz/cifar.html. CIFAR-100 [41]: MIT License, https://www.cs.toronto.edu/~kriz/cifar.html. ImageNet [54]: Non-commercial, http://image-net.org. |
| Dataset Splits | Yes | Table 4: Top-1 accuracy (%) on the ImageNet validation and training speed (sec/iteration) comparisons. |
| Hardware Specification | Yes | Models are trained with 4 Nvidia GeForce 3090 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch style' in its pseudocode but does not specify version numbers for Python, PyTorch, CUDA, or other key libraries used in the experiments. |
| Experiment Setup | Yes | We use an improved version of TRADES [71] as our baseline, which incorporates AWP [66] and adopts an increasing epsilon schedule. An SGD optimizer with a momentum of 0.9 is used. We use the cosine learning rate strategy with an initial learning rate of 0.2 and train models for 200 epochs. The batch size is 128, the weight decay is 5e-4, and the perturbation size ϵ is set to 8/255. (A hedged configuration sketch follows the table.) |
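The paper's Algorithm 1 (DKL/IKL loss) and Algorithm 2 (memory-efficient wMSE loss) are only quoted above, not reproduced here. As a rough point of reference, the following is a minimal PyTorch sketch of the standard, coupled KL-divergence distillation loss that DKL re-expresses; the function name, temperature value, and Hinton-style scaling are illustrative assumptions, and this is not the authors' DKL/IKL implementation.

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 4.0) -> torch.Tensor:
    """Standard (coupled) KL-divergence distillation loss, i.e. the quantity
    that DKL decouples. Illustrative baseline only, not the paper's DKL/IKL code."""
    # Soften both distributions with the temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # 'batchmean' averages over the batch; the T^2 factor is the usual
    # scaling that keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example: 8 samples, 100 classes (e.g. CIFAR-100), random logits.
student = torch.randn(8, 100)
teacher = torch.randn(8, 100)
loss = kl_distillation_loss(student, teacher)
```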
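To make the quoted training hyperparameters concrete, below is a minimal optimizer/scheduler configuration sketch in PyTorch, assuming `torch.optim.SGD` and `CosineAnnealingLR`. The placeholder model and the omitted TRADES + AWP adversarial training step are assumptions for illustration, not the authors' released code.

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

# Values quoted in the Experiment Setup row above.
EPOCHS = 200
BATCH_SIZE = 128
EPSILON = 8 / 255          # L-inf perturbation budget
INIT_LR = 0.2
WEIGHT_DECAY = 5e-4

# Trivial placeholder standing in for the robust architecture actually trained.
model = nn.Linear(3 * 32 * 32, 100)

optimizer = SGD(model.parameters(), lr=INIT_LR, momentum=0.9,
                weight_decay=WEIGHT_DECAY)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... per-batch adversarial example generation (increasing-epsilon schedule)
    # and TRADES/AWP parameter updates would go here ...
    scheduler.step()
```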