Decoupled Kullback-Leibler Divergence Loss

Authors: Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed approach achieves new state-of-the-art adversarial robustness on the public leaderboard RobustBench and competitive performance on knowledge distillation, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
Researcher Affiliation | Collaboration | Jiequan Cui (Nanyang Technological University), Zhuotao Tian (HIT(SZ)), Zhisheng Zhong (The Chinese University of Hong Kong), Xiaojuan Qi (The University of Hong Kong), Bei Yu (The Chinese University of Hong Kong), Hanwang Zhang (Nanyang Technological University)
Pseudocode | Yes | Algorithm 1: Pseudo code for DKL/IKL loss in PyTorch style. Algorithm 2: Memory-efficient implementation for wMSE loss in PyTorch style. (An illustrative KL-loss sketch follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/jiequancui/DKL.
Open Datasets | Yes | We evaluate its effectiveness by conducting experiments on CIFAR-10/100 and ImageNet datasets... All the datasets we considered are publicly available, we list their licenses and URLs as follows: CIFAR-10 [41]: MIT License, https://www.cs.toronto.edu/~kriz/cifar.html. CIFAR-100 [41]: MIT License, https://www.cs.toronto.edu/~kriz/cifar.html. ImageNet [54]: Non-commercial, http://image-net.org. (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | Table 4: Top-1 accuracy (%) on the ImageNet validation set and training speed (sec/iteration) comparisons.
Hardware Specification | Yes | Models are trained with 4 NVIDIA GeForce 3090 GPUs.
Software Dependencies | No | The paper mentions "PyTorch style" in its pseudocode but does not specify version numbers for Python, PyTorch, CUDA, or other key libraries used in the experiments.
Experiment Setup | Yes | We use an improved version of TRADES [71] as our baseline, which incorporates AWP [66] and adopts an increasing epsilon schedule. An SGD optimizer with a momentum of 0.9 is used. We use the cosine learning rate strategy with an initial learning rate of 0.2 and train models for 200 epochs. The batch size is 128, the weight decay is 5e-4, and the perturbation size ϵ is set to 8/255. (A hyper-parameter sketch follows the table.)
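As a companion to the Pseudocode row, the following is a minimal, illustrative sketch of the plain temperature-scaled KL divergence loss in PyTorch, i.e., the baseline objective that DKL/IKL decouples and re-weights. It is not the paper's Algorithm 1; the function name, tensor names, and the temperature value are assumptions made for this example.

```python
# Illustrative sketch only, not the paper's Algorithm 1: a standard
# temperature-scaled KL divergence loss as commonly used for knowledge
# distillation, written in PyTorch style.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         T: float = 4.0) -> torch.Tensor:
    """KL(teacher || student) with temperature scaling, averaged over the batch."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # 'batchmean' matches the mathematical definition of KL divergence;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Example usage with random logits (batch of 128, 100 classes):
loss = kl_distillation_loss(torch.randn(128, 100), torch.randn(128, 100))
```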
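For reference against the Open Datasets row, here is a small sketch of how the listed CIFAR datasets can be loaded with torchvision. The paper does not specify its data pipeline, so the root directory and transform below are assumptions.

```python
# Hedged sketch: loading the public CIFAR datasets with torchvision.
# The root directory and transform are placeholders, not the paper's pipeline.
from torchvision import datasets, transforms

transform = transforms.ToTensor()
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
# ImageNet is distributed under a non-commercial license and must be downloaded
# manually from http://image-net.org before torchvision.datasets.ImageNet can read it.
```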
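The Experiment Setup row lists concrete optimization hyper-parameters; below is a minimal PyTorch sketch that wires them together (SGD with momentum, cosine learning-rate schedule, weight decay). The placeholder network and the omitted TRADES/AWP adversarial training loop are assumptions, not the authors' implementation.

```python
# Minimal sketch of the reported training hyper-parameters; the network is a
# placeholder and the TRADES/AWP adversarial training loop itself is omitted.
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 100)  # stand-in for the actual architecture (assumed)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.2,             # reported initial learning rate
                            momentum=0.9,       # reported momentum
                            weight_decay=5e-4)  # reported weight decay

epochs = 200                                    # reported training length
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

batch_size = 128   # reported batch size
epsilon = 8 / 255  # reported L-infinity perturbation budget for adversarial examples
```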