Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark

Authors: Mengxi Ya, Yiming Li, Tao Dai, Bin Wang, Yong Jiang, Shu-Tao Xia

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our generalization-limited watermark.
Researcher Affiliation | Collaboration | Mengxi Ya1, Yiming Li2,3, Tao Dai4, Bin Wang1,5, Yong Jiang1, Shu-Tao Xia1. 1Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; 2The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China; 3ZJU-Hangzhou Global Scientific and Technological Innovation Center, China; 4College of Computer Science and Software Engineering, Shenzhen University, China; 5Guangzhou Intelligence Communications Technology Co., Ltd., China
Pseudocode | Yes | Algorithm 1: Generating the penalized trigger. Algorithm 2: Watermarking DNNs with BWTP. Algorithm 3: Generating the desired synthesized trigger. Algorithm 4: Watermarking DNNs with GLBW.
Open Source Code | Yes | Our codes are available at https://github.com/yamengxi/GLBW.
Open Datasets | Yes | We adopt ResNet-18 (He et al., 2016) on CIFAR-10 (Krizhevsky, 2009) and GTSRB (Stallkamp et al., 2012) datasets for our discussions.
Dataset Splits | No | The paper mentions using a "benign training set" and "100 testing images for evaluation" (for the SRV methods), but it does not specify explicit training, validation, and test splits for the models themselves (e.g., percentages or counts for each split).
Hardware Specification | Yes | We train each model with one NVIDIA RTX3090 GPU.
Software Dependencies | No | The paper mentions using the "Python toolbox BackdoorBox" and the "SGD optimizer" but does not specify version numbers for Python, deep learning frameworks (such as PyTorch or TensorFlow), or other key software libraries used in the experiments.
Experiment Setup | Yes | Specifically, the target label is set to 0 and the watermarking rate is set to 5%. ... we simply set λ1 = λ2 = 1 for BWTP and λ3 = λ4 = 1 for our GLBW. ... The mini-batch size is set to 128 for each iteration. We use the SGD optimizer with momentum = 0.9 and weight decay = 5e-4. For CIFAR-10, the number of training epochs is 200 and the learning rate is initialized as 0.1 and multiplied by 0.1 at the 150th and 180th epochs, respectively. For GTSRB, the number of training epochs is 30 and the learning rate is initialized as 0.01 and multiplied by 0.1 at the 20th epoch.
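For reference, the sketch below assembles the hyperparameters quoted in the Experiment Setup row (target label 0, 5% watermarking rate, batch size 128, SGD with momentum 0.9 and weight decay 5e-4, and the per-dataset learning-rate schedules) into a minimal PyTorch training loop. This is a sketch under stated assumptions, not the paper's implementation: `build_resnet18` and `poison_dataset` are hypothetical placeholders for the paper's ResNet-18 variant and its BWTP/GLBW watermarking procedure, which live in the released code at https://github.com/yamengxi/GLBW.

```python
# Minimal sketch of the quoted training configuration (CIFAR-10 / GTSRB).
# Assumptions: PyTorch + torchvision and a CUDA device (the paper reports one
# RTX3090). `build_resnet18` and `poison_dataset` are hypothetical stand-ins
# for the paper's ResNet-18 variant and its BWTP/GLBW watermarking logic.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


def train_watermarked_model(dataset_name="cifar10", data_root="./data"):
    # Quoted hyperparameters differ per dataset.
    if dataset_name == "cifar10":
        epochs, lr, milestones = 200, 0.1, [150, 180]
        train_set = datasets.CIFAR10(data_root, train=True, download=True,
                                     transform=transforms.ToTensor())
        num_classes = 10
    else:  # GTSRB
        epochs, lr, milestones = 30, 0.01, [20]
        train_set = datasets.GTSRB(data_root, split="train", download=True,
                                   transform=transforms.Compose([
                                       transforms.Resize((32, 32)),
                                       transforms.ToTensor()]))
        num_classes = 43

    # Target label 0 and a 5% watermarking rate, as quoted; the actual trigger
    # generation (Algorithms 1-4) is not reproduced in this sketch.
    train_set = poison_dataset(train_set, target_label=0, poison_rate=0.05)

    model = build_resnet18(num_classes=num_classes).cuda()
    loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

    optimizer = optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=5e-4)
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                               milestones=milestones, gamma=0.1)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # multiply the lr by 0.1 at the quoted milestone epochs
    return model
```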