Exploring Model Dynamics for Accumulative Poisoning Discovery
Authors: Jianing Zhu, Xiawei Guo, Jiangchao Yao, Chao Du, Li He, Shuo Yuan, Tongliang Liu, Liang Wang, Bo Han
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to comprehensively characterize the Memorization Discrepancy, and verify the effectiveness of DSC in improving the model robustness against accumulative poisoning attacks using a range of benchmarked datasets. (Section 4) The code is publicly available at: https://github.com/tmlr-group/Memorization-Discrepancy. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, Hong Kong Baptist University; 2 Alibaba Group; 3 CMIC, Shanghai Jiao Tong University; 4 Shanghai AI Laboratory; 5 Mohamed bin Zayed University of Artificial Intelligence; 6 Sydney AI Centre, The University of Sydney. |
| Pseudocode | Yes | Algorithm 1 Discrepancy-aware Sample Correction (DSC) |
| Open Source Code | Yes | The code is publicly available at: https://github.com/tmlr-group/Memorization-Discrepancy. |
| Open Datasets | Yes | Following Pang et al. (2021), we simulate the real-time data streaming using the SVHN (Netzer et al., 2011), CIFAR-10 and CIFAR-100 (Krizhevsky, 2009) datasets. |
| Dataset Splits | No | The paper mentions a 'validation set' (Sval) but does not provide specific details on the dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | All experiments are conducted with multiple runs on NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using ResNet-18 and SGD optimizer but does not specify versions for any ancillary software dependencies like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | As in Pang et al. (2021), we train ResNet-18 (He et al., 2016) using the SGD optimizer with learning rate 0.1, momentum 0.9, and weight decay 0.0001. During the whole process, we keep the batch size of the data streaming at 100. The first phase is the burn-in phase: like model pre-training, the model is trained on natural data before taking training examples from untrusted sources (Biggio & Roli, 2018). ... The model is pre-trained for 40 epochs. Specifically, the crafted sample is generated by PGD under the ℓ-norm constraint. For the threshold schedule, we set µ = 0.5, τ = 0.02 for both the CIFAR-10 and SVHN datasets, and µ = 1.7, τ = 0.1 for the CIFAR-100 dataset. |
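
The Experiment Setup row lists the concrete hyperparameters quoted from the paper. The following is a minimal PyTorch sketch of that configuration, not the authors' code: the torchvision `resnet18` stands in for the paper's CIFAR-style ResNet-18, and the PGD helper's epsilon, step size, and number of steps are illustrative assumptions (the excerpt only states that crafted samples are generated by PGD under a norm constraint).

```python
# Sketch of the reported training setup: ResNet-18, SGD with lr 0.1,
# momentum 0.9, weight decay 1e-4, streaming batch size 100, 40 burn-in epochs.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=10).to(device)  # CIFAR-10 / SVHN have 10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

BATCH_SIZE = 100      # batch size of the simulated data stream
BURN_IN_EPOCHS = 40   # pre-training on natural data before untrusted data arrives


def pgd_craft(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generic PGD within an l_inf ball; all hyperparameters here are assumptions."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

This only reproduces the optimizer, batch size, and burn-in settings stated in the table; the Memorization Discrepancy computation and the DSC correction step (Algorithm 1) are not reconstructed here.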