How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Authors: Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jinfeng Yi, Mingyi Hong, Shiyu Chang, Sijia Liu

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we demonstrate the effectiveness of our proposal through extensive experiments. We will show that the proposed ZO-AE-DS outperforms a series of baselines when robustifying black-box neural networks for secure image classification and image reconstruction.
Researcher Affiliation Collaboration Yimeng Zhang (Michigan State University), Yuguang Yao (Michigan State University), Jinghan Jia (Michigan State University), Jinfeng Yi (JD AI Research), Mingyi Hong (University of Minnesota), Shiyu Chang (UC Santa Barbara), Sijia Liu (Michigan State University)
Pseudocode No The paper describes methods and derivations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Codes are available at https://github.com/damon-demon/Black-Box-Defense.
Open Datasets Yes In the task of image classification, we focus on CIFAR-10 and STL-10 datasets. In the task of image reconstruction, we consider the MNIST dataset.
Dataset Splits No The paper uses the CIFAR-10, STL-10, and MNIST datasets but does not explicitly provide training/validation/test splits (percentages or counts), nor does it cite specific predefined splits for all datasets used.
Hardware Specification Yes The averaged one-epoch training time on a single Nvidia RTX A6000 GPU is about 1 min and 29 min for FO-DS and our proposed ZO method, ZO-AE-DS (CGE, q = 192), on the CIFAR-10 dataset.
Software Dependencies No The paper mentions the Adam and SGD optimizers but does not specify version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup Yes We use the Adam optimizer with learning rate 10⁻³ to train the model for 200 epochs, and then use the SGD optimizer with learning rate 10⁻³, dropped by a factor of 10 every 200 epochs, where the total number of epochs is 600. Furthermore, we set the smoothing parameter µ = 0.005 for RGE and CGE. To achieve a smooth predictor, we set the Gaussian smoothing noise as δ ∼ N(0, σ²I) with σ² = 0.25.
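As a reading aid, the quoted setup maps onto a two-phase schedule (Adam with lr = 10⁻³ for 200 epochs, then SGD with lr = 10⁻³ decayed by 10× every 200 epochs for 600 epochs) plus Gaussian input smoothing with σ² = 0.25. The sketch below is a minimal, hypothetical PyTorch illustration of that schedule, not the authors' released code: the model, data, training loop, and the `cge_estimate` helper (which only illustrates a coordinate-wise gradient estimator with smoothing parameter µ = 0.005 and q queries) are placeholder assumptions; the full ZO-AE-DS pipeline is in the repository linked above.

```python
import torch
from torch import nn
from torch.optim import Adam, SGD
from torch.optim.lr_scheduler import StepLR

# Placeholder model and data; the real pipeline (denoiser + autoencoder +
# black-box classifier) is in the authors' repository.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(64, 3, 32, 32)      # stand-in for a CIFAR-10 batch
y = torch.randint(0, 10, (64,))     # stand-in labels
sigma = 0.5                         # Gaussian smoothing std, so sigma^2 = 0.25
mu = 0.005                          # ZO smoothing parameter for RGE/CGE

def train(optimizer, epochs, scheduler=None):
    """Toy training loop with Gaussian smoothing noise delta ~ N(0, sigma^2 I)."""
    for _ in range(epochs):
        optimizer.zero_grad()
        noise = sigma * torch.randn_like(x)
        loss = loss_fn(model(x + noise), y)
        loss.backward()
        optimizer.step()
        if scheduler is not None:
            scheduler.step()

# Phase 1: Adam, lr = 1e-3, 200 epochs.
train(Adam(model.parameters(), lr=1e-3), epochs=200)

# Phase 2: SGD, lr = 1e-3, dropped by a factor of 10 every 200 epochs, 600 epochs total.
sgd = SGD(model.parameters(), lr=1e-3)
train(sgd, epochs=600, scheduler=StepLR(sgd, step_size=200, gamma=0.1))

def cge_estimate(f, z, mu=0.005, q=192):
    """Illustrative coordinate-wise gradient estimate (CGE) of a scalar
    black-box function f at z, perturbing the first q coordinates."""
    grad = torch.zeros_like(z)
    f0 = f(z)
    for i in range(min(q, z.numel())):
        e = torch.zeros_like(z).view(-1)
        e[i] = 1.0
        grad.view(-1)[i] = (f(z + mu * e.view_as(z)) - f0) / mu
    return grad
```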