ReLIZO: Sample Reusable Linear Interpolation-based Zeroth-order Optimization

Authors: Xiaoxing Wang, Xiaohan Qin, Xiaokang Yang, Junchi Yan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both simulation functions and real scenarios (black-box adversarial attacks, neural architecture search, and parameter-efficient fine-tuning for large language models) show its efficacy and efficiency.
Researcher Affiliation | Academia | Xiaoxing Wang, Xiaohan Qin, Xiaokang Yang, Junchi Yan. Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University. {figure1_wxx, galaxy-1, xkyang, yanjunchi}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: ReLIZO (a generic sketch of this class of gradient estimator is given after the table).
Open Source Code | Yes | Our code is available at https://github.com/Thinklab-SJTU/ReLIZO.git.
Open Datasets | Yes | First, we test on the CUTEst [24]... Second, we conduct experiments on the black-box adversarial attack task... on the CIFAR-10 dataset... Third, we apply our method to the Neural Architecture Search (NAS) task... on NAS-Bench-201 [18]... Finally, we conduct experiments on large-scale neural network training by fine-tuning large language models (LLMs)... on the Stanford Sentiment Treebank v2 (SST2) task.
Dataset Splits | Yes | Table 2: Top-1 test classification accuracy (%) on NAS-Bench-201. The first block shows the performance of gradient-based methods quoted from the NAS-Bench-201 paper. The second block shows the performance of various ZO methods, which are implemented by ourselves on the PyTorch platform. The performance of the methods based on ZO optimizers is averaged over three independent trials.
Hardware Specification | No | The paper mentions running experiments on the PyTorch platform and fine-tuning an OPT-1.3b model with associated memory usage (e.g., 44.1 GB in Table 4), but it does not specify the GPU models, CPU models, or other hardware configurations used for these experiments.
Software Dependencies | No | The paper mentions using the PyTorch platform and PyCUTEst [21], but does not provide version numbers for these or any other key dependencies.
Experiment Setup | Yes | Each solver updates the variables 500 times and samples 8 random directions at each iteration. We also utilize grid search to obtain the best learning rate for each problem. The candidate learning rate η ranges over {0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5}. As for our method, the total sample size at each iteration is set as 8, and the reusable distance bound b is set as 2η, where η is the learning rate obtained by the grid search. (A sketch of this grid-search protocol follows the table.)
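
For context on the "Pseudocode" row: below is a minimal sketch of the general idea behind linear interpolation-based zeroth-order gradient estimation, not the paper's Algorithm 1. ReLIZO additionally reuses queries from previous iterations whose points lie within the distance bound b, which this sketch omits. The names `interp_zo_grad`, `n_dirs`, and `delta` are illustrative, not from the paper.

```python
import numpy as np

def interp_zo_grad(f, x, n_dirs=8, delta=1e-3, rng=None):
    """Least-squares linear interpolation estimate of grad f(x):
    solve  delta * U @ g  ~=  [f(x + delta * u_i) - f(x)]_i  for g."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.standard_normal((n_dirs, x.size))   # rows are random directions u_i
    fx = f(x)
    df = np.array([f(x + delta * u) - fx for u in U])
    g, *_ = np.linalg.lstsq(delta * U, df, rcond=None)
    return g

# Toy usage: 500 zeroth-order gradient-descent updates on a quadratic.
rng = np.random.default_rng(0)
f = lambda z: float(np.sum((z - 1.0) ** 2))
x = np.zeros(5)
for _ in range(500):
    x = x - 0.05 * interp_zo_grad(f, x, rng=rng)  # eta = 0.05, illustrative
print(f(x))  # close to 0
```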
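
The learning-rate selection described in the "Experiment Setup" row is a plain grid search over the quoted candidate set; here is a minimal sketch of that protocol. `run_solver` and `run_relizo` are hypothetical callables standing in for one full optimization run that returns the final objective value.

```python
CANDIDATE_LRS = [0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005,
                 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]

def grid_search_lr(run_solver, problem):
    """Return the learning rate whose 500-iteration run (8 directions
    per iteration) finishes with the lowest objective value."""
    best_lr, best_val = None, float("inf")
    for lr in CANDIDATE_LRS:
        final_val = run_solver(problem, lr)  # hypothetical: returns f(x_final)
        if final_val < best_val:
            best_lr, best_val = lr, final_val
    return best_lr

# For ReLIZO, the reusable distance bound is then tied to the tuned rate:
# eta = grid_search_lr(run_relizo, problem)  # run_relizo: hypothetical
# b = 2 * eta                                # the bound quoted in the setup row
```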