ReLIZO: Sample Reusable Linear Interpolation-based Zeroth-order Optimization

Authors: Xiaoxing Wang, Xiaohan Qin, Xiaokang Yang, Junchi Yan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both simulation functions and real scenarios (black-box adversarial attacks, neural architecture search, and parameter-efficient fine-tuning for large language models) show its efficacy and efficiency.
Researcher Affiliation | Academia | Xiaoxing Wang, Xiaohan Qin, Xiaokang Yang, Junchi Yan. Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University. {figure1_wxx, galaxy-1, xkyang, yanjunchi}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: ReLIZO (a generic sketch of this class of gradient estimator is given after the table).
Open Source Code | Yes | Our code is available at https://github.com/Thinklab-SJTU/ReLIZO.git.
Open Datasets | Yes | First, we test on the CUTEst [24]... Second, we conduct experiments on the black-box adversarial attack task... on the CIFAR-10 dataset... Third, we apply our method to the Neural Architecture Search (NAS) task... on NAS-Bench-201 [18]... Finally, we conduct experiments on large-scale neural network training by fine-tuning large language models (LLMs)... on the Stanford Sentiment Treebank v2 (SST2) task.
Dataset Splits | Yes | Table 2: Top-1 test classification accuracy (%) on NAS-Bench-201. The first block shows the performance of gradient-based methods quoted from the NAS-Bench-201 paper. The second block shows the performance of various ZO methods, which are implemented by ourselves on the PyTorch platform. The performance of the methods based on ZO optimizers is averaged over three independent trials.
Hardware Specification | No | The paper mentions running experiments on the PyTorch platform and fine-tuning an OPT-1.3b model with associated memory usage (e.g., 44.1 GB in Table 4), but it does not specify the GPU models, CPU models, or other hardware configurations used for these experiments.
Software Dependencies | No | The paper mentions using the PyTorch platform and PyCUTEst [21], but does not provide version numbers for these or any other key dependencies.
Experiment Setup | Yes | Each solver updates the variables 500 times and samples 8 random directions at each iteration. We also utilize grid search to obtain the best learning rate for each problem. The candidate learning rate η ranges over {0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5}. As for our method, the total sample size at each iteration is set as 8, and the reusable distance bound b is set as 2η, where η is the learning rate obtained by the grid search. (A sketch of this grid-search protocol follows the table.)
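
For context on the "Pseudocode" row: below is a minimal sketch of the general idea behind linear interpolation-based zeroth-order gradient estimation, not the paper's Algorithm 1. ReLIZO additionally reuses queries from previous iterations whose points lie within the distance bound b, which this sketch omits. The names `interp_zo_grad`, `n_dirs`, and `delta` are illustrative, not from the paper.

```python
import numpy as np

def interp_zo_grad(f, x, n_dirs=8, delta=1e-3, rng=None):
    """Least-squares linear interpolation estimate of grad f(x):
    solve  delta * U @ g  ~=  [f(x + delta * u_i) - f(x)]_i  for g."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.standard_normal((n_dirs, x.size))   # rows are random directions u_i
    fx = f(x)
    df = np.array([f(x + delta * u) - fx for u in U])
    g, *_ = np.linalg.lstsq(delta * U, df, rcond=None)
    return g

# Toy usage: 500 zeroth-order gradient-descent updates on a quadratic.
rng = np.random.default_rng(0)
f = lambda z: float(np.sum((z - 1.0) ** 2))
x = np.zeros(5)
for _ in range(500):
    x = x - 0.05 * interp_zo_grad(f, x, rng=rng)  # eta = 0.05, illustrative
print(f(x))  # close to 0
```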
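
The learning-rate selection described in the "Experiment Setup" row is a plain grid search over the quoted candidate set; here is a minimal sketch of that protocol. `run_solver` and `run_relizo` are hypothetical callables standing in for one full optimization run that returns the final objective value.

```python
CANDIDATE_LRS = [0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005,
                 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]

def grid_search_lr(run_solver, problem):
    """Return the learning rate whose 500-iteration run (8 directions
    per iteration) finishes with the lowest objective value."""
    best_lr, best_val = None, float("inf")
    for lr in CANDIDATE_LRS:
        final_val = run_solver(problem, lr)  # hypothetical: returns f(x_final)
        if final_val < best_val:
            best_lr, best_val = lr, final_val
    return best_lr

# For ReLIZO, the reusable distance bound is then tied to the tuned rate:
# eta = grid_search_lr(run_relizo, problem)  # run_relizo: hypothetical
# b = 2 * eta                                # the bound quoted in the setup row
```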