Learning to Learn by Zeroth-Order Oracle

Authors: Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our learned optimizer outperforms hand-designed algorithms in terms of convergence rate and final solution on both synthetic and practical ZO optimization problems (in particular, the black-box adversarial attack task, which is one of the most widely used applications of ZO optimization). We finally conduct extensive analytical experiments to demonstrate the effectiveness of our proposed optimizer.
Researcher Affiliation | Collaboration | Yangjun Ruan (1), Yuanhao Xiong (2), Sashank Reddi (3), Sanjiv Kumar (3), Cho-Jui Hsieh (2,3); (1) Department of Information Science and Electrical Engineering, Zhejiang University; (2) Department of Computer Science, UCLA; (3) Google Research
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/RYoungJ/ZO-L2L
Open Datasets | Yes | We follow the same neural network architectures used in Cheng et al. (2019) for the MNIST and CIFAR-10 datasets
Dataset Splits | No | The paper mentions training and testing splits for the optimizer's training (e.g., 'randomly select 100 images... to train the optimizer and select another 100 images to test the learned optimizer'), but it does not specify a separate validation set for hyperparameter tuning during this process.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For each task, we tune the hyperparameters of baseline algorithms to report the best performance. Specifically, we set the learning rate of baseline algorithms to δ/d. We first coarsely tune the constant δ on a logarithmic range {0.01, 0.1, 1, 10, 100, 1000} and then finetune it on a linear range. For ZO-ADAM, we tune β1 values over {0.9, 0.99} and β2 values over {0.99, 0.996, 0.999}. In all experiments, we use 1-layer LSTM with 10 hidden units for both the Update RNN and the Query RNN. For each RNN, another linear layer is applied to project the hidden state to the output (1-dim parameter update for the Update RNN and 1-dim predicted variance for the Query RNN). The regularization parameter λ in the training objective function (equation 4) is set to 0.005. We use ADAM to train our proposed optimizer with truncated BPTT, each optimization is run for 200 steps and unrolled for 20 steps unless specified otherwise. At test time, we set the Bernoulli random variable (see Section 3.1) X ∼ Ber(0.5).
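
To make the quoted setup concrete, here is a minimal PyTorch sketch of the described optimizer architecture. Only the sizes and hyperparameters (1-layer LSTM with 10 hidden units, 1-dim linear projection, λ = 0.005, 200 steps unrolled in segments of 20, X ∼ Ber(0.5)) come from the quoted text; the class name, input features, and coordinatewise batching convention are assumptions on our part, and the authors' released code is the authoritative reference.

```python
# Minimal sketch (PyTorch assumed) of the setup quoted above.
# Names such as CoordinatewiseRNN are illustrative, not from the paper's code.
import torch
import torch.nn as nn


class CoordinatewiseRNN(nn.Module):
    """1-layer LSTM with 10 hidden units, followed by a linear layer that
    projects the hidden state to a 1-dim output, applied per coordinate."""

    def __init__(self, input_dim=1, hidden_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_size)  # 1 layer by default
        self.proj = nn.Linear(hidden_size, 1)        # 1-dim output

    def forward(self, x, state=None):
        # x: (seq_len=1, num_coords, input_dim) -- one step per optimizer call
        out, state = self.lstm(x, state)
        return self.proj(out), state


update_rnn = CoordinatewiseRNN()  # outputs the 1-dim parameter update
query_rnn = CoordinatewiseRNN()   # outputs the 1-dim predicted variance

# Meta-training as quoted: ADAM over both RNNs with truncated BPTT,
# 200 optimization steps unrolled in segments of 20; lambda from eq. 4.
meta_opt = torch.optim.Adam(
    list(update_rnn.parameters()) + list(query_rnn.parameters()))
TOTAL_STEPS, UNROLL_LEN, LAMBDA = 200, 20, 0.005

# Baseline tuning as quoted: learning rate delta/d, with delta first swept
# over {0.01, 0.1, 1, 10, 100, 1000} and then fine-tuned on a linear range.

# Test time: sample the Bernoulli switch X ~ Ber(0.5) from Section 3.1.
X = torch.bernoulli(torch.tensor(0.5))
```

The coordinatewise design (shared LSTM weights applied independently to every parameter coordinate) follows the learning-to-learn convention of Andrychowicz et al. (2016), which this paper builds on; whether the released code batches coordinates exactly this way is an assumption here.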