Zeroth-Order Optimization with Trajectory-Informed Derivative Estimation
Authors: Yao Shu, Zhongxiang Dai, Weicong Sng, Arun Verma, Patrick Jaillet, Bryan Kian Hsiang Low
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we use extensive experiments, such as black-box adversarial attack, non-differentiable metric optimization, and derivative-free reinforcement learning, to demonstrate that (a) our trajectory-informed derivative estimation improves over the existing FD methods and that (b) our ZORD algorithm consistently achieves improved query efficiency compared with previous ZO optimization algorithms (Sec. 5). |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Dept. of Electrical Engineering and Computer Science, MIT, USA |
| Pseudocode | Yes | Algorithm 1: Standard (Projected) GD with Estimated Derivatives; Algorithm 2: ZORD (Ours). A minimal descent loop in the spirit of Algorithm 1 is sketched after this table. |
| Open Source Code | Yes | For our empirical results, we have provided our detailed experimental settings in Appx. C and included our codes in the supplementary materials (i.e., the zip file). |
| Open Datasets | Yes | we randomly select an image from MNIST (Lecun et al., 1998) (d = 28 × 28) or CIFAR-10 (Krizhevsky et al., 2009) (d = 32 × 32); The Covertype dataset used in Sec. 5.4 is a classification dataset consisting of 581,012 samples from 7 different categories. Each sample from this dataset is a 54-dimensional vector of integers. In this experiment, we randomly split the dataset into training and test sets with each containing 290,506 samples. |
| Dataset Splits | No | No complete training/validation/test split is specified. For the Covertype dataset, the paper mentions a random split into 'training and test sets with each containing 290,506 samples', but no explicit validation set is given. |
| Hardware Specification | No | No specific hardware specifications (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) were mentioned in the paper. |
| Experiment Setup | Yes | Among all our experiments in Sec. 5, the confidence threshold c of our dynamic virtual updates (Sec. 3.2) is set to 0.35; we consistently use n = 10, λ = 0.01, and directions {u_i} (i = 1, …, n) randomly sampled from a unit sphere for the derivative estimation of the FD method (2) applied in the RGF and PRGF algorithms; we use the same Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.1 and exponential decay rates of 0.9, 0.999 for RGF, PRGF, GD, and our ZORD algorithm; elsewhere, an Adam optimizer with the same learning rate of 0.5 and the same exponential decay rates of 0.9, 0.999 is used. A minimal sketch of this FD estimator follows the table. |
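
The FD settings quoted in the Experiment Setup row (n = 10 directions sampled from the unit sphere, smoothing parameter λ = 0.01) match the standard random gradient-free (RGF) finite-difference estimator. Below is a minimal sketch of such an estimator; the function name, signature, and NumPy implementation are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def rgf_gradient_estimate(f, x, n=10, lam=0.01, rng=None):
    """Random gradient-free (RGF) finite-difference gradient estimate.

    Averages forward differences of the black-box objective f along n
    directions sampled uniformly from the unit sphere, with smoothing
    parameter lam (the paper's reported defaults are n = 10, lam = 0.01).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    fx = f(x)                            # one query at x, reused for all directions
    grad = np.zeros_like(x)
    for _ in range(n):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)           # uniform random direction on the unit sphere
        grad += (f(x + lam * u) - fx) / lam * u
    return grad / n
```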
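
The Pseudocode row names Algorithm 1, "Standard (Projected) GD with Estimated Derivatives". The sketch below shows a generic zeroth-order descent loop in that spirit, reusing `rgf_gradient_estimate` from above; the loop structure, step size, and `project` hook are illustrative assumptions rather than the paper's exact algorithm, whose derivative estimates are trajectory-informed rather than FD-based.

```python
def zo_projected_gd(f, x0, steps=100, lr=0.1, project=None, **fd_kwargs):
    """Generic (projected) gradient descent driven by estimated derivatives.

    Each iteration replaces the true gradient with a zeroth-order estimate,
    then optionally projects the iterate back onto the feasible set.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = rgf_gradient_estimate(f, x, **fd_kwargs)  # ZO gradient estimate
        x = x - lr * g
        if project is not None:
            x = project(x)               # projection onto the feasible set
    return x

# Example: minimize a simple quadratic using only function-value queries.
x_min = zo_projected_gd(lambda x: float(np.sum(x ** 2)), np.ones(5), steps=200)
```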