Localized Zeroth-Order Prompt Optimization

Authors: Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Remarkably, ZOPO outperforms existing baselines in terms of both optimization performance and query efficiency, which we demonstrate through extensive experiments.
Researcher Affiliation | Collaboration | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Guangdong Lab of AI and Digital Economy (SZ); Singapore-MIT Alliance for Research and Technology, Republic of Singapore; School of Data Science, The Chinese University of Hong Kong, Shenzhen
Pseudocode | Yes | Algorithm 1: The ZOPO Algorithm
Open Source Code | Yes | Our implementation is available at https://github.com/allen4747/ZOPO.
Open Datasets | Yes | We evaluate the performance of ZOPO against several strong baseline methods... on 30 instruction induction tasks [11], 3 arithmetic reasoning tasks [4, 19, 29], and the GLUE benchmark [42].
Dataset Splits | Yes | Specifically, 5 examples are sampled from the training set as the demonstrations (i.e., D_demo) for instruction induction, and another 20 examples sampled from the training set are used as the validation set D_V to evaluate the objective function value as in Equation (1). (A sketch of this split appears after the table.)
Hardware Specification | Yes | All experiments are conducted on a server with Intel(R) Xeon(R) CPUs and NVIDIA H100 GPUs.
Software Dependencies | No | The paper mentions specific models and APIs such as GPT-3.5-turbo-0301, Vicuna-13B-v1.1, and the sentence transformer model all-mpnet-base-v2, but does not give version numbers for underlying software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | For all experiments using ZOPO in this work, we set the learning rate to 0.01, the uncertainty thresholds λ and ξ to 0.1 and 5, respectively, and the number n of nearest neighbors to query in local exploration (Section 4.3) to 10. A neural network with 2 fully connected layers of size 32 and ReLU activation functions is used in NTK-GP as the kernel. We use 20 nearest neighbors to fit the NTK-GP. (A sketch of this configuration appears after the table.)
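The Dataset Splits row describes drawing 5 demonstration examples and 20 validation examples from the training set. Below is a minimal sketch of that split, assuming the training set is a list of (input, output) pairs; the function name, the disjoint sampling, and the seed handling are illustrative assumptions, not the authors' code.

```python
import random

def split_for_instruction_induction(train_set, n_demo=5, n_val=20, seed=0):
    """Sample demonstrations (D_demo) and a validation set (D_V) from the
    training set, using the counts quoted above (5 demos, 20 validation
    examples). Hypothetical helper: `train_set` is assumed to be a list
    of (input, output) pairs."""
    rng = random.Random(seed)
    # Draw 25 distinct examples so D_demo and D_V do not overlap, matching
    # the quote's "another ... 20 examples" (an assumed reading).
    sampled = rng.sample(train_set, n_demo + n_val)
    d_demo = sampled[:n_demo]   # demonstrations for instruction induction
    d_val = sampled[n_demo:]    # D_V: evaluates the objective in Equation (1)
    return d_demo, d_val
```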
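The Experiment Setup row fixes the ZOPO hyperparameters and the small network behind the NTK-GP kernel. The following is a minimal PyTorch sketch of that configuration; the class name, the input dimension, the scalar output head, and the optimizer choice are assumptions not stated in the quoted text, so treat it as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters as quoted in the Experiment Setup row.
LEARNING_RATE = 0.01      # ZOPO learning rate
LAMBDA_THRESHOLD = 0.1    # uncertainty threshold λ
XI_THRESHOLD = 5          # uncertainty threshold ξ
N_LOCAL_NEIGHBORS = 10    # neighbors queried in local exploration (Section 4.3)
N_GP_NEIGHBORS = 20       # neighbors used to fit the NTK-GP

class NTKKernelNet(nn.Module):
    """Two fully connected layers of size 32 with ReLU activations, matching
    the network the paper says underlies the NTK-GP kernel; the input
    dimension and the final scalar head are assumed for illustration."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # scalar prediction head (assumption)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: an optimizer over the kernel network's parameters at the quoted
# learning rate (plain SGD here is an assumed choice).
model = NTKKernelNet(input_dim=768)  # 768 is a placeholder embedding size
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
```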