Localized Zeroth-Order Prompt Optimization
Authors: Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments. |
| Researcher Affiliation | Collaboration | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Guangdong Lab of AI and Digital Economy (SZ); Singapore-MIT Alliance for Research and Technology, Republic of Singapore; School of Data Science, The Chinese University of Hong Kong, Shenzhen |
| Pseudocode | Yes | Algorithm 1 The ZOPO Algorithm |
| Open Source Code | Yes | Our implementation is available at https://github.com/allen4747/ZOPO. |
| Open Datasets | Yes | We evaluate the performance of ZOPO against several strong baseline methods... on 30 instruction induction tasks [11], 3 arithmetic reasoning tasks [4, 19, 29], and the GLUE benchmark [42]. |
| Dataset Splits | Yes | Specifically, 5 examples are sampled from the training set as the demonstrations (i.e., Ddemo) for instruction induction, and another sampled 20 examples from the training set are used as the validation set DV to evaluate the objective function value as in Equation (1). |
| Hardware Specification | Yes | All experiments are conducted on a server with Intel(R) Xeon(R) CPU and NVIDIA H100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and APIs such as "GPT-3.5-turbo-0301", "Vicuna-13B-v1.1", and "sentence transformer model all-mpnet-base-v2", but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For all experiments using ZOPO in this work, we set the learning rate to 0.01, the uncertainty thresholds λ, ξ to 0.1 and 5 respectively, and the number n of nearest neighbors to query in local exploration (Section 4.3) to 10. A neural network with 2 fully connected layers of size 32 and ReLU activation functions is used in NTK-GP as the kernel. We use 20 nearest neighbors to fit the NTK-GP. |
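The reported experiment setup can be sketched as a configuration plus the small kernel network it describes. This is a minimal, illustrative stand-in, not the released ZOPO code: the function and parameter names below are assumptions, and the 2-layer fully connected ReLU network (hidden size 32) is written in plain Python only to make the stated architecture concrete.

```python
import random

# Hyperparameters as reported in the paper's experiment setup row.
ZOPO_CONFIG = {
    "learning_rate": 0.01,
    "uncertainty_threshold_lambda": 0.1,
    "uncertainty_threshold_xi": 5,
    "num_local_neighbors": 10,   # n: neighbors queried in local exploration
    "num_ntk_gp_neighbors": 20,  # neighbors used to fit the NTK-GP
    "hidden_size": 32,           # width of each fully connected layer
}

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]

def linear(w, b, v):
    """Affine layer: w is an (out x in) weight matrix, b a bias vector."""
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(w, b)]

def two_layer_mlp(x, params):
    """Forward pass of the 2-layer fully connected ReLU network
    described as the kernel backbone for the NTK-GP."""
    h = relu(linear(params["w1"], params["b1"], x))
    return linear(params["w2"], params["b2"], h)

def init_params(in_dim, hidden, out_dim, seed=0):
    """Gaussian initialization scaled by 1/sqrt(fan-in); seed is fixed
    here only so the sketch is reproducible."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0 / cols ** 0.5) for _ in range(cols)]
                for _ in range(rows)]
    return {
        "w1": mat(hidden, in_dim), "b1": [0.0] * hidden,
        "w2": mat(out_dim, hidden), "b2": [0.0] * out_dim,
    }
```

For example, `two_layer_mlp(x, init_params(len(x), ZOPO_CONFIG["hidden_size"], 1))` produces the scalar feature the NTK-GP kernel would be built from; the actual NTK computation and local exploration loop are omitted here.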