Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Localized Zeroth-Order Prompt Optimization
Authors: Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments. |
| Researcher Affiliation | Collaboration | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Institute of Data Science, National University of Singapore, Republic of Singapore; Guangdong Lab of AI and Digital Economy (SZ); Singapore-MIT Alliance for Research and Technology, Republic of Singapore; School of Data Science, The Chinese University of Hong Kong, Shenzhen |
| Pseudocode | Yes | Algorithm 1 The ZOPO Algorithm |
| Open Source Code | Yes | Our implementation is available at https://github.com/allen4747/ZOPO. |
| Open Datasets | Yes | We evaluate the performance of ZOPO against several strong baseline methods... on 30 instruction induction tasks [11], 3 arithmetic reasoning tasks [4, 19, 29], and the GLUE benchmark [42]. |
| Dataset Splits | Yes | Specifically, 5 examples are sampled from the training set as the demonstrations (i.e., Ddemo) for instruction induction, and another sampled 20 examples from the training set are used as the validation set DV to evaluate the objective function value as in Equation (1). |
| Hardware Specification | Yes | All experiments are conducted on a server with Intel(R) Xeon(R) CPU and NVIDIA H100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and APIs such as "GPT-3.5-turbo-0301", "Vicuna-13B-v1.1", and "sentence transformer model all-mpnet-base-v2", but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For all experiments using ZOPO in this work, we set the learning rate to 0.01, the uncertainty thresholds λ, ξ to 0.1 and 5 respectively, and the number n of nearest neighbors to query in local exploration (Section 4.3) to 10. A neural network with 2 fully connected layers of size 32 and ReLU activation functions is used in NTK-GP as the kernel. We use 20 nearest neighbors to fit the NTK-GP. |
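To make the reported kernel configuration concrete, below is a minimal sketch of an empirical NTK computed from a fully connected ReLU network of hidden width 32, as the Experiment Setup row describes, fitted over 20 neighbor points. This is an illustrative reconstruction, not the authors' implementation: the input dimension, weight initialization, and scalar output head are assumptions, and the actual ZOPO code (linked above) should be consulted for the real details.

```python
import numpy as np

# Illustrative sketch only: the paper states the NTK-GP kernel uses a network
# with 2 fully connected layers of size 32 and ReLU activations, fitted on
# 20 nearest neighbors. Input dim D and the scalar output head are assumptions.

rng = np.random.default_rng(0)
D, H = 8, 32  # D: assumed input dim; H: hidden width 32 from the paper

# He-style initialization (an assumption; the paper does not specify)
W1 = rng.normal(0, np.sqrt(2 / D), (H, D)); b1 = np.zeros(H)
W2 = rng.normal(0, np.sqrt(2 / H), (H, H)); b2 = np.zeros(H)
w3 = rng.normal(0, np.sqrt(2 / H), H)

def param_grads(x):
    """Flattened gradient of the scalar network output w.r.t. all parameters."""
    z1 = W1 @ x + b1; h1 = np.maximum(z1, 0.0)   # first FC + ReLU
    z2 = W2 @ h1 + b2; h2 = np.maximum(z2, 0.0)  # second FC + ReLU
    d2 = w3 * (z2 > 0)             # backprop through second ReLU layer
    d1 = (W2.T @ d2) * (z1 > 0)    # backprop through first ReLU layer
    return np.concatenate([
        np.outer(d1, x).ravel(), d1,   # dW1, db1
        np.outer(d2, h1).ravel(), d2,  # dW2, db2
        h2, [1.0],                     # dw3, output bias
    ])

def empirical_ntk(X):
    """K[i, j] = <grad f(x_i), grad f(x_j)>: the Gram matrix a GP would use."""
    G = np.stack([param_grads(x) for x in X])
    return G @ G.T

X = rng.normal(size=(20, D))  # e.g. the 20 nearest neighbors from the paper
K = empirical_ntk(X)          # symmetric positive semi-definite kernel matrix
```

Since `K` is a Gram matrix of gradient features, it is symmetric and positive semi-definite by construction, which is what makes it usable as a GP covariance.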