Genetic Prompt Search via Exploiting Language Model Probabilities

Authors: Jiangjiang Zhao, Zhuoran Wang, Fangchun Yang

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on diverse benchmark datasets show that the proposed precondition-free method significantly outperforms the existing DFO-style counterparts that require preconditions, including black-box tuning, genetic prompt search and gradient-free instructional prompt search.
Researcher Affiliation Collaboration Jiangjiang Zhao (1,2), Zhuoran Wang (3), Fangchun Yang (1); 1: Beijing University of Posts and Telecommunications, P.R. China; 2: China Mobile Online Services Co., Ltd., Beijing, P.R. China; 3: Clouchie Limited, London, United Kingdom
Pseudocode Yes Algorithm 1 gives the pseudo-code of the proposed GAP3, where hyperparameters and constant objects are denoted in italic type.
Open Source Code Yes Code and supplementary material available at: https://github.com/zjjhit/gap3
Open Datasets Yes The datasets used in the main experiments consist of 7 benchmark NLP tasks, which are the same as in [Sun et al., 2022b], including Yelp polarity, AG's News and DBPedia from [Zhang et al., 2015], SST-2, MRPC and RTE from the GLUE benchmarks [Wang et al., 2018], as well as SNLI [Bowman et al., 2015].
Dataset Splits No The paper describes constructing k-shot training sets and using the original test or development sets for evaluation, but it does not explicitly define a separate validation set for the main experiments.
Hardware Specification No The paper mentions 'computing power' in the acknowledgements but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies No The paper mentions the use of various pretrained language models and optimizers but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes We set GAP3's population size N = 64 and iteration number M = 50, with crossover and mutation probabilities ρc = 0.5 and ρm = 0.75, respectively. For PT, with learning rate 5e-4 and batch size 16, it runs for 1000 epochs. For full-model FT, with the same batch size, but learning rate 1e-5, we run it for 200 epochs.
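For orientation, below is a minimal sketch of a generic genetic prompt-search loop, parameterized with the hyperparameters reported in the Experiment Setup row (N = 64, M = 50, ρc = 0.5, ρm = 0.75). It is not the authors' Algorithm 1: the score_prompt, crossover, and mutate helpers are hypothetical placeholders for the paper's LM-probability-based operators.

```python
import random

def genetic_prompt_search(initial_prompts, score_prompt, crossover, mutate,
                          pop_size=64, iterations=50, p_cross=0.5, p_mut=0.75):
    """Generic genetic-algorithm loop over discrete prompts (sketch only).

    score_prompt(prompt) -> float  : fitness, e.g. label-word log-probabilities
                                     from the frozen LM on the k-shot set.
    crossover(a, b) -> prompt      : recombine two parent prompts.
    mutate(prompt) -> prompt       : perturb a candidate prompt.
    """
    population = list(initial_prompts)[:pop_size]
    for _ in range(iterations):
        # Rank the current population by fitness and keep the better half.
        ranked = sorted(population, key=score_prompt, reverse=True)
        parents = ranked[: pop_size // 2]
        # Refill the population with offspring from random parent pairs.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < p_cross else a
            if random.random() < p_mut:
                child = mutate(child)
            children.append(child)
        population = parents + children
    return max(population, key=score_prompt)
```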
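The Open Datasets and Dataset Splits rows can be approximated with public dataset hubs. The sketch below assumes the Hugging Face datasets library and its hub identifiers for the seven benchmarks (the identifiers are assumptions, not taken from the paper) and builds a k-shot training set by sampling k examples per class; the paper's exact sampling protocol follows [Sun et al., 2022b].

```python
import random
from collections import defaultdict

from datasets import load_dataset  # Hugging Face datasets; assumed tooling

# Assumed hub identifiers for the seven benchmarks named in the paper.
TASKS = {
    "Yelp polarity": ("yelp_polarity", None),
    "AG's News": ("ag_news", None),
    "DBPedia": ("dbpedia_14", None),
    "SST-2": ("glue", "sst2"),
    "MRPC": ("glue", "mrpc"),
    "RTE": ("glue", "rte"),
    "SNLI": ("snli", None),
}

def make_k_shot(train_split, k, label_key="label", seed=42):
    """Sample k examples per class from the original training split (sketch)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in train_split:
        by_label[example[label_key]].append(example)
    k_shot = []
    for examples in by_label.values():
        k_shot.extend(rng.sample(examples, min(k, len(examples))))
    rng.shuffle(k_shot)
    return k_shot

# Example: a k-shot training set for SST-2 (k = 16 is illustrative only).
path, config = TASKS["SST-2"]
sst2 = load_dataset(path, config) if config else load_dataset(path)
train_k_shot = make_k_shot(sst2["train"], k=16)
```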