Understanding prompt engineering may not require rethinking generalization

Authors: Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate empirically that this holds for existing handcrafted prompts and prompts generated through simple greedy search. ... 5 EXPERIMENTS: In this section, we evaluate the generalization of discrete prompts generated by Greedy on CIFAR-10, CIFAR-100, ImageNet, as well as domain generalization datasets fMoW (Christie et al., 2018) and OfficeHome (Venkateswara et al., 2017), which is much less studied in the context of numerical generalization bounds."
Researcher Affiliation | Collaboration | Victor Akinwande¹, Yiding Jiang¹, Dylan Sam¹ & J. Zico Kolter¹,²; ¹Carnegie Mellon University, ²Bosch Center for AI
Pseudocode | Yes | Appendix A (PSEUDOCODE), Algorithm 1: Sequential Prompt Search (a hedged sketch follows this table)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is openly available.
Open Datasets | Yes | "In this section, we evaluate the generalization of discrete prompts generated by Greedy on CIFAR-10, CIFAR-100, ImageNet, as well as domain generalization datasets fMoW (Christie et al., 2018) and OfficeHome (Venkateswara et al., 2017)."
Dataset Splits | No | The paper describes using a split portion of the dataset s ∈ {0.1, ..., 1.0} for its experiments and mentions training and testing data, but it does not explicitly define a separate validation split with the percentages or counts needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used to run its experiments.
Software Dependencies | No | The paper mentions software components such as CLIP and LLaMA-7B (Touvron et al., 2023), but it does not provide version numbers for the key libraries, frameworks, or programming languages used to run the experiments.
Experiment Setup | Yes | Appendix C (EXPERIMENTAL DETAILS): "We report the hyperparameters used in CLIP, LLaMA-7B, and the Greedy algorithm in Table 4." Table 4 (hyperparameters used in CLIP, LLaMA-7B, and Greedy): batch size 100; CLIP vocabulary size 49,408; LLaMA-7B vocabulary size 32,000; temperature 1.0; bound δ 0.01; SRM β 1.0 (a worked bound example follows this table).
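The Pseudocode and Experiment Setup rows above refer to Algorithm 1 (Sequential Prompt Search) and its hyperparameters, but no source code is released. The following is a minimal sketch of what a greedy, token-at-a-time prompt search could look like. The `score_fn` callback, the random candidate pool, and the `prompt_len` and `pool_size` defaults are illustrative assumptions rather than the authors' implementation; only the general greedy procedure and the vocabulary/batch-size figures in Table 4 come from the paper.

```python
# Hypothetical sketch of a sequential (greedy) prompt search in the spirit of
# Algorithm 1 referenced above. `score_fn` is an assumed callback that returns
# the training accuracy of a candidate prompt (e.g., CLIP zero-shot accuracy
# with the prompt prepended to every class name); it is not the authors' code.

import random
from typing import Callable, Sequence


def greedy_prompt_search(
    score_fn: Callable[[list[str]], float],  # prompt tokens -> training accuracy
    vocab: Sequence[str],                    # token vocabulary (CLIP's BPE has 49,408 entries)
    prompt_len: int = 8,                     # number of tokens to select (assumed)
    pool_size: int = 1000,                   # random candidate pool per step (assumed)
    seed: int = 0,
) -> list[str]:
    """Grow a prompt one token at a time, keeping the token that maximizes score_fn."""
    rng = random.Random(seed)
    prompt: list[str] = []
    for _ in range(prompt_len):
        candidates = rng.sample(list(vocab), k=min(pool_size, len(vocab)))
        best_tok, best_score = candidates[0], float("-inf")
        for tok in candidates:
            score = score_fn(prompt + [tok])
            if score > best_score:
                best_tok, best_score = tok, score
        prompt.append(best_tok)
    return prompt


if __name__ == "__main__":
    # Toy usage: the "score" is simply how many target words appear in the prompt.
    toy_vocab = ["photo", "a", "of", "the", "blurry", "bright", "noisy", "crisp"]
    target = {"a", "photo", "of"}
    tokens = greedy_prompt_search(
        score_fn=lambda p: float(sum(t in target for t in p)),
        vocab=toy_vocab,
        prompt_len=3,
        pool_size=len(toy_vocab),
    )
    print(" ".join(tokens))
```

In a real run, `score_fn` would batch images (the paper's Table 4 reports a batch size of 100) and return accuracy on the training split; the toy scorer here exists only so the sketch executes end to end.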
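The "Bound δ 0.01" entry in Table 4 is the failure probability of the numerical generalization bounds the paper evaluates. As a worked illustration only, using the classical finite-hypothesis-class Hoeffding/Occam bound rather than necessarily the exact bound instantiated in the paper: a k-token prompt over a vocabulary of size V defines at most V^k hypotheses, so with probability at least 1 - δ the test error exceeds the training error by no more than sqrt((k ln V + ln(1/δ)) / (2n)) over n training samples. The prompt length and sample count below are hypothetical.

```python
# Worked example (assumption: the standard finite-hypothesis-class Hoeffding
# bound, not necessarily the exact bound used in the paper).
# A k-token prompt over a vocabulary of size V defines at most V**k hypotheses,
# so with probability at least 1 - delta over n training samples:
#   test_error <= train_error + sqrt((k * ln(V) + ln(1 / delta)) / (2 * n))

import math


def occam_bound_gap(k: int, vocab_size: int, n: int, delta: float = 0.01) -> float:
    """Complexity term of the union-bound / Hoeffding generalization bound."""
    return math.sqrt((k * math.log(vocab_size) + math.log(1.0 / delta)) / (2.0 * n))


# Plugging in quantities from Table 4 (CLIP vocabulary of 49,408 tokens,
# delta = 0.01) with a hypothetical 8-token prompt and 50,000 training images
# (the CIFAR-10 training-set size).
print(f"{occam_bound_gap(k=8, vocab_size=49_408, n=50_000, delta=0.01):.4f}")
```

For these example numbers the complexity term is roughly 0.03, which illustrates the paper's broader point that discrete prompts form a small enough hypothesis class for such bounds to be non-vacuous.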