TEMPERA: Test-Time Prompt Editing via Reinforcement Learning
Authors: Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that TEMPERA can achieve SoTA performance (e.g., 1.8% better on SST-2 and 3.9% better on CR) compared to few-shot finetuning, prompt tuning, and discrete prompt optimization. We also show that TEMPERA is roughly 4x more data efficient (averaged over 4 tasks: SST-2, MR, AG News, and RTE) compared with traditional finetuning methods (Figure 1). In addition, we perform extensive ablations on different aspects of the proposed algorithm. |
| Researcher Affiliation | Collaboration | Tianjun Zhang¹, Xuezhi Wang², Denny Zhou², Dale Schuurmans²,³, Joseph E. Gonzalez¹ — ¹UC Berkeley, ²Google Research, Brain Team, ³University of Alberta |
| Pseudocode | Yes | Algorithm 1: Test-Time Prompt Editing with TEMPERA. (A hedged sketch of this editing loop follows the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/tianjunz/TEMPERA. |
| Open Datasets | Yes | Most of the tasks are from the standard GLUE (Wang et al., 2018). ... We test TEMPERA on few-shot text classification tasks... including single-sentence tasks (e.g., sentiment analysis including SST-2, Yelp reviews, MR, CR; topic classification including AG News). |
| Dataset Splits | Yes | We also randomly sample 16 samples per class as the validation dataset. |
| Hardware Specification | No | The paper mentions the use of "RoBERTa-large" as the language model, but it does not specify any hardware components, such as CPU or GPU models, or details about the computing environment used for the experiments. |
| Software Dependencies | No | The paper mentions software such as the "PPO algorithm", "NLTK", and "huggingface", but it does not provide specific version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes | Table 8: Hyperparameters used for TEMPERA in all the tasks (e.g., steps per training 8, learning rate 0.00005, gamma 0.99). For finetuning, we use standard finetuning of the RoBERTa model from huggingface for 100 epochs, a learning rate of 0.0003, and the Adam optimizer. (A hedged configuration sketch based on these values follows the table.) |
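
The paper's Algorithm 1 trains a PPO policy that applies discrete edits (e.g., swapping in-context exemplars or changing verbalizers) to a prompt for each test query. The sketch below is a minimal, hypothetical reconstruction of that loop: the edit operations, the `score` reward, and the random stand-in policy are all illustrative assumptions, not the released implementation (see the repository linked above for the actual code).

```python
import random

# Hypothetical edit operations over a structured prompt; the real TEMPERA
# action space edits instructions, in-context exemplars, and verbalizers.
# Names and structures below are illustrative, not from the released code.
def swap_examples(prompt):
    if len(prompt["examples"]) >= 2:
        i, j = random.sample(range(len(prompt["examples"])), 2)
        prompt["examples"][i], prompt["examples"][j] = (
            prompt["examples"][j], prompt["examples"][i])
    return prompt

def change_verbalizer(prompt):
    # Cycle through candidate label words for the positive class.
    candidates = ["great", "good", "positive"]
    idx = candidates.index(prompt["verbalizer"])
    prompt["verbalizer"] = candidates[(idx + 1) % len(candidates)]
    return prompt

EDITS = [swap_examples, change_verbalizer]

def score(prompt, query):
    # Placeholder reward: TEMPERA's reward is based on the change in the
    # language model's probability of the correct label. A random score is
    # returned here only so the sketch runs without a model.
    return random.random()

def edit_prompt(prompt, query, policy, n_steps=8):
    """Test-time editing loop, one episode per test query."""
    best, best_score = prompt, score(prompt, query)
    for _ in range(n_steps):
        action = policy(best, query)  # the policy picks an edit operation
        candidate = action(dict(best, examples=list(best["examples"])))
        s = score(candidate, query)
        if s > best_score:            # keep edits that improve the reward
            best, best_score = candidate, s
    return best

if __name__ == "__main__":
    random.seed(0)
    prompt = {"instruction": "Classify the sentiment.",
              "examples": ["A: great", "B: terrible"],
              "verbalizer": "great"}
    policy = lambda p, q: random.choice(EDITS)  # stand-in for the PPO policy
    print(edit_prompt(prompt, "The movie was fun.", policy))
```

In the actual method the policy is trained with PPO rather than sampled at random, and edits are accepted according to the policy; the greedy acceptance rule above is only there so the sketch runs standalone.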
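The finetuning baseline row quotes concrete hyperparameters (RoBERTa from huggingface, 100 epochs, learning rate 0.0003, Adam). Below is a minimal sketch of that configuration using the Hugging Face `Trainer`, assuming `roberta-large` as the backbone and SST-2-style binary labels; the batch size and dataset wiring are not reported in the paper and are left as placeholders. Note that `Trainer` defaults to AdamW, used here as the closest stand-in for the reported Adam optimizer.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Reconstructed from the hyperparameters quoted above; everything not
# reported in the paper (batch size, backbone checkpoint) is an assumption.
model_name = "roberta-large"  # assumed backbone, per the hardware row above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)  # e.g., SST-2 has two labels

args = TrainingArguments(
    output_dir="finetune-baseline",
    num_train_epochs=100,            # "for 100 epochs"
    learning_rate=3e-4,              # "a learning rate of 0.0003"
    per_device_train_batch_size=16,  # not reported; illustrative value
)

# Dataset construction is omitted; the paper uses a few-shot split with
# 16 randomly sampled examples per class for validation.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=...,  # few-shot training split
#                   eval_dataset=...)   # 16 samples per class
# trainer.train()
```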