Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
Authors: Hao Sun, Alihan Hüyük, Mihaela van der Schaar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations across various LLM scales and arithmetic reasoning datasets underscore both the efficacy and economic viability of the proposed approach. |
| Researcher Affiliation | Academia | Hao Sun, Alihan Hüyük, Mihaela van der Schaar — DAMTP, University of Cambridge |
| Pseudocode | No | The paper describes the steps of its proposed solution (Prompt-OIRL) in detail but does not present them in a formalized pseudocode or algorithm block format. |
| Open Source Code | Yes | Code is available at: https://github.com/vanderschaarlab/Prompt-OIRL |
| Open Datasets | Yes | Tasks: We use the tasks of MultiArith (Roy & Roth, 2016), GSM8K (Cobbe et al., 2021a), SVAMP (Patel et al., 2021) in the arithmetic reasoning domain because they are widely studied in zero-shot prompting, and hence rich expert-crafted and machine-generated prompting knowledge is available. [...] All created offline demonstration datasets, including the query-prompt pairs, prompted answers from different LLMs, and the correctness of those answers will be released as a publicly accessible dataset. |
| Dataset Splits | No | The paper specifies training and testing splits for the datasets but does not explicitly detail a separate validation set with specific sizes or percentages for hyperparameter tuning. |
| Hardware Specification | Yes | With our implementation, conducting OIRL for the GSM8K takes 50 minutes on a MacBook Air with an 8-core M2 chip, and takes only 5 minutes on a server with 16 (out of 64)-core AMD 3995WX CPUs. [...] LLaMA2-7B-chat, which operated locally on an NVIDIA A4000 GPU. |
| Software Dependencies | No | The paper mentions using "XGBoost models" and references "Chen et al., 2015", but does not specify the version number of the XGBoost library or other relevant software dependencies like Python or specific deep learning frameworks used for the experiments. |
| Experiment Setup | Yes | To enhance replicability, we use the following hyper-parameters for the gradient boosting model (Chen et al., 2015) in all experiment settings: `param = {'max_depth': 10, 'eta': 0.001, 'objective': 'binary:logistic'}` (see the sketch below this table). |
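
As a minimal sketch of how the reported hyper-parameters would be used, the snippet below trains an XGBoost binary classifier with exactly those settings. The synthetic data, feature shapes, and `num_boost_round` value are illustrative assumptions and do not come from the paper; in Prompt-OIRL the features would be derived from query-prompt pairs and the labels from answer correctness.

```python
import numpy as np
import xgboost as xgb

# Hyper-parameters reported in the paper for the gradient boosting model.
param = {"max_depth": 10, "eta": 0.001, "objective": "binary:logistic"}

# ASSUMPTION: stand-in data. In the paper, each row would encode a
# (query, prompt) pair and the label would be whether the prompted
# LLM answer was correct (1) or not (0).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1024, 64))
y = rng.integers(0, 2, size=1024)

dtrain = xgb.DMatrix(X, label=y)
# num_boost_round is an assumption; the paper does not report it.
booster = xgb.train(param, dtrain, num_boost_round=100)

# With the binary:logistic objective, predictions are the model's
# probability that a candidate prompt yields a correct answer.
scores = booster.predict(xgb.DMatrix(X[:5]))
print(scores)
```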