Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Authors: Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in the selection of in-context examples.
Researcher Affiliation | Collaboration | Pan Lu (1,3), Liang Qiu (1), Kai-Wei Chang (1), Ying Nian Wu (1), Song-Chun Zhu (1), Tanmay Rajpurohit (2), Peter Clark (3), Ashwin Kalyan (3); 1: University of California, Los Angeles; 2: Georgia Institute of Technology; 3: Allen Institute for AI
Pseudocode | Yes | The learning process is summarized in Algorithm 1 in the appendix. (A hedged sketch of such a policy-gradient loop follows the table.)
Open Source Code | Yes | The data and code are available at https://promptpg.github.io.
Open Datasets | Yes | The data and code are available at https://promptpg.github.io.
Dataset Splits | Yes | The TABMWP dataset contains 38,431 tabular math word problems, which are partitioned with 6:2:2 into the training, development, and test splits, corresponding to 23,059, 7,686, and 7,686 problems. (A short arithmetic check of these counts follows the table.)
Hardware Specification | Yes | Our experiments for UnifiedQA baselines, TAPEX baselines, and our proposed PROMPTPG are conducted using PyTorch on two NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' and 'TEXT-DAVINCI-002' but does not specify version numbers for PyTorch or other key software dependencies required for replication.
Experiment Setup | Yes | For fine-tuning the UnifiedQA and TAPEX baselines, we use the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 5e-5. The training process takes 10 epochs with a batch size of 16. The maximum number of input tokens is set as 200 and the maximum output length is 100. In our proposed PROMPTPG... we use the Adam optimizer with an initial learning rate of 1e-3. The maximum number of training epochs is 30, with a batch size of 20. For the GPT-3 engine... The temperature is set as 0 and the top probability is set as 1.0... The maximum number of tokens allowed for generating text is 512. Both the frequency penalty and the presence penalty are set as the default value, i.e., 0. (A hedged sketch of these decoding settings follows the table.)
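
The Pseudocode row points to Algorithm 1, which describes PROMPTPG's policy-gradient learning of in-context example selection. The code below is a minimal REINFORCE-style sketch of that idea only, not the authors' implementation: the SelectionPolicy network, the reward_fn stub, and the random embeddings are hypothetical placeholders, and the released code at https://promptpg.github.io remains the authoritative reference.

```python
# Minimal REINFORCE-style sketch of in-context example selection (PromptPG-like idea).
# All components here are illustrative stand-ins, not the paper's released code.
import random

import torch
import torch.nn as nn
from torch.distributions import Categorical

EMB_DIM, K, LR = 768, 2, 1e-3  # K in-context examples; 1e-3 matches the reported policy learning rate


class SelectionPolicy(nn.Module):
    """Scores each candidate example against the current problem embedding."""

    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, problem_emb: torch.Tensor, cand_embs: torch.Tensor) -> Categorical:
        # problem_emb: (dim,), cand_embs: (num_candidates, dim)
        scores = cand_embs @ self.proj(problem_emb)  # (num_candidates,)
        return Categorical(logits=scores)


def reward_fn(chosen_idx: torch.Tensor) -> float:
    """Placeholder reward. In the real setting this would build a prompt from the
    chosen examples, query the frozen GPT-3 model, and return e.g. +1 for a correct
    answer and -1 otherwise. A random stub keeps the sketch self-contained."""
    return 1.0 if random.random() > 0.5 else -1.0


policy = SelectionPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)

# Hypothetical pre-computed embeddings (e.g., from a frozen language-model encoder).
cand_embs = torch.randn(20, EMB_DIM)   # candidate in-context example pool
train_embs = torch.randn(160, EMB_DIM)  # training problems

for problem_emb in train_embs:          # single-example updates for brevity
    dist = policy(problem_emb, cand_embs)
    chosen = dist.sample((K,))          # sample K examples (with replacement, for simplicity)
    log_prob = dist.log_prob(chosen).sum()
    reward = reward_fn(chosen)
    loss = -reward * log_prob           # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual setting the reward comes from comparing the prompted model's answer against the gold answer, and updates are batched (the paper reports a batch size of 20); both are simplified away here.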
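
The Dataset Splits row reports a 6:2:2 partition of 38,431 problems. A quick arithmetic check of the quoted counts (all values taken directly from that row):

```python
# Sanity-check the reported TABMWP split sizes against the 6:2:2 ratio.
total = 38_431
train, dev, test = 23_059, 7_686, 7_686

assert train + dev + test == total
print(train / total, dev / total, test / total)
# ~0.60, ~0.20, ~0.20 -- the exact counts round the 6:2:2 ratio to whole problems
```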
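
The Experiment Setup row lists the GPT-3 (TEXT-DAVINCI-002) decoding parameters. As a sketch only, the request below reproduces those settings, assuming the legacy openai-python (pre-1.0) Completions interface that engine was served through; the prompt string and API key are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "..."  # in-context examples followed by the test problem (placeholder)

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=prompt,
    temperature=0,        # greedy decoding, as reported
    top_p=1.0,            # "top probability" of 1.0
    max_tokens=512,       # maximum number of generated tokens
    frequency_penalty=0,  # default value
    presence_penalty=0,   # default value
)
prediction = response["choices"][0]["text"].strip()
```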