P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts
Authors: Benjamin Newman, Prafulla Kumar Choubey, Nazneen Rajani
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | P-Adapters show between 12-26% absolute improvement in precision and 36-50% absolute improvement in consistency over a baseline of only using natural language queries. Additionally, we investigate Mixture of Experts (MoE) models that learn a set of continuous prompts ("experts") and select one to query the LLM. |
| Researcher Affiliation | Collaboration | Benjamin Newman (Stanford University); Prafulla Kumar Choubey (Salesforce Research); Nazneen Rajani (Salesforce Research); blnewman@cs.stanford.edu. Work conducted during internship at Salesforce Research. |
| Pseudocode | No | The paper describes model architectures and procedures but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To encourage the use of P-Adapters to effectively extract factual information, we release the code used to train them. 1https://github.com/salesforce/factlm |
| Open Datasets | Yes | We use the entity pairs and relations from the T-REx split of the LAMA work (Elsahar et al., 2018; Petroni et al., 2019) in our experiments. This data is used for evaluation. For training and validation, we use separate sets of entity pairs for each relation collected by Shin et al. (2020), which they use to optimize their discrete prompts. The templates we use are pooled from prior work: the LAMA, LPAQA, and ParaRel datasets (Jiang et al., 2020; Elazar et al., 2021). |
| Dataset Splits | Yes | For training and validation, we use separate sets of entity pairs for each relation collected by Shin et al. (2020), which they use to optimize their discrete prompts. We split the templates into two equal-sized groups: one for training and one for OOD Prompt evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like the 'Adam optimizer,' 'AdamW optimizer,' 'Hugging Face Transformers,' and 'nlpaug package,' but it does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | All of our P-Adapters were trained using the hyperparameters from Liu et al. (2021b): Adam optimizer with a learning rate of 1e-5, weight decay of 5e-4, a batch size of 128, and an exponential learning rate decay schedule with a decay rate of 0.98 (Kingma & Ba, 2015). Our MoE classifiers were trained using an AdamW optimizer with a learning rate of 0.001 and linear learning rate decay (Loshchilov & Hutter, 2018). We use Hugging Face Transformers to train the model for 3 epochs on the same training data used to train the P-Adapter models (Wolf et al., 2020). |
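
The Experiment Setup row quotes concrete optimizer and scheduler hyperparameters. The sketch below shows one way those settings map onto PyTorch; the parameter groups and steps-per-epoch value are placeholders (assumptions), and the released code at https://github.com/salesforce/factlm is the authoritative implementation.

```python
# Hedged sketch of the optimizer settings quoted in the Experiment Setup row.
# Parameter groups and steps-per-epoch are placeholders, not the paper's values.
import torch
from torch.optim import Adam, AdamW
from torch.optim.lr_scheduler import ExponentialLR, LambdaLR

# Stand-ins for the P-Adapter and MoE-classifier parameters.
p_adapter_params = [torch.nn.Parameter(torch.zeros(8))]
moe_classifier_params = [torch.nn.Parameter(torch.zeros(8))]

# P-Adapter training (hyperparameters from Liu et al., 2021b): Adam, lr 1e-5,
# weight decay 5e-4, exponential LR decay with rate 0.98. The batch size of 128
# would be set in the DataLoader, which is omitted here.
p_adapter_opt = Adam(p_adapter_params, lr=1e-5, weight_decay=5e-4)
p_adapter_sched = ExponentialLR(p_adapter_opt, gamma=0.98)

# MoE classifier training: AdamW, lr 1e-3, linear LR decay over 3 epochs.
steps_per_epoch = 1000  # assumption; depends on the training-set size
total_steps = 3 * steps_per_epoch
moe_opt = AdamW(moe_classifier_params, lr=1e-3)
moe_sched = LambdaLR(moe_opt, lambda step: max(0.0, 1.0 - step / total_steps))
```

In a training loop, `scheduler.step()` would be called after each optimizer step (or each epoch, for the exponential schedule), mirroring the decay schedules described in the quoted setup.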