Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts
Authors: Benjamin Newman, Prafulla Kumar Choubey, Nazneen Rajani
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | P-Adapters show between 12-26% absolute improvement in precision and 36-50% absolute improvement in consistency over a baseline of only using natural language queries. Additionally, we investigate Mixture of Experts (MoE) models that learn a set of continuous prompts ("experts") and select one to query the LLM. |
| Researcher Affiliation | Collaboration | Benjamin Newman (Stanford University), Prafulla Kumar Choubey (Salesforce Research), Nazneen Rajani (Salesforce Research). Work conducted during internship at Salesforce Research. |
| Pseudocode | No | The paper describes model architectures and procedures but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To encourage the use of P-Adapters to effectively extract factual information, we release the code used to train them (https://github.com/salesforce/factlm). |
| Open Datasets | Yes | We use the entity pairs and relations from the T-REx split of the LAMA work (Elsahar et al., 2018; Petroni et al., 2019) in our experiments. This data is used for evaluation. For training and validation, we use separate sets of entity pairs for each relation collected by Shin et al. (2020), which they use to optimize their discrete prompts. The templates we use are pooled from prior work: the LAMA, LPAQA, and ParaRel datasets (Jiang et al., 2020; Elazar et al., 2021). |
| Dataset Splits | Yes | For training and validation, we use separate sets of entity pairs for each relation collected by Shin et al. (2020), which they use to optimize their discrete prompts. We split the templates into two equal-sized groups: one for training and one for OOD Prompt evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like the 'Adam optimizer,' 'AdamW optimizer,' 'Hugging Face Transformers,' and the 'nlpaug package,' but it does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | All of our P-Adapters were trained using the hyperparameters from Liu et al. (2021b): Adam optimizer with a learning rate of 1e-5, weight decay of 5e-4, a batch size of 128, and an exponential learning rate decay schedule with a decay rate of 0.98 (Kingma & Ba, 2015). Our MoE classifiers were trained using an AdamW optimizer with a learning rate of 0.001 and linear learning rate decay (Loshchilov & Hutter, 2018). We use Hugging Face Transformers to train the model for 3 epochs on the same training data used to train the P-Adapter models (Wolf et al., 2020). |
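The optimizer and scheduler settings quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal illustration of those hyperparameters only: the `torch.nn.Linear` modules are placeholders (the actual P-Adapter and MoE classifier architectures are not reproduced here), and the step count for the linear decay is an assumption, since the paper's quoted text does not state it.

```python
import torch

# Placeholder module standing in for the P-Adapter parameters.
p_adapter = torch.nn.Linear(8, 8)

# P-Adapter training (per Liu et al., 2021b, as quoted above):
# Adam, lr 1e-5, weight decay 5e-4, exponential lr decay at rate 0.98.
opt = torch.optim.Adam(p_adapter.parameters(), lr=1e-5, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.98)

# Placeholder module standing in for the MoE prompt classifier.
moe_clf = torch.nn.Linear(8, 2)

# MoE classifier training: AdamW, lr 0.001, linear lr decay.
# total_steps is a hypothetical value (3 epochs x an assumed 100 steps).
clf_opt = torch.optim.AdamW(moe_clf.parameters(), lr=1e-3)
total_steps = 3 * 100
clf_sched = torch.optim.lr_scheduler.LambdaLR(
    clf_opt, lambda step: max(0.0, 1.0 - step / total_steps))
```

In a training loop, `sched.step()` would be called once per epoch (multiplying the learning rate by 0.98), while `clf_sched.step()` would be called per optimizer step to walk the MoE classifier's learning rate linearly down to zero.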