PEER: A Collaborative Language Model

Authors: Timo Schick, Jane A. Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of experiments to investigate whether, despite Wikipedia being our only natural source of comments and edits, our infilling techniques enable us to turn PEER into a general-purpose editing model capable of following human-written plans and tackling a range of editing tasks in different domains. Table 1: SARI scores on all subsets of Natural Edits. Domain-adapted (DA) variants outperform regular PEER, demonstrating the usefulness of synthetic edits generated with PEER-Undo.
Researcher Affiliation | Collaboration | Timo Schick (1), Jane Dwivedi-Yu (1), Zhengbao Jiang (1,2), Fabio Petroni (1), Patrick Lewis (1), Gautier Izacard (1,3), Qingfei You (1), Christoforos Nalmpantis (1), Edouard Grave (1), Sebastian Riedel (1,4); affiliations: 1 Meta AI Research, 2 Carnegie Mellon University, 3 Inria & ENS, PSL University, 4 University College London
Pseudocode | No | The paper describes the functionality of various PEER instances (PEER-Edit, PEER-Undo, PEER-Explain, PEER-Document) but does not provide their implementation details in the form of pseudocode or a labeled algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Our main training data is based on Wikipedia's edit history. JFLEG (Napoles et al., 2017) is a grammatical error correction dataset... ASSET (Alva-Manchego et al., 2020) is a corpus for single-sentence text simplification; ITERATER (Du et al., 2022b) is an editing dataset spanning five edit intentions across three different domains; WNC (Pryzant et al., 2020) is a dataset where the task is to remove or mitigate biased words to make sentences more neutral; FRUIT (Logan IV et al., 2021) contains texts from Wikipedia that need to be updated; WAFER-INS (Dwivedi-Yu et al., 2022) is based on the WAFER dataset (Petroni et al., 2022). (See the hedged data-loading sketch below the table.)
Dataset Splits | Yes | We split each dataset into training and test data. ... we thus split our dataset of Wikipedia intros into 100 dev examples and 400 test examples. (See the split sketch below the table.)
Hardware Specification | No | The paper mentions training on '64 GPUs' but does not specify the model or type of GPUs used, nor any other specific hardware components such as CPU or memory.
Software Dependencies | No | The paper mentions using 'DeepSpeed' and initializing from a 'pretrained language model', but it does not specify any software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We use a maximum learning rate of 10^-4, warmup for 2,000 steps and linear decay. We further use gradient clipping with a maximum norm of 1.0, weight decay of 0.01 and a dropout rate of 0.1. The maximum sequence length is set to 1,024 and 384 tokens for input and output, respectively. (A hedged training-configuration sketch follows the table.)
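
Several of the evaluation corpora cited in the Open Datasets row are publicly distributed. As one illustration, the snippet below loads the JFLEG grammatical error correction data with the Hugging Face datasets library; the Hub identifier, split names, and field names are assumptions about the public listing, not something the paper specifies.

```python
# Hypothetical loading of one of the cited evaluation corpora.
# The Hub identifier "jfleg", its splits, and its fields are assumptions;
# the paper itself does not state how the data should be obtained.
from datasets import load_dataset

jfleg = load_dataset("jfleg")       # assumed Hub identifier
print(jfleg)                        # inspect the available splits
sample = jfleg["validation"][0]     # assumed validation/test split layout
print(sample["sentence"])           # ungrammatical source sentence
print(sample["corrections"])        # list of human-written corrections
```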
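
The quoted split of the Wikipedia-intro evaluation data (100 dev / 400 test examples) is straightforward to reproduce once the examples are collected. The sketch below shows one way to carve out such a split with a fixed random seed; the function name, seed, and the way the examples are gathered are illustrative assumptions, not details given in the paper.

```python
import random

# intro_examples: a list of Wikipedia intro examples gathered beforehand
# (how they were collected is not reproduced here).
def split_dev_test(intro_examples, n_dev=100, n_test=400, seed=0):
    """Shuffle and carve out 100 dev / 400 test examples, mirroring the
    split sizes quoted from the paper. The seed choice is an assumption."""
    assert len(intro_examples) >= n_dev + n_test
    rng = random.Random(seed)
    shuffled = list(intro_examples)
    rng.shuffle(shuffled)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    return dev, test
```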
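
For the reported optimization settings (peak learning rate 1e-4, 2,000 warmup steps with linear decay, gradient clipping at norm 1.0, weight decay 0.01, dropout 0.1, maximum lengths of 1,024 input and 384 output tokens), a minimal PyTorch/transformers sketch is given below. The optimizer choice, total step count, and model checkpoint are assumptions; the paper only states the hyperparameters quoted above.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

# Hyperparameters quoted in the paper.
MAX_LR = 1e-4
WARMUP_STEPS = 2_000
WEIGHT_DECAY = 0.01
DROPOUT = 0.1
MAX_INPUT_LEN, MAX_OUTPUT_LEN = 1024, 384
GRAD_CLIP_NORM = 1.0

# Assumptions: the checkpoint, the AdamW optimizer, and the total step
# count are illustrative; the paper does not fix them to these values.
TOTAL_STEPS = 100_000
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base", dropout_rate=DROPOUT)

optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR, weight_decay=WEIGHT_DECAY)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

def training_step(batch):
    """One optimization step with gradient clipping at max norm 1.0.
    `batch` is assumed to hold tensors already truncated upstream to
    1,024 input and 384 output tokens (e.g., by the tokenizer)."""
    outputs = model(**batch)          # seq2seq forward pass returns the loss
    outputs.loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```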