PEER: A Collaborative Language Model

Authors: Timo Schick, Jane A. Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of experiments to investigate whether, despite Wikipedia being our only natural source of comments and edits, our infilling techniques enable us to turn PEER into a general-purpose editing model capable of following human-written plans and tackling a range of editing tasks in different domains. Table 1: SARI scores on all subsets of Natural Edits. Domain-adapted (DA) variants outperform regular PEER, demonstrating the usefulness of synthetic edits generated with PEER-Undo.
Researcher Affiliation | Collaboration | Timo Schick (1), Jane Dwivedi-Yu (1), Zhengbao Jiang (1,2), Fabio Petroni (1), Patrick Lewis (1), Gautier Izacard (1,3), Qingfei You (1), Christoforos Nalmpantis (1), Edouard Grave (1), Sebastian Riedel (1,4); affiliations: 1 Meta AI Research, 2 Carnegie Mellon University, 3 Inria & ENS, PSL University, 4 University College London
Pseudocode | No | The paper describes the functionality of various PEER instances (PEER-Edit, PEER-Undo, PEER-Explain, PEER-Document) but does not provide their implementation details in the form of pseudocode or a labeled algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Our main training data is based on Wikipedia's edit history. JFLEG (Napoles et al., 2017) is a grammatical error correction dataset... ASSET (Alva-Manchego et al., 2020) is a corpus for single-sentence text simplification; ITERATER (Du et al., 2022b) is an editing dataset spanning five edit intentions across three different domains; WNC (Pryzant et al., 2020) is a dataset where the task is to remove or mitigate biased words to make sentences more neutral; FRUIT (Logan IV et al., 2021) contains texts from Wikipedia that need to be updated; WAFER-INS (Dwivedi-Yu et al., 2022) is based on the WAFER dataset (Petroni et al., 2022). (See the hedged data-loading sketch below the table.)
Dataset Splits | Yes | We split each dataset into training and test data. ... we thus split our dataset of Wikipedia intros into 100 dev examples and 400 test examples. (See the split sketch below the table.)
Hardware Specification | No | The paper mentions training on '64 GPUs' but does not specify the model or type of GPUs used, nor any other specific hardware components such as CPU or memory.
Software Dependencies | No | The paper mentions using 'DeepSpeed' and initializing from a 'pretrained language model', but it does not specify any software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We use a maximum learning rate of 10^-4, warmup for 2,000 steps and linear decay. We further use gradient clipping with a maximum norm of 1.0, weight decay of 0.01 and a dropout rate of 0.1. The maximum sequence length is set to 1,024 and 384 tokens for input and output, respectively. (A hedged training-configuration sketch follows the table.)
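
Several of the evaluation corpora cited in the Open Datasets row are publicly distributed. As one illustration, the snippet below loads the JFLEG grammatical error correction data with the Hugging Face datasets library; the Hub identifier, split names, and field names are assumptions about the public listing, not something the paper specifies.

```python
# Hypothetical loading of one of the cited evaluation corpora.
# The Hub identifier "jfleg", its splits, and its fields are assumptions;
# the paper itself does not state how the data should be obtained.
from datasets import load_dataset

jfleg = load_dataset("jfleg")       # assumed Hub identifier
print(jfleg)                        # inspect the available splits
sample = jfleg["validation"][0]     # assumed validation/test split layout
print(sample["sentence"])           # ungrammatical source sentence
print(sample["corrections"])        # list of human-written corrections
```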
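
The quoted split of the Wikipedia-intro evaluation data (100 dev / 400 test examples) is straightforward to reproduce once the examples are collected. The sketch below shows one way to carve out such a split with a fixed random seed; the function name, seed, and the way the examples are gathered are illustrative assumptions, not details given in the paper.

```python
import random

# intro_examples: a list of Wikipedia intro examples gathered beforehand
# (how they were collected is not reproduced here).
def split_dev_test(intro_examples, n_dev=100, n_test=400, seed=0):
    """Shuffle and carve out 100 dev / 400 test examples, mirroring the
    split sizes quoted from the paper. The seed choice is an assumption."""
    assert len(intro_examples) >= n_dev + n_test
    rng = random.Random(seed)
    shuffled = list(intro_examples)
    rng.shuffle(shuffled)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    return dev, test
```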
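
For the reported optimization settings (peak learning rate 1e-4, 2,000 warmup steps with linear decay, gradient clipping at norm 1.0, weight decay 0.01, dropout 0.1, maximum lengths of 1,024 input and 384 output tokens), a minimal PyTorch/transformers sketch is given below. The optimizer choice, total step count, and model checkpoint are assumptions; the paper only states the hyperparameters quoted above.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

# Hyperparameters quoted in the paper.
MAX_LR = 1e-4
WARMUP_STEPS = 2_000
WEIGHT_DECAY = 0.01
DROPOUT = 0.1
MAX_INPUT_LEN, MAX_OUTPUT_LEN = 1024, 384
GRAD_CLIP_NORM = 1.0

# Assumptions: the checkpoint, the AdamW optimizer, and the total step
# count are illustrative; the paper does not fix them to these values.
TOTAL_STEPS = 100_000
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base", dropout_rate=DROPOUT)

optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR, weight_decay=WEIGHT_DECAY)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

def training_step(batch):
    """One optimization step with gradient clipping at max norm 1.0.
    `batch` is assumed to hold tensors already truncated upstream to
    1,024 input and 384 output tokens (e.g., by the tokenizer)."""
    outputs = model(**batch)          # seq2seq forward pass returns the loss
    outputs.loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```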