Copy is All You Need
Authors: Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to verify the effectiveness of our proposed COG. On the standard language modeling benchmark (WikiText-103), our proposed COG substantially outperforms standard baselines on automatic metrics (26.14 vs. 23.43 MAUVE (Pillutla et al., 2021)) and human evaluation (48% vs. 28% human preference). (from the Introduction and Section 4, Experimental Setup) |
| Researcher Affiliation | Collaboration | Tencent AI Lab; School of Computer Science and Technology, Beijing Institute of Technology |
| Pseudocode | Yes | Algorithm 1: Phrase Segmentation Algorithm |
| Open Source Code | Yes | Our source codes are publicly available at https://github.com/gmftbyGMFTBY/Copyisallyouneed. |
| Open Datasets | Yes | On the standard language modeling benchmark (WikiText-103) ... The WikiText-103 dataset (Merity et al., 2017) contains an extensive collection of Wikipedia articles with over 100 million words... we use the English part of Law-MT (Koehn & Knowles, 2017)... The En-Wiki corpus contains a large-scale collection of Wikipedia articles with over 3 billion words |
| Dataset Splits | Yes | WikiText-103: 1,801,350 (train) / 3,760 (dev) / 4,358 (test); Law-MT: 389,292 (train) / 2,000 (dev) / 2,000 (test) |
| Hardware Specification | Yes | We train baselines and COG for 400,000 steps on 8 Tesla-V100 GPUs. |
| Software Dependencies | No | The paper mentions 'Huggingface transformers package' and specific models like 'GPT2 model' and 'BERT-base-cased model', but does not provide specific version numbers for these software dependencies or other libraries. |
| Experiment Setup | Yes | For all the baselines, the learning rate, dropout rate, and gradient clipping are set as 5e-5, 0.1, and 1.0, respectively. Due to memory limitation, the batch size is set to contain 256 phrases. For the BERT model in the phrase encoder, the maximum sequence length is set as 256. For the GPT2 model in the prefix encoder, the maximum sequence length is set as 512. (A hedged configuration sketch of these settings follows the table.) |
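
The sketch below is a minimal reconstruction of the reported training configuration using the Huggingface transformers package, which the paper names but does not pin to a version. The checkpoint identifiers (`gpt2`, `bert-base-cased`), the AdamW optimizer, and the `training_step` helper are assumptions for illustration only; the hyperparameter values come from the rows above, not from the authors' released code.

```python
# Hedged sketch of the reported COG training configuration.
# Assumptions (not stated in the paper): any recent transformers release,
# the AdamW optimizer, and the checkpoint names "gpt2" / "bert-base-cased".
import torch
from transformers import AutoModel, AutoTokenizer, GPT2LMHeadModel, GPT2Tokenizer

# Prefix encoder: GPT2, maximum sequence length 512 (as reported).
prefix_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
prefix_encoder = GPT2LMHeadModel.from_pretrained("gpt2")
PREFIX_MAX_LEN = 512

# Phrase encoder: BERT-base-cased, maximum sequence length 256 (as reported).
phrase_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
phrase_encoder = AutoModel.from_pretrained("bert-base-cased")
PHRASE_MAX_LEN = 256

# Reported hyperparameters: learning rate 5e-5, dropout 0.1 (also the default
# in both the GPT2 and BERT-base configs), gradient clipping 1.0, batches of
# 256 phrases, and 400,000 training steps on 8 Tesla-V100 GPUs.
LEARNING_RATE = 5e-5
GRAD_CLIP = 1.0
PHRASES_PER_BATCH = 256
TRAIN_STEPS = 400_000

params = list(prefix_encoder.parameters()) + list(phrase_encoder.parameters())
optimizer = torch.optim.AdamW(params, lr=LEARNING_RATE)  # optimizer choice assumed


def training_step(loss: torch.Tensor) -> None:
    """One optimization step with the reported gradient-clipping value."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, GRAD_CLIP)
    optimizer.step()
```

The phrase-retrieval training objective itself (how the loss over 256 phrases per batch is computed) is specific to COG and is not reproduced here; the sketch only shows how the reported encoders and optimization hyperparameters fit together.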