Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory
Authors: Xin Cheng, Di Luo, Xiuying Chen, Lemao Liu, Dongyan Zhao, Rui Yan
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of Selfmem on three distinct text generation tasks: neural machine translation, abstractive text summarization, and dialogue generation, under two generation paradigms: fine-tuned small model and few-shot LLM. Our approach achieves state-of-the-art results in four directions in JRC-Acquis translation dataset, 50.3 ROUGE-1 in XSum, and 62.9 ROUGE-1 in Big Patent, demonstrating the potential of self-memory in enhancing retrieval-augmented generation models. |
| Researcher Affiliation | Collaboration | Xin Cheng1 Di Luo2 Xiuying Chen3 Lemao Liu4 Dongyan Zhao1 Rui Yan2 1 Peking University 2 Renmin University of China 3 KAUST 4 Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1 Selfmem Framework |
| Open Source Code | Yes | Code and data available at: https://github.com/Hannibal046/SelfMemory |
| Open Datasets | Yes | We assess the performance of Selfmem on three generation tasks, utilizing a total of seven datasets. Translation. We evaluate our framework on JRC-Acquis datasets [82], a collection of parallel legislative text of European Union Law... Summarization. We evaluate on 2 summarization datasets: 1) XSum [60]... 2) Big Patent [73]... Dialogue. We experiment on Daily Dialog [44]... |
| Dataset Splits | Yes | Table 7: Dataset statistics for three tasks. Task Dataset #Train #Dev #Test ... JRC (en de) 663,487 2,454 2,483 ... XSum 204,045 11,332 11,334 |
| Hardware Specification | Yes | All experiments are conducted on the same device, equipped with one NVIDIA A100 GPU and one AMD EPYC 7V13 64-Core Processor. |
| Software Dependencies | No | The paper mentions software components like SACREBLEU, Adafactor, Transformer, XLM-Rbase, BARTbase, BRIO, and RoBERTa, but does not specify their version numbers or the versions of underlying programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | The hyper-parameter setting follows [17] with dropout 0.1, label smoothing 0.1, gradient clipping 1.0, Adafactor [74], warm-up steps 4000, maximum learning rate 4.4e-2 and training epochs 30 for total. The maximum input length is 512 for XSum and 1024 for Big Patent. |
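
For readers who want to reuse the reported settings, the Experiment Setup row can be collected into a small configuration object. The sketch below is only an illustration: the names `ReportedSetup`, `MAX_INPUT_LENGTH`, and `max_input_length` are hypothetical and do not come from the paper or its repository, and the values are copied from the row above without checking them against the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyper-parameters as quoted in the Experiment Setup row (hypothetical container)."""
    dropout: float = 0.1
    label_smoothing: float = 0.1
    gradient_clipping: float = 1.0
    optimizer: str = "Adafactor"
    warmup_steps: int = 4000
    max_learning_rate: float = 4.4e-2
    epochs: int = 30


# Maximum input lengths quoted for the two summarization datasets.
# The dictionary keys are illustrative labels, not identifiers from the paper's code.
MAX_INPUT_LENGTH = {"XSum": 512, "BigPatent": 1024}


def max_input_length(dataset: str) -> int:
    """Return the quoted maximum input length, or fail loudly for other datasets."""
    if dataset not in MAX_INPUT_LENGTH:
        raise ValueError(f"No maximum input length is quoted for {dataset!r}")
    return MAX_INPUT_LENGTH[dataset]


if __name__ == "__main__":
    setup = ReportedSetup()
    print(setup)
    print(max_input_length("XSum"))
```

The lookup raises an error for the translation and dialogue datasets on purpose, since the row quotes a maximum input length only for XSum and Big Patent.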