Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Decoupled Context Processing for Context Augmented Language Modeling
Authors: Zonglin Li, Ruiqi Guo, Sanjiv Kumar
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented with the same encoder-decoder context incorporation mechanism for both auto-regressive language modeling and open domain question answering. |
| Researcher Affiliation | Industry | Zonglin Li, Google Research, New York (EMAIL); Ruiqi Guo, Google Research, New York (EMAIL); Sanjiv Kumar, Google Research, New York (EMAIL) |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 1) and describes procedures in text, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | In the ethics checklist, it states: 'We still need to clean up the code before it's ready.' |
| Open Datasets | Yes | For auto-regressive language modeling, we use English C4 [32] version 2.2.1, the same as Retro. We use the same question and context processed by [18], where the context is retrieved with DPR retriever [20]. |
| Dataset Splits | Yes | C4 Train: 364.6M articles, 3382M entries; C4 Val (unfiltered): 0.3645M articles, 3.369M entries; C4 Val (filtered): 2.868M entries; NQ-open Train: 79k; NQ-open Dev: 8.8k; NQ-open Test: 3.6k; Wiki Database: 21M |
| Hardware Specification | Yes | For base and large we used 64 TPUv3 chips whereas 128 TPUv3 chips for training XL. Each model is trained on 64 TPUv3 chips. |
| Software Dependencies | No | The paper mentions various software components and libraries, such as 'mT5 [42]', 'NLTK [3]', 'sentencepiece [22]', 'T5X retrieval framework [28]', 'ScaNN [14]', and 'Adafactor optimizer [36]', but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We trained the Encoder-Decoder LM model for a total of 1,100,000 steps with a batch size of 512 and a default learning rate schedule of square-root decay. This corresponds to 10,000 warmup steps with a fixed learning rate of 0.01, followed by square-root decay for 990,000 steps. We jointly fine-tuned the encoder and decoder for 40,000 steps with 20 context passages for each input of the train split. We used a batch size of 64, a fixed learning rate of 10⁻⁴ and Adafactor optimizer [36]. |
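The pre-training schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal sketch assuming the T5-style inverse square-root formula lr(n) = 1/sqrt(max(n, k)) with k = 10,000 warmup steps, which reproduces the quoted fixed rate of 0.01 during warmup (1/sqrt(10,000) = 0.01). The function name `rsqrt_lr` and the constants are illustrative assumptions, not taken from the authors' code, which the Open Source Code row reports as unreleased.

```python
import math

# Values quoted in the Experiment Setup row; the formula below is an assumption.
WARMUP_STEPS = 10_000     # warmup steps at a constant learning rate
TOTAL_STEPS = 1_100_000   # total pre-training steps


def rsqrt_lr(step: int, warmup_steps: int = WARMUP_STEPS) -> float:
    """Hypothetical T5-style inverse square-root schedule: 1 / sqrt(max(step, warmup))."""
    # For step <= warmup_steps this is constant (0.01 when warmup_steps = 10,000);
    # afterwards it decays proportionally to 1 / sqrt(step).
    return 1.0 / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 40_000, 1_000_000, TOTAL_STEPS):
        print(f"step {step:>9,}: lr = {rsqrt_lr(step):.6f}")
```

Under this assumed formula the warmup phase needs no separate base learning rate: the constant 0.01 falls out of the warmup length itself.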