ReFusion: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion
Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Xue Liu, Buzhou Tang, Tei-Wei Kuo, Chun Jason Xue
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed ReFusion can achieve superior and robust performance in various NKI tasks. Finally, we conducted comprehensive experiments on 15 different NKI tasks. |
| Researcher Affiliation | Academia | 1 City University of Hong Kong 2 Harbin Institute of Technology, Shenzhen 3 MILA, McGill University 4 National Taiwan University 5 Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes steps and formulas in paragraph text and mathematical equations. |
| Open Source Code | Yes | Codes are available (footnote 1 points to https://github.com/luffy06/ReFusion). |
| Open Datasets | Yes | Datasets We conduct comprehensive experiments across 15 NKI tasks, including 8 tasks from the GLUE benchmark (Wang et al., 2019), SNLI, SST-5, MR, CR, MNLI, MNLI-mm, Subj and TREC. All these dataset configurations are identical to those of LM-BFF (Gao et al., 2021). |
| Dataset Splits | Yes | All these dataset configurations are identical to those of LM-BFF (Gao et al., 2021). For the upper level, this paper aims to find the optimal combination of ranking schemes that maximizes performance on the validation set. |
| Hardware Specification | Yes | The proposed method was implemented using PyTorch framework, utilizing the computational power of two NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch framework" but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | The hyperparameters are listed as follows: the learning rate is 1e-5, the batch size is 32, the maximum sequence length is 128, the maximum steps are 1000, the number k of similar sentences retrieved is set to 64, and we save the last checkpoint. We use AdamW as the optimizer. The models are based on RoBERTa-large for fair comparison with LM-BFF. (See the configuration sketch below this table.) |
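
The sketch below collects the hyperparameters stated in the Experiment Setup row into a minimal PyTorch/Transformers configuration. Only the numeric values, the AdamW choice, and the RoBERTa-large backbone come from the paper; the variable names, model wiring, and tokenization call are assumptions for illustration, not the authors' released training script (see the linked repository for that).

```python
# Minimal sketch, assuming a Hugging Face Transformers setup.
# Only the values in `config` are taken from the paper's Experiment Setup;
# everything else (names, model class, placeholder batch) is hypothetical.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

config = {
    "model_name": "roberta-large",  # backbone, for fair comparison with LM-BFF
    "learning_rate": 1e-5,
    "batch_size": 32,
    "max_seq_length": 128,
    "max_steps": 1000,
    "num_retrieved": 64,            # k: number of similar sentences retrieved
}

tokenizer = AutoTokenizer.from_pretrained(config["model_name"])
model = AutoModelForMaskedLM.from_pretrained(config["model_name"])

# AdamW optimizer as stated in the paper; other optimizer settings (e.g.
# weight decay) are not reported, so defaults are used here.
optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])

# Tokenization mirrors the stated maximum sequence length of 128.
batch = tokenizer(
    ["A placeholder sentence."] * config["batch_size"],
    padding="max_length",
    truncation=True,
    max_length=config["max_seq_length"],
    return_tensors="pt",
)
```

A full reproduction would additionally plug in the retrieval module that fuses the k = 64 retrieved representations and run for the stated 1000 steps across the two V100 GPUs mentioned under Hardware Specification.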