Fine-Grained Distillation for Long Document Retrieval
Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Jianbing Shen, Guodong Long, Can Xu, Daxin Jiang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance." and "In the experiments, we conduct an extensive evaluation of our proposed framework on two document retrieval benchmark datasets, i.e., MS-Marco document retrieval (Nguyen et al. 2016) and TREC 2019 Deep Learning track (Craswell et al. 2020). The experimental results show that our method achieves state-of-the-art performance compared with other strong competitors." |
| Researcher Affiliation | Collaboration | (1) SKL-IOTSC, CIS, University of Macau; (2) AAII, FEIT, University of Technology Sydney; (3) Microsoft Corporation |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | In experiments, we conduct extensive evaluations of our method on the two long-document retrieval benchmark datasets: MS-Marco Doc (Nguyen et al. 2016) and TREC Deep Learning 2019 document retrieval (TREC 2019) (Craswell et al. 2020). |
| Dataset Splits | No | The paper mentions 'MS-Marco Doc Dev' for evaluation but does not provide specific training/validation/test split percentages, sample counts, or explicit details on how the data was partitioned for reproduction within its text. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions various pre-trained language models (e.g., BERT, RoBERTa, DeBERTa) and frameworks (Transformer), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other library versions) required to replicate the experiments. |
| Experiment Setup | Yes | "where $d \in \{d^+\} \cup N$ and $\tau$ denotes the temperature, set to 1." and "So, the final training loss for the bi-encoder learning with distillation is written as $\lambda \mathcal{L}^{(cl)} + \mathcal{L}^{(kd)}$." (See the sketch after the table.) |
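
The quoted setup combines a temperature-scaled contrastive loss over the positive document and sampled negatives with a distillation term, summed as $\lambda \mathcal{L}^{(cl)} + \mathcal{L}^{(kd)}$. The snippet below is a minimal sketch of that combination, not the paper's implementation: the excerpt does not specify the fine-grained distillation target, so a plain KL divergence between student (bi-encoder) and teacher score distributions stands in for $\mathcal{L}^{(kd)}$, and all function names here are hypothetical.

```python
# Sketch of a contrastive + distillation objective of the form lambda * L_cl + L_kd.
# The KD term below (KL between teacher and student score distributions) is an
# assumption; the paper's fine-grained distillation target is not given in the excerpt.
import torch
import torch.nn.functional as F


def contrastive_loss(scores: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """InfoNCE-style loss; scores[:, 0] is the positive document, the rest are negatives."""
    logits = scores / tau  # temperature tau, set to 1 in the quoted setup
    targets = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(logits, targets)


def distillation_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """Placeholder KD term: KL(teacher || student) over the candidate documents."""
    return F.kl_div(
        F.log_softmax(student_scores, dim=-1),
        F.softmax(teacher_scores, dim=-1),
        reduction="batchmean",
    )


def training_loss(student_scores, teacher_scores, lam: float = 1.0, tau: float = 1.0):
    """Final objective quoted in the paper: lambda * L_cl + L_kd."""
    return lam * contrastive_loss(student_scores, tau) + distillation_loss(student_scores, teacher_scores)


# Toy usage: batch of 4 queries, each scored against 1 positive + 7 negative documents.
student = torch.randn(4, 8)
teacher = torch.randn(4, 8)
print(training_loss(student, teacher, lam=0.5).item())
```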