Fine-Grained Distillation for Long Document Retrieval

Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Jianbing Shen, Guodong Long, Can Xu, Daxin Jiang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance." and "In the experiments, we conduct an extensive evaluation of our proposed framework on two document retrieval benchmark datasets, i.e., MS-Marco document retrieval (Nguyen et al. 2016) and TREC 2019 Deep Learning track (Craswell et al. 2020). The experimental results show that our method achieves state-of-the-art performance compared with other strong competitors."
Researcher Affiliation | Collaboration | 1 SKL-IOTSC, CIS, University of Macau; 2 AAII, FEIT, University of Technology Sydney; 3 Microsoft Corporation
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, or a link to a code repository.
Open Datasets | Yes | "In experiments, we conduct extensive evaluations of our method on the two long-document retrieval benchmark datasets: MS-Marco Doc (Nguyen et al. 2016) and TREC Deep Learning 2019 document retrieval (TREC 2019) (Craswell et al. 2020)."
Dataset Splits | No | The paper mentions the 'MS-Marco Doc Dev' set for evaluation but does not give training/validation/test split percentages, sample counts, or other details of how the data was partitioned.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or other machine specifications used to run its experiments.
Software Dependencies | No | The paper mentions various pre-trained language models (e.g., BERT, RoBERTa, DeBERTa) and the Transformer architecture, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other library versions) required to replicate the experiments.
Experiment Setup | Yes | "where d ∈ {d+} ∪ N and τ denotes the temperature, set to 1." and "So, the final training loss for the bi-encoder learning with distillation is written as λL(cl) + L(kd)."
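
The quoted setup only sketches the shape of the objective: a temperature-scaled contrastive term over the positive and sampled negatives, combined with a distillation term as λL(cl) + L(kd). The snippet below is a minimal PyTorch sketch of that combination, not the authors' released code; the dot-product scorer, the KL-divergence form of the distillation term, and the function names (contrastive_loss, distillation_loss, total_loss) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(query_emb, pos_emb, neg_embs, tau=1.0):
        # InfoNCE-style loss where the candidate d ranges over {d+} ∪ N (sampled negatives).
        pos_score = (query_emb * pos_emb).sum(-1, keepdim=True)       # [B, 1]
        neg_scores = torch.einsum("bd,bnd->bn", query_emb, neg_embs)  # [B, N]
        logits = torch.cat([pos_score, neg_scores], dim=-1) / tau     # [B, 1 + N]
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)  # positive at index 0
        return F.cross_entropy(logits, labels)

    def distillation_loss(student_logits, teacher_logits, tau=1.0):
        # Assumed KL-divergence form for the knowledge-distillation term L(kd).
        return F.kl_div(
            F.log_softmax(student_logits / tau, dim=-1),
            F.softmax(teacher_logits / tau, dim=-1),
            reduction="batchmean",
        )

    def total_loss(query_emb, pos_emb, neg_embs, student_logits, teacher_logits, lam=1.0, tau=1.0):
        # Final bi-encoder training loss as quoted: λL(cl) + L(kd).
        return lam * contrastive_loss(query_emb, pos_emb, neg_embs, tau) \
            + distillation_loss(student_logits, teacher_logits, tau)

Under these assumptions, total_loss reproduces the quoted weighting λL(cl) + L(kd); the exact fine-grained distillation target used in the paper is not specified in the quoted text.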