Distilling Knowledge from Reader to Retriever for Question Answering
Authors: Gautier Izacard, Edouard Grave
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on question answering, obtaining state-of-the-art results. Our method is inspired by knowledge distillation (Hinton et al., 2015), and uses the reader model to obtain synthetic labels to train the retriever model. In this section we evaluate the student-teacher training procedure from the previous section. We show that we obtain competitive performance without strong supervision for support documents. We perform experiments on TriviaQA (Joshi et al., 2017) and NaturalQuestions (Kwiatkowski et al., 2019), two standard benchmarks for open-domain question answering. |
| Researcher Affiliation | Collaboration | Gautier Izacard (1,2,3), Edouard Grave (1); 1 Facebook AI Research, 2 École normale supérieure, PSL University, 3 Inria. {gizacard|egrave}@fb.com |
| Pseudocode | No | The paper describes the iterative training procedure in four numbered steps, but it does not present them as structured pseudocode or an algorithm block; a minimal sketch of the underlying distillation objective is given after this table. |
| Open Source Code | Yes | Our code is available at: github.com/facebookresearch/FiD. |
| Open Datasets | Yes | We perform experiments on TriviaQA (Joshi et al., 2017) and NaturalQuestions (Kwiatkowski et al., 2019), two standard benchmarks for open-domain question answering. We also evaluate on NarrativeQuestions (Kočiský et al., 2018), using a publicly available preprocessed version. |
| Dataset Splits | Yes | Following the setting from Lee et al. (2019); Karpukhin et al. (2020), we use the original evaluation set as test set, and keep 10% of the training data for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions several software components and models (BERT base model, T5 base model, Apache Lucene, SpaCy) but does not provide version numbers for these dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | The reader is trained for 10k gradient steps with a constant learning rate of 10^-4, and the best model is selected based on the validation performance. The retriever is trained with a constant learning rate of 5 × 10^-5 until the performance saturates. More details on the hyperparameters and the training procedure are reported in Appendix A.2, whose Table 6 (Hyperparameters for retriever and reader training) specifies the number of parameters, number of heads, number of layers, hidden size, batch size, dropout, learning rate schedule, peak learning rate, and gradient clipping. A toy training loop reflecting these settings is sketched after this table. |
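To make the distillation signal referenced above concrete, here is a minimal sketch, assuming the KL-divergence variant of the objective described in the paper: the retriever (student) is trained so that its passage-score distribution matches the reader's (teacher's) aggregated cross-attention scores, which serve as synthetic relevance labels. The function name and tensor shapes are illustrative, not taken from the released FiD code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(retriever_scores: torch.Tensor,
                      reader_attention: torch.Tensor) -> torch.Tensor:
    """KL divergence between the retriever's score distribution (student)
    and the reader's aggregated cross-attention distribution (teacher).

    retriever_scores: (batch, n_passages) similarity scores between the
        question embedding and each passage embedding.
    reader_attention: (batch, n_passages) cross-attention mass the reader
        assigns to each passage; detached so it acts as a fixed target.
    """
    target = F.softmax(reader_attention.detach(), dim=-1)
    log_pred = F.log_softmax(retriever_scores, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```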
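Similarly, a toy training loop reflecting the quoted hyperparameters (constant learning rates of 10^-4 for the reader and 5 × 10^-5 for the retriever, 10k reader gradient steps, gradient clipping) might look as follows. The optimizer choice (Adam), the clipping norm of 1.0, and the stand-in models and loss are assumptions; the paper's Table 6 holds the authoritative values.

```python
import torch
from torch import nn

# Stand-in modules so the loop runs; the actual models are a T5 reader
# and a BERT-based retriever.
reader = nn.Linear(8, 1)
retriever = nn.Linear(8, 1)

# Constant learning rates as reported in the paper. The retriever is
# trained analogously in its own phase, until performance saturates.
reader_opt = torch.optim.Adam(reader.parameters(), lr=1e-4)
retriever_opt = torch.optim.Adam(retriever.parameters(), lr=5e-5)

for step in range(10_000):  # reader is trained for 10k gradient steps
    batch = torch.randn(16, 8)          # placeholder batch
    loss = reader(batch).pow(2).mean()  # placeholder loss
    loss.backward()
    # Gradient clipping is listed in Table 6; the norm value here is assumed.
    torch.nn.utils.clip_grad_norm_(reader.parameters(), max_norm=1.0)
    reader_opt.step()
    reader_opt.zero_grad()
```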