Learning List-Level Domain-Invariant Representations for Ranking
Authors: Ruicheng Xian, Honglei Zhuang, Zhen Qin, Hamed Zamani, Jing Lu, Ji Ma, Kai Hui, Han Zhao, Xuanhui Wang, Michael Bendersky
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the empirical benefits of invariant representation learning with list-level alignment (ListDA) for unsupervised domain adaptation, we evaluate it on the passage reranking task. (An illustrative sketch of list-level adversarial alignment follows the table.) |
| Researcher Affiliation | Collaboration | Ruicheng Xian (1), Honglei Zhuang (2), Zhen Qin (2), Hamed Zamani (3), Jing Lu (2), Ji Ma (2), Kai Hui (2), Han Zhao (1), Xuanhui Wang (2), Michael Bendersky (2); (1) University of Illinois Urbana-Champaign {rxian2,hanzhao}@illinois.edu; (2) Google Research {hlz,zhenqin,ljwinnie,maji,kaihuibj,xuanhui,bemike}@google.com; (3) University of Massachusetts Amherst zamani@cs.umass.edu |
| Pseudocode | No | The paper describes algorithmic procedures in paragraph text and mathematical equations but does not include any formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the MS MARCO dataset for passage ranking [3] as the source domain... The target domains are biomedical (TREC-COVID [62], BioASQ [59]) and news articles (Robust04 [64]). (A BEIR loading sketch follows the table.) |
| Dataset Splits | Yes | The data are preprocessed consistently with the BEIR benchmark [57]; their paper includes dataset statistics. ... the training set sizes are 19,944 and 1,266, and the test set sizes are 6,983 and 3,798 (of which 47 and 17, respectively, are lists of length 1). |
| Hardware Specification | Yes | For the RankT5 reranking model, it is fine-tuned from the T5 v1.1 base checkpoint on a Dragonfish TPU with 8x8 topology for 100,000 steps... The model is trained from scratch on an NVIDIA A6000 GPU for 10,000 steps |
| Software Dependencies | No | Our implementation uses PyTorch and the Hugging Face Transformers library [66]. No specific version numbers for PyTorch or Transformers are provided. |
| Experiment Setup | Yes | For the RankT5 reranking model...for 100,000 steps with a batch size of 32 per domain (each training list contains 31 items). We tune the learning rate ηrank ∈ {5e-5, 1e-4, 2e-4}... We apply a learning rate schedule on ηrank that decays (exponentially) by a factor of 0.7 every 5,000 steps. The concatenated query-document text inputs are truncated to 512 tokens. For the domain discriminators, there are two hyperparameters: the strength of invariant feature learning λ, and the discriminator learning rate ηad. We tune both by sweeping λ ∈ {0.01, 0.02} and ηad ∈ {10, 20, 40} × ηrank, i.e., as multiples of the reranker learning rate. (A sketch of this schedule and sweep follows the table.) |
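The Research Type row summarizes the paper's method as invariant representation learning with list-level alignment (ListDA) for unsupervised domain adaptation. The paper releases no code, so the following is only a minimal, generic sketch of list-level adversarial alignment in PyTorch: a gradient-reversal layer plus a domain discriminator over pooled list representations. The class names, mean pooling, and MLP discriminator are assumptions for illustration; the paper's list-level discriminator and training loop are more elaborate.

```python
# Illustrative sketch only -- not the authors' implementation.
# Gradient reversal plus a discriminator over a pooled list-level representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class ListDiscriminator(nn.Module):
    """Predicts the domain (source vs. target) of an entire list of item representations."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, item_reprs):            # item_reprs: (batch, list_len, dim)
        list_repr = item_reprs.mean(dim=1)    # pool items into one list-level vector
        return self.mlp(list_repr).squeeze(-1)


def listwise_adversarial_loss(reprs_src, reprs_tgt, ranking_loss, disc, lam=0.01):
    """Source-domain ranking loss plus a list-level domain-adversarial term.

    lam plays the role of the invariant-feature-learning strength; the discriminator
    itself would be optimized with its own (larger) learning rate eta_ad.
    """
    reprs = torch.cat([reprs_src, reprs_tgt], dim=0)
    logits = disc(GradReverse.apply(reprs, lam))
    labels = torch.cat([torch.zeros(reprs_src.size(0)),   # 0 = source lists
                        torch.ones(reprs_tgt.size(0))])   # 1 = target lists
    adv_loss = F.binary_cross_entropy_with_logits(logits, labels.to(logits.device))
    return ranking_loss + adv_loss
```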
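The Open Datasets and Dataset Splits rows state that the data follow the BEIR benchmark's preprocessing. As a rough illustration, the snippet below loads one of the listed target-domain collections with the BEIR toolkit; the download URL follows BEIR's public examples, and licensed collections such as BioASQ and Robust04 generally require manual preparation rather than a direct download.

```python
# Hedged sketch: loading a BEIR-formatted dataset (here TREC-COVID) for the
# target domain. BioASQ and Robust04 are licensed and typically need manual setup.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/trec-covid.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title", "text"}, queries: query_id -> text, qrels: relevance judgments
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
print(len(corpus), len(queries))
```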
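The Experiment Setup row specifies an exponentially decaying reranker learning rate (multiplied by 0.7 every 5,000 steps) and a small grid over ηrank, λ, and ηad. The sketch below simply spells out that schedule and sweep; the function and variable names are illustrative, not from the paper.

```python
# Minimal sketch of the reported learning-rate schedule and hyperparameter grid.
from itertools import product


def reranker_lr(eta_rank, step, decay=0.7, every=5_000):
    """Exponential decay: the rate is multiplied by 0.7 every 5,000 steps."""
    return eta_rank * decay ** (step // every)


# Grid reported in the paper: reranker learning rate, adversarial strength lambda,
# and the discriminator learning rate expressed as a multiple of the reranker rate.
for eta_rank, lam, mult in product([5e-5, 1e-4, 2e-4],   # eta_rank
                                   [0.01, 0.02],          # lambda
                                   [10, 20, 40]):         # eta_ad / eta_rank
    eta_ad = mult * eta_rank
    lr_at_12500 = reranker_lr(eta_rank, 12_500)  # decayed twice: eta_rank * 0.7**2
```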