Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

Authors: Yu-An Liu, Ruqing Zhang, Mingkun Zhang, Wei Chen, Maarten de Rijke, Jiafeng Guo, Xueqi Cheng

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several ranking models demonstrate the superiority of PIAT compared to existing adversarial defenses. We conduct experiments on the MS MARCO Passage Ranking dataset, which is a large-scale benchmark dataset for Web passage retrieval, with about 8.84 million passages (Nguyen et al. 2016).
Researcher Affiliation | Academia | (1) CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) University of Amsterdam, Amsterdam, The Netherlands
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The proof of Theorem 1 and its tightness are provided at https://github.com/ict-bigdatalab/PIAT.
Open Datasets | Yes | We conduct experiments on the MS MARCO Passage Ranking dataset, which is a large-scale benchmark dataset for Web passage retrieval, with about 8.84 million passages (Nguyen et al. 2016).
Dataset Splits | No | We randomly sample 1000 Dev queries as target queries to attack their ranked lists for evaluation. For adversarial training, considering the time overhead, we sample 0.1 million (1/10 of the total) training queries to generate adversarial examples. The paper describes how data is sampled for training and evaluation within the existing dataset, but it does not specify explicit train/validation/test splits (e.g., percentages or exact counts for the entire dataset). A minimal sketch of this sampling protocol appears after the table.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions the Anserini toolkit and implicitly relies on BERT and, most likely, PyTorch (typical for neural ranking models), but it does not provide version numbers for any software dependencies.
Experiment Setup | Yes | We train the NRMs with a batch size of 100, a maximum sequence length of 256, and a learning rate of 1e-5. By training the ranking model with different adversarial ranking losses, i.e., L_adv^KL, L_adv^ListNet, and L_adv^ListMLE, we obtain three types of PIAT, denoted PIAT-KL, PIAT-ListNet, and PIAT-ListMLE, respectively. The regularization hyperparameter λ is set to 0.5. We set the maximum number of word substitutions to 20; other hyperparameters are consistent with Wu et al. (2023). A hedged sketch of the KL-variant objective appears below the table.
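
The Dataset Splits row above describes a simple query-sampling protocol. Below is a minimal sketch, assuming the standard MS MARCO queries.dev.tsv and queries.train.tsv files; the load_query_ids helper and the random seed are assumptions (the paper reports only the sample sizes, not a seed).

```python
import random

def load_query_ids(tsv_path):
    # MS MARCO queries files are TSVs of the form: qid \t query_text
    with open(tsv_path) as f:
        return [line.split("\t", 1)[0] for line in f]

random.seed(42)  # assumption: the paper does not report a seed
dev_query_ids = load_query_ids("queries.dev.tsv")
train_query_ids = load_query_ids("queries.train.tsv")

# 1,000 Dev queries as attack targets for evaluation.
target_queries = random.sample(dev_query_ids, 1000)

# 0.1 million (~1/10 of all) training queries for generating adversarial examples.
adv_train_queries = random.sample(train_query_ids, 100_000)
```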
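
The Experiment Setup row names three adversarial ranking losses and a regularization weight λ = 0.5. The following PyTorch sketch illustrates how the KL variant could be wired up: the ListNet-style natural-example term, the direction of the KL divergence, and the detach on the clean-score distribution are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def piat_kl_loss(clean_scores, adv_scores, labels, lam=0.5):
    """Sketch of a PIAT-style objective with the KL adversarial ranking loss.

    clean_scores: (batch, list_size) model scores on natural documents
    adv_scores:   (batch, list_size) scores on adversarially perturbed documents
    labels:       (batch, list_size) relevance labels for the ranked list
    lam:          regularization weight lambda (0.5 in the paper)
    """
    # Listwise loss on natural examples: cross-entropy between the label
    # distribution and the score distribution (ListNet-style; the paper's
    # exact natural-example term is an assumption here).
    natural_loss = F.kl_div(
        F.log_softmax(clean_scores, dim=-1),
        F.softmax(labels.float(), dim=-1),
        reduction="batchmean",
    )

    # L_adv^KL: align the ranking distribution on perturbed documents with
    # the (frozen) distribution on natural documents.
    adv_loss = F.kl_div(
        F.log_softmax(adv_scores, dim=-1),
        F.softmax(clean_scores, dim=-1).detach(),
        reduction="batchmean",
    )

    return natural_loss + lam * adv_loss

# Toy usage: a batch of 2 queries, each with a ranked list of 10 documents.
clean = torch.randn(2, 10)
adv = torch.randn(2, 10)
rel = torch.randint(0, 2, (2, 10))
print(piat_kl_loss(clean, adv, rel))
```

Under this reading, λ balances effectiveness (the natural-example term) against robustness (the perturbation-invariance term), matching the trade-off framing in the paper's title.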