A Graph-based Relevance Matching Model for Ad-hoc Retrieval
Authors: Yufeng Zhang, Jinghao Zhang, Zeyu Cui, Shu Wu, Liang Wang
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate GRMM on two representative ad-hoc retrieval benchmarks, where empirical results show the effectiveness and rationality of GRMM. We also compare our model with a BERT-based method, where we find that BERT potentially suffers from the same problem when the document becomes long. We conduct comprehensive experiments to examine the effectiveness of GRMM and understand its working principle. |
| Researcher Affiliation | Academia | (1) Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Artificial Intelligence Research, Chinese Academy of Sciences. {yufeng.zhang,jinghao.zhang}@cripac.ia.ac.cn, {zeyu.cui,shu.wu,wangliang}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper includes a workflow diagram (Figure 2) and describes the method textually but does not provide pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code is at https://github.com/CRIPAC-DIG/GRMM |
| Open Datasets | Yes | Robust04 is a standard ad-hoc retrieval dataset with 0.47M documents and 250 queries, using TREC disks 4 and 5 as document collections (https://trec.nist.gov/data/cd45/index.html). ClueWeb09-B is the Category B subset of the full web collection ClueWeb09 (https://lemurproject.org/clueweb09/). It has 50M web pages and 200 queries, whose topics are accumulated from TREC Web Tracks 2009-2012. |
| Dataset Splits | Yes | Both datasets were divided into five folds. We used them to conduct 5-fold cross-validation, where four of them are for tuning parameters, and one for testing (MacAvaney et al. 2019). The process repeated five times with different random seeds each turn, and we took an average as the performance. (A hedged sketch of this protocol appears after this table.) |
| Hardware Specification | Yes | All experiments were conducted on a Linux server equipped with 8 NVIDIA Titan X GPUs. |
| Software Dependencies | No | The paper states 'We implemented our method in PyTorch' but does not specify the version number for PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | The optimal hyper-parameters were determined via grid search on the validation set: the number of graph layers t was searched in {1, 2, 3, 4}, the k value of k-max-pooling was tuned in {10, 20, 30, 40, 50, 60, 70}, the sliding window size in {3, 5, 7, 9}, the learning rate in {0.0001, 0.0005, 0.001, 0.005, 0.01}, and the batch size in {8, 16, 32, 48, 64}. Unless otherwise specified, we set t = 2 and k = 40 to report the performance (see Sections [...] for different settings), and the model was trained with a window size of 5, a learning rate of 0.001 by the Adam optimiser for 300 epochs, each with 32 batches times 16 triplets. (A grid-search sketch also follows this table.) |
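
The Dataset Splits row describes a rotated 5-fold protocol with a fresh random seed each turn. Below is a minimal sketch of that loop; `folds` (five disjoint query lists) and `train_and_evaluate` (which would train GRMM on the training queries and return a test metric) are hypothetical stand-ins, not part of the authors' released code.

```python
import random

def cross_validate(folds, train_and_evaluate, seeds=(0, 1, 2, 3, 4)):
    """Run the rotated 5-fold protocol and average the test metric."""
    assert len(folds) == len(seeds) == 5
    scores = []
    for i, seed in enumerate(seeds):      # a different random seed each turn
        random.seed(seed)
        test_queries = folds[i]           # one fold held out for testing
        train_queries = [q for j, fold in enumerate(folds) if j != i for q in fold]
        scores.append(train_and_evaluate(train_queries, test_queries))
    return sum(scores) / len(scores)      # average taken as the reported performance
```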
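The Experiment Setup row also enumerates an explicit hyper-parameter grid. The following sketch reproduces that search space verbatim and wires it into a generic grid-search loop; `validation_score`, which would train GRMM under one configuration and return its validation metric, is a hypothetical helper.

```python
from itertools import product

# Search space exactly as quoted in the Experiment Setup row.
GRID = {
    "graph_layers_t": [1, 2, 3, 4],
    "k_max_pooling": [10, 20, 30, 40, 50, 60, 70],
    "window_size": [3, 5, 7, 9],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01],
    "batch_size": [8, 16, 32, 48, 64],
}

def grid_search(validation_score):
    """Exhaustively evaluate every grid configuration and keep the best."""
    best_config, best_score = None, float("-inf")
    for values in product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        score = validation_score(config)  # hypothetical: trains + scores one config
        if score > best_score:
            best_config, best_score = config, score
    return best_config
```

Per the row above, the settings the paper reports using correspond to t = 2, k = 40, a window size of 5, and a learning rate of 0.001.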