A Graph-based Relevance Matching Model for Ad-hoc Retrieval
Authors: Yufeng Zhang, Jinghao Zhang, Zeyu Cui, Shu Wu, Liang Wang
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate GRMM on two representative ad-hoc retrieval benchmarks, where empirical results show the effectiveness and rationality of GRMM. We also compare our model with a BERT-based method, where we find that BERT potentially suffers from the same problem when the document becomes long. We conduct comprehensive experiments to examine the effectiveness of GRMM and understand its working principle. |
| Researcher Affiliation | Academia | (1) Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Artificial Intelligence Research, Chinese Academy of Sciences. {yufeng.zhang,jinghao.zhang}@cripac.ia.ac.cn, {zeyu.cui,shu.wu,wangliang}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper includes a workflow diagram (Figure 2) and describes the method textually but does not provide pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code is at https://github.com/CRIPAC-DIG/GRMM |
| Open Datasets | Yes | Robust04 is a standard ad-hoc retrieval dataset with 0.47M documents and 250 queries, using TREC disks 4 and 5 as document collections (https://trec.nist.gov/data/cd45/index.html). ClueWeb09-B is the Category B subset of the full web collection ClueWeb09 (https://lemurproject.org/clueweb09/). It has 50M web pages and 200 queries, whose topics are accumulated from TREC Web Tracks 2009-2012. |
| Dataset Splits | Yes | Both datasets were divided into five folds. We used them to conduct 5-fold cross-validation, where four of them are for tuning parameters, and one for testing (MacAvaney et al. 2019). The process repeated five times with different random seeds each turn, and we took an average as the performance. (A hedged sketch of this protocol appears after this table.) |
| Hardware Specification | Yes | All experiments were conducted on a Linux server equipped with 8 NVIDIA Titan X GPUs. |
| Software Dependencies | No | The paper states 'We implemented our method in PyTorch' but does not specify the version number for PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | The optimal hyper-parameters were determined via grid search on the validation set: the number of graph layers t was searched in {1, 2, 3, 4}, the k value of k-max-pooling was tuned in {10, 20, 30, 40, 50, 60, 70}, the sliding window size in {3, 5, 7, 9}, the learning rate in {0.0001, 0.0005, 0.001, 0.005, 0.01}, and the batch size in {8, 16, 32, 48, 64}. Unless otherwise specified, we set t = 2 and k = 40 to report the performance (see Sections [...] for different settings), and the model was trained with a window size of 5, a learning rate of 0.001 by the Adam optimiser for 300 epochs, each with 32 batches times 16 triplets. (A grid-search sketch also follows this table.) |
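
The Dataset Splits row describes a rotated 5-fold protocol with a fresh random seed each turn. Below is a minimal sketch of that loop; `folds` (five disjoint query lists) and `train_and_evaluate` (which would train GRMM on the training queries and return a test metric) are hypothetical stand-ins, not part of the authors' released code.

```python
import random

def cross_validate(folds, train_and_evaluate, seeds=(0, 1, 2, 3, 4)):
    """Run the rotated 5-fold protocol and average the test metric."""
    assert len(folds) == len(seeds) == 5
    scores = []
    for i, seed in enumerate(seeds):      # a different random seed each turn
        random.seed(seed)
        test_queries = folds[i]           # one fold held out for testing
        train_queries = [q for j, fold in enumerate(folds) if j != i for q in fold]
        scores.append(train_and_evaluate(train_queries, test_queries))
    return sum(scores) / len(scores)      # average taken as the reported performance
```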
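The Experiment Setup row also enumerates an explicit hyper-parameter grid. The following sketch reproduces that search space verbatim and wires it into a generic grid-search loop; `validation_score`, which would train GRMM under one configuration and return its validation metric, is a hypothetical helper.

```python
from itertools import product

# Search space exactly as quoted in the Experiment Setup row.
GRID = {
    "graph_layers_t": [1, 2, 3, 4],
    "k_max_pooling": [10, 20, 30, 40, 50, 60, 70],
    "window_size": [3, 5, 7, 9],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01],
    "batch_size": [8, 16, 32, 48, 64],
}

def grid_search(validation_score):
    """Exhaustively evaluate every grid configuration and keep the best."""
    best_config, best_score = None, float("-inf")
    for values in product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        score = validation_score(config)  # hypothetical: trains + scores one config
        if score > best_score:
            best_config, best_score = config, score
    return best_config
```

Per the row above, the settings the paper reports using correspond to t = 2, k = 40, a window size of 5, and a learning rate of 0.001.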