Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
Authors: Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three widely used public LTR datasets. Our neural models are trained with listwise ranking losses. On all datasets, our framework can outperform recent neural LTR methods by a large margin. When comparing with the strong LambdaMART implementation, λMART_GBM, we are able to achieve equally good results, if not better. We compare a comprehensive list of methods in Table 2. Ablation study. We provide some ablation study results in Table 4 to highlight the effectiveness of each component in our framework. |
| Researcher Affiliation | Industry | Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork Google Research {zhenqin,lyyanle,hlz,yitay,ramakumar,xuanhui,bemike,najork}@google.com |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or formatted as such. |
| Open Source Code | No | We are in the process to release the code and trained models in an open-sourced software package. |
| Open Datasets | Yes | The three data sets we used in our experiments are public benchmark datasets widely adopted by the research community. They are the LETOR dataset from Microsoft (Qin & Liu, 2013), Set1 from the YAHOO LTR challenge (Chapelle & Chang, 2011), and Istella (Dato et al., 2016). |
| Dataset Splits | Yes | Table 1: The statistics of the three largest public benchmark datasets for LTR models. Queries (training / validation / test): Web30K 18,919 / 6,306 / 6,306; Yahoo 19,944 / 2,994 / 6,983; Istella 20,901 / 2,318 / 9,799. Docs (training / validation / test): Web30K 2,270,296 / 747,218 / 753,611; Yahoo 473,134 / 71,083 / 165,660; Istella 6,587,822 / 737,803 / 3,129,004. |
| Hardware Specification | No | No specific hardware details are mentioned in the paper. |
| Software Dependencies | No | For all our experiments using neural network approaches, we implemented them using the TF-Ranking (Pasumarthi et al., 2019) library. (No version specified for TF-Ranking or any other library). |
| Experiment Setup | Yes | For λMART_GBM, we do a grid search for number of trees {300, 500, 1000}, number of leaves {200, 500, 1000}, and learning rate {0.01, 0.05, 0.1, 0.5}. For our neural models the main hyperparameters are hidden layer size {256, 512, 1024, 2048, 3072, 4096} and number of layers {3, 4, 5, 6} for regular DNN, data augmentation noise in [0, 5.0] using binary search with step 0.1, number of attention layers {3, 4, 5, 6}, and number of attention heads {2, 3, 4, 5}. We apply a simple log1p transformation to every element of x and empirically find it works well for the Web30K and Istella datasets. We report all results based on the softmax cross entropy loss $\ell(y, s(\mathbf{x})) = -\sum_{i=1}^{n} y_i \log \frac{e^{s_i}}{\sum_j e^{s_j}}$ since it is simple and empirically robust in general, as demonstrated in Appendix B.2. |
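
The loss and feature transform quoted in the Experiment Setup row are concrete enough to sketch. Below is a minimal NumPy version of the listwise softmax cross-entropy loss as reconstructed above, together with a sign-preserving log1p transform (log(1 + |x|) · sign(x) is one common variant; the quoted excerpt does not spell out the exact form). Function names are illustrative, not the authors' code, which is built on TF-Ranking.

```python
import numpy as np

def log1p_transform(x):
    # Elementwise log1p feature transform. The sign-preserving form
    # log(1 + |x|) * sign(x) is an assumption; the excerpt only says
    # "a simple log1p transformation to every element of x".
    return np.sign(x) * np.log1p(np.abs(x))

def softmax_cross_entropy_loss(y, s):
    # Listwise softmax cross-entropy for one query:
    #   l(y, s) = -sum_i y_i * log(exp(s_i) / sum_j exp(s_j))
    # y: relevance labels for the n docs of one query, shape (n,)
    # s: predicted scores, shape (n,)
    s_shifted = s - np.max(s)  # shift for numerical stability
    log_softmax = s_shifted - np.log(np.sum(np.exp(s_shifted)))
    return -np.sum(y * log_softmax)

# Toy usage: one query with three documents.
y = np.array([2.0, 0.0, 1.0])    # graded relevance labels
s = np.array([0.8, -0.3, 0.1])   # model scores
print(softmax_cross_entropy_loss(y, s))
```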
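
The λMART_GBM grid search can likewise be sketched. The following assumes LightGBM's LambdaMART via its scikit-learn wrapper (LGBMRanker) and selects by validation NDCG@5; the paper does not state the wrapper, metric cutoff, or selection procedure, so those are assumptions. `X_*`, `y_*`, and `group_*` (docs-per-query counts) are placeholders for the loaded LTR data.

```python
from itertools import product

import lightgbm as lgb

# Grid from the quoted setup: trees x leaves x learning rate.
PARAM_GRID = {
    "n_estimators": [300, 500, 1000],   # number of trees
    "num_leaves": [200, 500, 1000],     # number of leaves
    "learning_rate": [0.01, 0.05, 0.1, 0.5],
}

def grid_search(X_train, y_train, group_train, X_val, y_val, group_val):
    best_score, best_params = -1.0, None
    for trees, leaves, lr in product(*PARAM_GRID.values()):
        model = lgb.LGBMRanker(
            objective="lambdarank",
            n_estimators=trees,
            num_leaves=leaves,
            learning_rate=lr,
        )
        model.fit(
            X_train, y_train, group=group_train,
            eval_set=[(X_val, y_val)], eval_group=[group_val],
            eval_at=[5],  # track NDCG@5 on the validation split
        )
        score = model.best_score_["valid_0"]["ndcg@5"]
        if score > best_score:
            best_score, best_params = score, (trees, leaves, lr)
    return best_params, best_score
```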