RLTM: An Efficient Neural IR Framework for Long Documents

Authors: Chen Zheng, Yu Sun, Shengxian Wan, Dianhai Yu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments based on two datasets, a human-labeled dataset and a click-through dataset, and compare our framework with state-of-the-art IR models. Experimental results show that the RLTM framework not only achieves higher accuracy but also incurs lower computational cost than the baselines.
Researcher Affiliation | Industry | Chen Zheng, Yu Sun, Shengxian Wan, Dianhai Yu. Baidu Inc., Beijing, China. {zhengchen02, sunyu02, wanshengxian, yudianhai}@baidu.com
Pseudocode | Yes | Algorithm 1: Reinforced Long Text Matching (RLTM). A hedged sketch of this select-then-match loop follows the table.
Open Source Code | No | The paper contains no statement or link indicating that source code for the method is publicly available.
Open Datasets | No | We conduct our experiments on two large-scale datasets, both from one Chinese search engine. The first, the Human-Label dataset, is human-annotated; the second, the Click-Through dataset, is sampled from a click-through search log. The paper provides no public access information (links, repositories, or formal citations) for either dataset.
Dataset Splits | Yes | Train: 81,922 queries; validation: 6,228 queries; test: 7,312 queries.
Hardware Specification | No | The paper does not specify the hardware used for the experiments; it mentions only TensorFlow as the implementation framework.
Software Dependencies | No | We implemented all the models using TensorFlow and used the stochastic gradient method Adam [Kingma and Ba, 2014] as the optimizer for training. The paper names TensorFlow and Adam but gives no version numbers for any software or libraries.
Experiment Setup | Yes | The batch size is 32 and the learning rate is selected from [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]. For the reinforced sentence selection model, the fully connected hidden size is 128, and the number of selected sentences is chosen from {1, 3, 5}. For MatchPyramid, the query window size is 2 and the sentence window size is 4, with a kernel size of 128. For K-NRM, the number of bins is 11 (see the kernel-pooling sketch after the table).
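
The paper's Algorithm 1 is only named above, not reproduced. The following is a minimal Python sketch of the select-then-match loop the pseudocode row refers to, assuming a REINFORCE-style policy gradient for the sentence selector and a generic matching model supplying the reward; all names here (select_probs, match_score, rltm_step, the bilinear selector form) are illustrative and not taken from the authors' code.

```python
import numpy as np

K = 3  # sentences selected per document; the paper tries 1, 3, and 5

def select_probs(query_vec, sentence_vecs, W):
    """Selector policy: bilinear score of each sentence against the query,
    normalized with a softmax over the document's sentences."""
    logits = sentence_vecs @ W @ query_vec          # (n_sentences,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def match_score(query_vec, selected_vecs):
    """Stand-in matching model; RLTM plugs in MatchPyramid or K-NRM here."""
    return float(np.mean(selected_vecs @ query_vec))

def rltm_step(query_vec, sentence_vecs, W, lr=1e-3):
    """One training step: sample K sentences, score them, update the policy.
    Assumes the document has at least K sentences."""
    probs = select_probs(query_vec, sentence_vecs, W)
    idx = np.random.choice(len(probs), size=K, replace=False, p=probs)
    reward = match_score(query_vec, sentence_vecs[idx])
    baseline = probs @ sentence_vecs  # expected sentence vector under the policy
    for i in idx:
        # REINFORCE: d/dW log softmax_i = (x_i - E_p[x]) q^T for bilinear logits
        grad = np.outer(sentence_vecs[i] - baseline, query_vec)
        W += lr * reward * grad
    return reward
```

At inference time the selector would simply take the K highest-probability sentences and hand them to the matching model, which scores the query against that short excerpt instead of the full document; this is the source of the lower computational cost claimed above.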
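
The K-NRM row of the experiment setup fixes only the number of bins (11). For context, this is a sketch of K-NRM-style kernel pooling with 11 Gaussian kernels, using the kernel placement that is conventional for K-NRM (one exact-match kernel plus ten soft-match kernels); the RLTM paper does not state these mu/sigma values, so treat them as an assumption.

```python
import numpy as np

# Conventional K-NRM kernel placement: an exact-match kernel at mu=1.0
# with a tight width, plus ten soft-match kernels of width 0.1. These
# values are assumed here, not quoted from the RLTM paper.
MUS = np.array([1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9])
SIGMAS = np.array([1e-3] + [0.1] * 10)

def kernel_pooling(sim_matrix):
    """sim_matrix: (n_query_terms, n_doc_terms) cosine similarities.
    Returns an 11-dim soft-TF feature vector for the ranking layer."""
    m = sim_matrix[:, :, None]                          # (q, d, 1)
    k = np.exp(-((m - MUS) ** 2) / (2 * SIGMAS ** 2))   # (q, d, 11)
    soft_tf = k.sum(axis=1)                             # pool over doc terms
    return np.log(np.clip(soft_tf, 1e-10, None)).sum(axis=0)  # over query terms
```

A final learning-to-rank layer (in K-NRM, tanh(w · phi + b)) maps this 11-dimensional feature vector to the matching score.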