reproducibilityindex.ai

Toward Efficient Navigation of Massive-Scale Geo-Textual Streams

Authors: Chengcheng Yang, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu, Ling Shao

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on two real-world datasets show that NQ-tree outperforms two well designed baselines by up to 10.
Researcher Affiliation	Industry	Chengcheng Yang, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu and Ling Shao, Inception Institute of Artificial Intelligence {chengcheng.yang, lisi.chen, shuo.shang, fan.zhu, li.liu, ling.shao}@inceptioniai.org
Pseudocode	Yes	Algorithm 1 Batch Insert Node; Algorithm 2 Search Log Store; Algorithm 3 Search Data Store
Open Source Code	No	The paper does not provide an explicit statement or a link to the open-source code for the NQ-tree methodology.
Open Datasets	No	The paper mentions using "two real-world datasets: 4SQ and TWEETS" but does not provide specific access information, links, or citations for public availability of these datasets.
Dataset Splits	No	The paper does not explicitly provide training/validation/test dataset splits with percentages or counts for model evaluation. It describes how data was used for insertions/deletions versus basic data, but not typical ML dataset splits.
Hardware Specification	Yes	The experiments were run on a workstation powered by an Intel Xeon Gold-6148 CPU on Linux (Ubuntu 16.04), with a 15K RPM disk.
Software Dependencies	No	The paper mentions "Linux (Ubuntu 16.04)" as the operating system but does not provide specific version numbers for other ancillary software dependencies like libraries or development environments.
Experiment Setup	Yes	We set the page size to 4 KB and set the buffer size to 64 MB and 256 MB for 4SQ and TWEETS, respectively. An LRU buffer manager was implemented. Specifically, 4 MB of memory was allocated for the write buffer, so that the geo-space was initially divided into 1024 grid cells. The inv Cache cached the storage information of the 40% least frequent keywords, accounting for no more than 5% of the total data. When generating signatures, we skipped the frequent keywords that have more than 50% probability of residing in a log page.
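The setup above fixes only the page size, the buffer sizes, and the LRU replacement policy. The following is a minimal sketch, not the authors' implementation, of how those numbers translate into a page-granular LRU buffer. The names `pages_for` and `LRUBufferPool` are hypothetical, and MB/KB are assumed to be binary units (2^20 and 2^10 bytes). Under that assumption the 4 MB write buffer split across 1024 grid cells works out to exactly one 4 KB page per cell.

```python
from collections import OrderedDict

KB = 1024            # assumption: binary units
MB = 1024 * KB
PAGE_SIZE = 4 * KB   # page size reported in the setup


def pages_for(buffer_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: number of whole pages a buffer can hold."""
    return buffer_bytes // page_size


class LRUBufferPool:
    """Hypothetical page cache that evicts the least recently used page."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page_id -> page bytes, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = load_from_disk(page_id)  # caller supplies the disk read
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


# Buffer sizes from the setup: 64 MB for 4SQ, 256 MB for TWEETS.
print(pages_for(64 * MB))   # 16384 pages
print(pages_for(256 * MB))  # 65536 pages
# A 4 MB write buffer over 1024 grid cells leaves one 4 KB page per cell.
print(4 * MB // 1024 == PAGE_SIZE)  # True
```

With a two-page pool, the access sequence 1, 2, 1, 3 evicts page 2 rather than page 1, because page 1 was touched more recently. An `OrderedDict` keeps both the hit path and eviction at O(1).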
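The setup does not describe how signatures are built. It states only that keywords with more than 50% probability of residing in a log page are skipped. The sketch below assumes a Bloom-filter-style bit signature to show why that rule makes sense: a keyword present in most pages sets bits without helping to rule any page out. The signature width, the number of hash positions, SHA-256 as the hash, and all function names are hypothetical choices. Only the 0.5 threshold comes from the setup.

```python
import hashlib

SIG_BITS = 64          # hypothetical signature width in bits
NUM_HASHES = 2         # hypothetical bit positions per keyword
SKIP_THRESHOLD = 0.5   # from the setup: skip keywords with >50% page probability


def bit_positions(keyword: str, bits: int = SIG_BITS, k: int = NUM_HASHES):
    """Derive k bit positions from disjoint 4-byte slices of a SHA-256 digest."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % bits for i in range(k)]


def build_signature(keywords, page_prob, threshold=SKIP_THRESHOLD):
    """OR together the bits of every keyword that is not too frequent."""
    sig = 0
    for kw in keywords:
        if page_prob.get(kw, 0.0) > threshold:
            continue  # frequent keyword: most pages contain it, so it prunes little
        for pos in bit_positions(kw):
            sig |= 1 << pos
    return sig


def may_contain(sig, keyword, page_prob, threshold=SKIP_THRESHOLD):
    """False only if the page definitely lacks the keyword (no false negatives)."""
    if page_prob.get(keyword, 0.0) > threshold:
        return True  # skipped keywords cannot be ruled out, so the page must be read
    return all((sig >> pos) & 1 for pos in bit_positions(keyword))
```

For example, with `page_prob = {"coffee": 0.7, "museum": 0.05}`, a page containing only "coffee" gets an empty signature. A query for "museum" can then skip that page without reading it.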