Toward Efficient Navigation of Massive-Scale Geo-Textual Streams
Authors: Chengcheng Yang, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu, Ling Shao
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two real-world datasets show that NQ-tree outperforms two well-designed baselines by up to 10x. |
| Researcher Affiliation | Industry | Chengcheng Yang, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu and Ling Shao, Inception Institute of Artificial Intelligence, {chengcheng.yang, lisi.chen, shuo.shang, fan.zhu, li.liu, ling.shao}@inceptioniai.org |
| Pseudocode | Yes | Algorithm 1 Batch Insert Node; Algorithm 2 Search Log Store; Algorithm 3 Search Data Store |
| Open Source Code | No | The paper does not provide an explicit statement or a link to the open-source code for the NQ-tree methodology. |
| Open Datasets | No | The paper mentions using "two real-world datasets: 4SQ and TWEETS" but does not provide specific access information, links, or citations for public availability of these datasets. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or counts for model evaluation. It describes how data was used for insertions/deletions versus basic data, but not typical ML dataset splits. |
| Hardware Specification | Yes | The experiments were run on a workstation powered by an Intel Xeon Gold-6148 CPU, running Linux (Ubuntu 16.04), with a 15K RPM disk. |
| Software Dependencies | No | The paper mentions "Linux (Ubuntu 16.04)" as the operating system but does not provide specific version numbers for other ancillary software dependencies like libraries or development environments. |
| Experiment Setup | Yes | We set the page size to 4 KB, and the buffer size to 64 MB and 256 MB for 4SQ and TWEETS, respectively. An LRU buffer manager was implemented. Specifically, 4 MB of memory was allocated for the write buffer so that the geo-space was initially divided into 1024 grid cells. The invCache cached the storage information of the 40% least frequent keywords, accounting for no more than 5% of the total data. When generating signatures, we skipped frequent keywords that have a more than 50% probability of residing in a log page. |
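
The experiment-setup row above can be summarized as a small set of configuration constants. The sketch below is only an illustration of those reported parameters; the paper does not release code, so all identifier names (e.g., `PAGE_SIZE`, `INV_CACHE_KEYWORD_FRACTION`) are assumptions, not the authors' implementation.

```python
# Illustrative constants reflecting the experiment setup reported in the paper.
# Names are hypothetical; only the numeric values come from the paper's text.

KB = 1024
MB = 1024 * KB

PAGE_SIZE = 4 * KB              # disk page size
BUFFER_SIZE = {                 # LRU buffer pool size per dataset
    "4SQ": 64 * MB,
    "TWEETS": 256 * MB,
}
WRITE_BUFFER_SIZE = 4 * MB      # in-memory write buffer
INITIAL_GRID_CELLS = 1024       # initial partitioning of the geo-space

INV_CACHE_KEYWORD_FRACTION = 0.40   # cache the 40% least frequent keywords
INV_CACHE_DATA_LIMIT = 0.05         # capped at 5% of the total data
SIGNATURE_SKIP_THRESHOLD = 0.50     # skip keywords with >50% probability of
                                    # residing in a log page
```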