Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
App2Exa: Accelerating Exact kNN Search via Dynamic Cache-Guided Approximation
Authors: Ke Li, Leong Hou U, Shuo Shang
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that App2Exa significantly boosts efficiency, providing a robust and scalable solution for evolving query patterns and enabling exact k NN search to support higher dimensionality more effectively. We conducted experiments on two real-world datasets to evaluate the performance of our proposal: SIFT and AOL. |
| Researcher Affiliation | Academia | 1University of Electronic Science and Technology of China, Chengdu, China 2University of Macau, Macau, China like EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Innovatively, we introduce Algorithms 1 and 2 to enable dynamic insertion and deletion of vectors, respectively. Algorithm 3 outlines the pseudo-code for the benefit-driven cache replacement algorithm. Algorithm 4 provides the pseudo code for k NN search using the VP-Tree with a best-first strategy (BF-k NN). The overall cache-guided search algorithm (CGS) is presented in Algorithm 5. |
| Open Source Code | Yes | 1The implementation is available at https://github.com/ likemelike/App2Exa Cache.git. |
| Open Datasets | Yes | We conducted experiments on two real-world datasets to evaluate the performance of our proposal: SIFT 2 and AOL. 2http://corpus-texmex.irisa.fr The AOL dataset consists of approximately 20 million web queries collected from 650,000 users over three months [Pass et al., 2006]. |
| Dataset Splits | Yes | By default, we randomly select 30,000 vectors as query vectors Q and 300,000 vectors as object vectors V. |
| Hardware Specification | Yes | All the methods are implemented in Java and evaluated on Ubuntu equipped with 2 Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz and 128GB memory, utilizing a single thread for execution. |
| Software Dependencies | No | All the methods are implemented in Java and evaluated on Ubuntu equipped with 2 Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz and 128GB memory, utilizing a single thread for execution. While the programming language (Java) is mentioned, no specific version number for Java or any other software libraries/dependencies are provided. |
| Experiment Setup | Yes | Parameter settings. The evaluation parameter settings are listed in Table 3. For the adjustable parameters α, β and γ in Equation 1, the default values are set to be equal (e.g., 1/3). |