Doubly Approximate Nearest Neighbor Classification
Authors: Weiwei Liu, Zhuanghua Liu, Ivor Tsang, Wenjie Zhang, Xuemin Lin
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; Center for Artificial Intelligence, University of Technology Sydney; {liuweiwei863, liuzhuanghua1991}@gmail.com, ivor.tsang@uts.edu.au, wenjie.zhang@unsw.edu.au, lxue@cse.unsw.edu.au |
| Pseudocode | Yes | Algorithm 1: CD-Tree [...]; Algorithm 2: Separator Generation [...]; Algorithm 3: Feature Generation |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | The URL data set is prepared by (Ma et al. 2009) and the other data sets are collected from the LIBSVM website (Du et al. 2017; Wang et al. 2015). |
| Dataset Splits | Yes | The training/testing partition is either predefined or the data is randomly split into 80% training and 20% testing (Wu et al. 2014); a minimal sketch of such a split is given after the table. |
| Hardware Specification | Yes | The experiments are performed on a server with a 3.4 GHz Intel CPU and 94.5 GB of main memory, running on a Linux platform. |
| Software Dependencies | No | The paper does not specify software versions for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Following the parameter settings in (Li et al. 2013), β is set to 0.3n. The parameter B is selected using 5-fold cross validation over the range {20, 50, 100, 200, 400} for small and medium-sized data sets, and B = 400 is used for very high dimensional data sets (a sketch of this selection loop is given after the table). C is fixed to 5. We set r (the maximum number of points to examine for K-Means Tree and KD-Tree) to values in {5, 10, 30} and achieve similar prediction performance for each, so we fix r = 5 for fast testing time. Following parameter settings similar to those in (Zhang et al. 2015) for KBE and KPQ, b is selected over the range {256, 512, 1024} for the first category of data sets; b = 256 is fixed for large-scale data sets to keep testing time fast, with ς = 256 and ϑ = 32. |
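The Dataset Splits row reports an 80%/20% random train/test split when no predefined partition exists, and the Open Datasets row points to LIBSVM-format data. Below is a minimal Python sketch of that preparation step, assuming scikit-learn is available; the file name `dataset.libsvm` and the random seed are illustrative, not from the paper.

```python
# Minimal sketch of the data preparation described in the table:
# LIBSVM-format data, with an 80%/20% random train/test split when
# no predefined partition is given. File name and seed are hypothetical.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# Load a LIBSVM-format data set (sparse feature matrix + labels).
X, y = load_svmlight_file("dataset.libsvm")  # hypothetical file name

# 80/20 random split, used when the partition is not predefined.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```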
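The Experiment Setup row describes selecting B by 5-fold cross validation over {20, 50, 100, 200, 400}. Since the paper's CD-Tree classifier is not released, the sketch below only illustrates the selection loop: `fit_score` is a hypothetical caller-supplied callable standing in for training and scoring the model at a given B, and the shuffle seed is illustrative.

```python
# Sketch of the 5-fold cross-validation protocol used to select B.
# `fit_score(X_tr, y_tr, X_val, y_val, B)` is a hypothetical stand-in
# for training/evaluating the paper's CD-Tree classifier (not released).
import numpy as np
from sklearn.model_selection import KFold

def select_B(X, y, fit_score, candidates=(20, 50, 100, 200, 400), seed=0):
    """Return the candidate B with the highest mean 5-fold validation score."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    mean_scores = {}
    for B in candidates:
        scores = [fit_score(X[tr], y[tr], X[va], y[va], B)
                  for tr, va in kf.split(X)]
        mean_scores[B] = float(np.mean(scores))
    # Pick the candidate with the best mean validation score.
    return max(mean_scores, key=mean_scores.get)
```

Given a concrete `fit_score`, `select_B(X_train, y_train, fit_score)` returns the candidate value with the best mean validation accuracy, matching the grid-plus-CV procedure the paper describes for small and medium-sized data sets.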