Doubly Approximate Nearest Neighbor Classification

Authors: Weiwei Liu, Zhuanghua Liu, Ivor Tsang, Wenjie Zhang, Xuemin Lin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions." "Experiments on a wide spectrum of data sets show that our proposed method excels in all data sets, yet we obtain three orders of magnitude faster than state-of-the-art libraries on very high dimensions."
Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; Center for Artificial Intelligence, University of Technology Sydney; {liuweiwei863, liuzhuanghua1991}@gmail.com, ivor.tsang@uts.edu.au, wenjie.zhang@unsw.edu.au, lxue@cse.unsw.edu.au
Pseudocode | Yes | "Algorithm 1 CD-Tree [...] Algorithm 2 Separator Generation [...] Algorithm 3 Feature Generation"
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "The URL data set is prepared by (Ma et al. 2009) and other data sets are collected from the LIBSVM website (Du et al. 2017; Wang et al. 2015)."
Dataset Splits | Yes | "The training/testing partition is either predefined or the data is randomly split into 80% training and 20% testing (Wu et al. 2014)." (A split sketch follows the table.)
Hardware Specification | Yes | "The experiments are performed on a server with a 3.4GHz Intel CPU and 94.5GB main memory running on a Linux platform."
Software Dependencies | No | The paper does not specify software versions for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | "Following the parameter settings in (Li et al. 2013), β is set as 0.3n. The parameter B is selected using 5-fold cross validation over the range {20, 50, 100, 200, 400} for small and medium-sized data sets, and we set B = 400 for very high dimensional data sets. C is fixed to 5. We set r (the maximum number of points to examine for K-Means Tree and KD-Tree) as {5, 10, 30} and we achieve similar prediction performance for different r. We therefore simply fix r = 5 for fast testing time. Following similar parameter settings to those in (Zhang et al. 2015) for KBE and KPQ, b is selected over the range {256, 512, 1024} for the first category data sets and we fix b = 256 for large-scale data sets with fast testing time, ς = 256 and ϑ = 32." (A cross-validation sketch follows the table.)
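
The 80%/20% random partition reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch in Python, assuming scikit-learn and a synthetic placeholder data set; the paper's actual data (the URL data set and the LIBSVM collections) and any fixed random seed are not specified, so both are assumptions here.

    # Minimal sketch of the reported 80%/20% random train/test partition.
    # The synthetic data is a placeholder for the paper's LIBSVM/URL data sets.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    # 80% training / 20% testing, matching the split described in the paper;
    # the random seed is an assumption, not taken from the paper.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    print(X_train.shape, X_test.shape)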
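
The Experiment Setup row states that B is chosen by 5-fold cross validation over {20, 50, 100, 200, 400}. The CD-Tree classifier is not publicly released, so the sketch below only illustrates that selection procedure, using scikit-learn's GridSearchCV with a RandomForestClassifier as a hypothetical stand-in model whose n_estimators parameter stands in for B.

    # Sketch of 5-fold cross-validated selection over {20, 50, 100, 200, 400},
    # mirroring how the paper reports tuning B. The model below is a stand-in;
    # only the selection procedure matches the paper's description.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [20, 50, 100, 200, 400]},  # stand-in for B
        cv=5,   # 5-fold cross validation, as described in the paper
        n_jobs=-1,
    )
    grid.fit(X, y)
    print(grid.best_params_)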