Doubly Approximate Nearest Neighbor Classification

Authors: Weiwei Liu, Zhuanghua Liu, Ivor Tsang, Wenjie Zhang, Xuemin Lin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions." "Experiments on a wide spectrum of data sets show that our proposed method excels in all data sets, yet we obtain three orders of magnitude faster than state-of-the-art libraries on very high dimensions."
Researcher Affiliation | Academia | School of Computer Science and Engineering, The University of New South Wales; Center for Artificial Intelligence, University of Technology Sydney; {liuweiwei863, liuzhuanghua1991}@gmail.com, ivor.tsang@uts.edu.au, wenjie.zhang@unsw.edu.au, lxue@cse.unsw.edu.au
Pseudocode | Yes | "Algorithm 1 CD-Tree [...] Algorithm 2 Separator Generation [...] Algorithm 3 Feature Generation"
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "The URL data set is prepared by (Ma et al. 2009) and other data sets are collected from the LIBSVM website (Du et al. 2017; Wang et al. 2015)."
Dataset Splits | Yes | "The training/testing partition is either predefined or the data is randomly split into 80% training and 20% testing (Wu et al. 2014)." (A split sketch follows the table.)
Hardware Specification | Yes | "The experiments are performed on a server with a 3.4GHz Intel CPU and 94.5GB main memory running on a Linux platform."
Software Dependencies | No | The paper does not specify software versions for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | "Following the parameter settings in (Li et al. 2013), β is set as 0.3n. The parameter B is selected using 5-fold cross validation over the range {20, 50, 100, 200, 400} for small and medium-sized data sets, and we set B = 400 for very high dimensional data sets. C is fixed to 5. We set r (the maximum number of points to examine for K-Means Tree and KD-Tree) as {5, 10, 30} and we achieve similar prediction performance for different r. We therefore simply fix r = 5 for fast testing time. Following similar parameter settings to those in (Zhang et al. 2015) for KBE and KPQ, b is selected over the range {256, 512, 1024} for the first category data sets and we fix b = 256 for large-scale data sets with fast testing time, ς = 256 and ϑ = 32." (A cross-validation sketch follows the table.)
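
The 80%/20% random partition reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch in Python, assuming scikit-learn and a synthetic placeholder data set; the paper's actual data (the URL data set and the LIBSVM collections) and any fixed random seed are not specified, so both are assumptions here.

    # Minimal sketch of the reported 80%/20% random train/test partition.
    # The synthetic data is a placeholder for the paper's LIBSVM/URL data sets.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    # 80% training / 20% testing, matching the split described in the paper;
    # the random seed is an assumption, not taken from the paper.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    print(X_train.shape, X_test.shape)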
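
The Experiment Setup row states that B is chosen by 5-fold cross validation over {20, 50, 100, 200, 400}. The CD-Tree classifier is not publicly released, so the sketch below only illustrates that selection procedure, using scikit-learn's GridSearchCV with a RandomForestClassifier as a hypothetical stand-in model whose n_estimators parameter stands in for B.

    # Sketch of 5-fold cross-validated selection over {20, 50, 100, 200, 400},
    # mirroring how the paper reports tuning B. The model below is a stand-in;
    # only the selection procedure matches the paper's description.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [20, 50, 100, 200, 400]},  # stand-in for B
        cv=5,   # 5-fold cross validation, as described in the paper
        n_jobs=-1,
    )
    grid.fit(X, y)
    print(grid.best_params_)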