reproducibilityindex.ai

Making Fisher Discriminant Analysis Scalable

Authors: Bojun Tu, Zhihua Zhang, Shusen Wang, Hui Qian

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that our algorithms outperform PCA+LDA and have a similar scalability with it.
Researcher Affiliation	Academia	Bojun Tu (TUBOJUN@GMAIL.COM), Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China; Zhihua Zhang (ZHIHUA@SJTU.EDU.CN), Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China; Shusen Wang (WSS@ZJU.EDU.CN), College of Computer Science & Technology, Zhejiang University, Hangzhou, China; Hui Qian (QIANHUI@ZJU.EDU.CN), College of Computer Science & Technology, Zhejiang University, Hangzhou, China
Pseudocode	Yes	Algorithm 1 The SVD based LDA algorithm; Algorithm 2 SVD-QR-LDA algorithm; Algorithm 3 randomized SVD algorithm; Algorithm 4 RSVD-QR-KLDA algorithm
Open Source Code	No	The paper does not provide any specific links or explicit statements about the release of open-source code for the methodology described.
Open Datasets	Yes	In this section we perform empirical analysis of our proposed algorithms on two face datasets, Yale B&E and CMU PIE, two middle-sized document datasets, News20 (Lang, 1995) and RCV1 (Lewis et al., 2004), and a large dataset, Amazon7 (Dredze et al., 2008; Blondel et al., 2013).
Dataset Splits	Yes	For these two datasets, we randomly pick 70% of data for training and the remaining for test. We repeat this procedure 5 times and report the averages of the objective function, classification accuracy and running time on these 5 repeats. [...] For News20, we use the first 80% of the original data for training and the left 20% for test. [...] Amazon7 dataset contains 1,362,109 reviews of Amazon product. We randomly pick 80% of the dataset for training, and the remaining is for test. [...] Particularly, k is selected via 10-fold cross-validation.
Hardware Specification	Yes	All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory. [...] We use an EMR cluster consisting of 8 m1.xlarge instance. Each m1.xlarge instance uses a Intel Xeon Family quad-core CPU and 15GB memory.
Software Dependencies	Yes	All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory. [...] We test our implementation on the distributed system on a Hadoop (Borthakur, 2007) cluster on Amazon Elastic Map Reduce (EMR). [...] The version of Hadoop we use is 1.0.3.
Experiment Setup	Yes	The SVD step in both RSVD-QR-LDA and PCA-LDA is computed by Algorithm 3 with $q_{\mathrm{SVD}} = 1$ and $p_{\mathrm{SVD}} = 0.1 k_{\mathrm{SVD}}$. [...] We use RBF kernel $\kappa(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / \theta)$ in our experiment, where $\theta$ was set to the mean Euclidean distance among training data points.
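
The Dataset Splits and Experiment Setup rows describe a random 70/30 split repeated 5 times and an RBF kernel whose bandwidth θ is the mean Euclidean distance among training points. The following is a minimal sketch of those two steps, not the authors' code. It assumes the kernel has the form exp(-||x1 - x2||^2 / θ), that "mean Euclidean distance" means the average over all unordered pairs, and that the seeds 0 to 4 are acceptable stand-ins for the unspecified randomization. The function names `mean_pairwise_distance`, `rbf_kernel`, and `random_split` are hypothetical. It is written for Python 3.8+ using only the standard library, whereas the paper reports Python 2.7.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of points (assumed meaning of theta)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def rbf_kernel(x1, x2, theta):
    """RBF kernel exp(-||x1 - x2||^2 / theta); the exact form is an assumption."""
    return math.exp(-math.dist(x1, x2) ** 2 / theta)


def random_split(n_samples, train_fraction, seed):
    """Shuffle the indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    # Toy data standing in for a real dataset.
    data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0), (2.0, 5.0),
            (7.0, 1.0), (4.0, 4.0), (0.0, 9.0), (5.0, 2.0), (8.0, 8.0)]
    for seed in range(5):  # 5 repeats, as in the Dataset Splits row
        train_idx, test_idx = random_split(len(data), 0.7, seed)
        train = [data[i] for i in train_idx]
        theta = mean_pairwise_distance(train)  # bandwidth from training data only
        k = rbf_kernel(train[0], train[1], theta)
        print(seed, len(train_idx), len(test_idx), round(theta, 3), round(k, 3))
```

Computing θ from the training portion only keeps the test points out of the bandwidth choice, which matches the row's wording "among training data points". The all-pairs average costs O(n²) distance evaluations, so on a dataset the size of Amazon7 a subsample of pairs would be needed. The source does not say how the authors handled that.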