Making Fisher Discriminant Analysis Scalable
Authors: Bojun Tu, Zhihua Zhang, Shusen Wang, Hui Qian
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our algorithms outperform PCA+LDA and have a similar scalability with it. |
| Researcher Affiliation | Academia | Bojun Tu (TUBOJUN@GMAIL.COM), Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China; Zhihua Zhang (ZHIHUA@SJTU.EDU.CN), Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China; Shusen Wang (WSS@ZJU.EDU.CN), College of Computer Science & Technology, Zhejiang University, Hangzhou, China; Hui Qian (QIANHUI@ZJU.EDU.CN), College of Computer Science & Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | Yes | "Algorithm 1 The SVD based LDA algorithm", "Algorithm 2 SVD-QR-LDA algorithm", "Algorithm 3 randomized SVD algorithm", "Algorithm 4 RSVD-QR-KLDA algorithm" |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of open-source code for the methodology described. |
| Open Datasets | Yes | In this section we perform empirical analysis of our proposed algorithms on two face datasets, Yale B&E and CMU PIE, two middle-sized document datasets, News20 (Lang, 1995) and RCV1 (Lewis et al., 2004), and a large dataset, Amazon7 (Dredze et al., 2008; Blondel et al., 2013). |
| Dataset Splits | Yes | For these two datasets, we randomly pick 70% of data for training and the remaining for test. We repeat this procedure 5 times and report the averages of the objective function, classification accuracy and running time on these 5 repeats. [...] For News20, we use the first 80% of the original data for training and the left 20% for test. [...] Amazon7 dataset contains 1,362,109 reviews of Amazon product. We randomly pick 80% of the dataset for training, and the remaining is for test. [...] Particularly, k is selected via 10-fold cross-validation. |
| Hardware Specification | Yes | "All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory." and "We use an EMR cluster consisting of 8 m1.xlarge instance. Each m1.xlarge instance uses a Intel Xeon Family quad-core CPU and 15GB memory." |
| Software Dependencies | Yes | All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory. [...] We test our implementation on the distributed system on a Hadoop (Borthakur, 2007) cluster on Amazon Elastic Map Reduce (EMR). [...] The version of Hadoop we use is 1.0.3. |
| Experiment Setup | Yes | "The SVD step in both RSVD-QR-LDA and PCA-LDA is computed by Algorithm 3 with q_SVD = 1 and p_SVD = 0.1 k_SVD." and "We use RBF kernel κ(x1, x2) = exp(−‖x1 − x2‖² / θ) in our experiment, where θ was set to the mean Euclidean distance among training data points." |
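The experiment setup quoted above names a randomized SVD (Algorithm 3) run with q_SVD = 1 power iteration and oversampling p_SVD = 0.1 k_SVD. The paper's own code is not released, so the following is only a minimal NumPy sketch of a standard randomized SVD with those two parameters as defaults; the function name and structure are assumptions, not the authors' implementation.

```python
import numpy as np

def randomized_svd(A, k, q=1, p=None):
    """Sketch of a rank-k randomized SVD with q power iterations and
    oversampling p. Defaults mirror the quoted setup: q = 1, p = 0.1*k.
    This is an illustrative reconstruction, not the paper's Algorithm 3 verbatim.
    """
    if p is None:
        p = max(int(0.1 * k), 1)  # at least one extra column of oversampling
    m, n = A.shape
    # Range finder: multiply A by a Gaussian test matrix with k + p columns
    Omega = np.random.randn(n, k + p)
    Y = A @ Omega
    # q power iterations sharpen the captured range when singular values decay slowly
    for _ in range(q):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Project A onto the captured subspace and take a small exact SVD
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]
```

For a matrix of exact rank k, the sketch recovers A almost exactly; for general matrices it gives a near-optimal rank-k approximation at far lower cost than a full SVD.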
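The quoted setup also fixes the RBF kernel bandwidth: θ is the mean Euclidean distance among training points, with κ(x1, x2) = exp(−‖x1 − x2‖² / θ). A minimal NumPy sketch of that choice follows; the helper name is an assumption, and the "mean over off-diagonal pairs" reading of "among training data points" is an interpretation of the quote.

```python
import numpy as np

def rbf_kernel_mean_theta(X):
    """Build the training Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / theta),
    with theta = mean pairwise Euclidean distance, as in the quoted setup.
    Illustrative sketch only, not the authors' code.
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # clip tiny negatives from round-off
    d = np.sqrt(d2)
    n = X.shape[0]
    # Mean distance over the n*(n-1) off-diagonal pairs (assumed interpretation)
    theta = d.sum() / (n * (n - 1))
    K = np.exp(-d2 / theta)
    return K, theta
```

Tying θ to the data's own distance scale keeps the kernel neither degenerate (all entries near 1) nor vanishing (all entries near 0) regardless of feature scaling.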