Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Making Fisher Discriminant Analysis Scalable
Authors: Bojun Tu, Zhihua Zhang, Shusen Wang, Hui Qian
ICML 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our algorithms outperform PCA+LDA and have a similar scalability with it. |
| Researcher Affiliation | Academia | Bojun Tu EMAIL Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China Zhihua Zhang EMAIL Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai, China Shusen Wang EMAIL College of Computer Science & Technology, Zhejiang University, Hangzhou, China Hui Qian EMAIL College of Computer Science & Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | Yes | Algorithm 1 The SVD based LDA algorithm; Algorithm 2 SVD-QR-LDA algorithm; Algorithm 3 randomized SVD algorithm; Algorithm 4 RSVD-QR-KLDA algorithm |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of open-source code for the methodology described. |
| Open Datasets | Yes | In this section we perform empirical analysis of our proposed algorithms on two face datasets, Yale B&E and CMU PIE, two middle-sized document datasets, News20 (Lang, 1995) and RCV1 (Lewis et al., 2004), and a large dataset, Amazon7 (Dredze et al., 2008; Blondel et al., 2013). |
| Dataset Splits | Yes | For these two datasets, we randomly pick 70% of data for training and the remaining for test. We repeat this procedure 5 times and report the averages of the objective function, classification accuracy and running time on these 5 repeats. [...] For News20, we use the first 80% of the original data for training and the left 20% for test. [...] Amazon7 dataset contains 1,362,109 reviews of Amazon product. We randomly pick 80% of the dataset for training, and the remaining is for test. [...] Particularly, k is selected via 10-fold cross-validation. |
| Hardware Specification | Yes | All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory. [...] We use an EMR cluster consisting of 8 m1.xlarge instance. Each m1.xlarge instance uses a Intel Xeon Family quad-core CPU and 15GB memory. |
| Software Dependencies | Yes | All the algorithms are implemented in python 2.7 on a PC with an Intel Xeon X5675 3.07 GHz CPU and 12GB memory. [...] We test our implementation on the distributed system on a Hadoop (Borthakur, 2007) cluster on Amazon Elastic Map Reduce (EMR). [...] The version of Hadoop we use is 1.0.3. |
| Experiment Setup | Yes | The SVD step in both RSVD-QR-LDA and PCA-LDA is computed by Algorithm 3 with $q_{\mathrm{SVD}} = 1$ and $p_{\mathrm{SVD}} = 0.1\,k_{\mathrm{SVD}}$. [...] We use RBF kernel $\kappa(x_1, x_2) = \exp\left(-\lVert x_1 - x_2 \rVert^2 / \theta\right)$ in our experiment, where $\theta$ was set to the mean Euclidean distance among training data points. |
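
The experiment setup quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' code, and the paper's Algorithm 3 is not reproduced on this page, so `randomized_svd` below follows the generic randomized range-finder scheme with `q` power iterations and an oversampling of `0.1 * k` columns, which may differ from the paper's algorithm in its details. The kernel is read as exp(-||x1 - x2||^2 / θ), and taking θ as the average over all pairs of distinct training points is an assumption. All function and parameter names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X, Y):
    """Squared Euclidean distances between rows of X and rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * (X @ Y.T)
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding


def mean_pairwise_distance(X):
    """Mean Euclidean distance over pairs of distinct rows (assumed definition of theta)."""
    n = X.shape[0]
    d = np.sqrt(pairwise_sq_dists(X, X))
    off_diagonal = ~np.eye(n, dtype=bool)
    return d[off_diagonal].mean()


def rbf_kernel_matrix(X, Y, theta):
    """RBF kernel, read as exp(-||x - y||^2 / theta)."""
    return np.exp(-pairwise_sq_dists(X, Y) / theta)


def randomized_svd(A, k, q=1, oversample=0.1, seed=0):
    """Generic randomized SVD sketch; not necessarily the paper's Algorithm 3.

    k          : target rank
    q          : number of power iterations (the table quotes q = 1)
    oversample : extra columns as a fraction of k (the table quotes p = 0.1 k)
    """
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]
    p = int(np.ceil(oversample * k))
    omega = rng.standard_normal((n_cols, k + p))  # random test matrix
    Q, _ = np.linalg.qr(A @ omega)                # basis for the sampled range
    for _ in range(q):                            # power iterations, re-orthonormalized
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                                   # small (k + p) x n_cols matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]
```

A typical use would compute `theta = mean_pairwise_distance(X_train)` once on the training set and reuse it for both `rbf_kernel_matrix(X_train, X_train, theta)` and `rbf_kernel_matrix(X_test, X_train, theta)`. For large training sets the full pairwise distance matrix does not fit in memory, so θ would need to be estimated on a subsample.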