Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spectral Clustering via the Power Method - Provably
Authors: Christos Boutsidis, Prabhanjan Kambadur, Alex Gittens
ICML 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments To conduct our experiments, we developed high-quality MATLAB versions of the spectral clustering algorithms. In the remainder of this section, we refer to the clustering algorithm in (Shi & Malik, 2000a) as exact algorithm . We refer to the modified version that uses the power method as approximate algorithm . To measure clustering quality, we used normalized mutual information (Manning et al., 2008): |
| Researcher Affiliation | Collaboration | Christos Boutsidis EMAIL Yahoo, 229 West 43rd Street, New York, NY, USA. Alex Gittens EMAIL International Computer Science Institute, Berkeley, CA, USA. Prabhanjan Kambadur EMAIL Bloomberg L.P., 731 Lexington Avenue, New York, 10022, USA. |
| Pseudocode | No | The paper describes the spectral clustering algorithm and the power method using numbered steps within the text, but these are not presented as formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'To conduct our experiments, we developed high-quality MATLAB versions of the spectral clustering algorithms.' but does not provide concrete access to this code or state that it is open-source. |
| Open Datasets | Yes | We ran our experiments on four multi-class datasets from the lib SVMTools webpage (Table 1). ... The lib SVM multi-class classification datasets (Chang & Lin, 2011) used for our spectral clustering experiments. Software available at http://www.csie. ntu.edu.tw/~cjlin/libsvm. |
| Dataset Splits | No | The paper discusses the use of multi-class datasets for spectral clustering experiments but does not provide specific details on training, validation, or test dataset splits needed for reproducibility. |
| Hardware Specification | Yes | All our experiments were run using MATLAB 8.1.0.604 (R2013a) on a 1.4 GHz Intel Core i5 dual-core processor running OS X 10.9.5 with 8GB 1600 MHz DDR3 RAM. |
| Software Dependencies | Yes | All our experiments were run using MATLAB 8.1.0.604 (R2013a)... The exact algorithm uses MATLAB s svds function... The approximate algorithm exploits the tallthin structure of B and computes Y using MATLAB s svd function. The approximate algorithm uses MATLAB s normrnd function to generate the random Gaussian matrix S. We used MATLAB s kmeans function with the options Empty Action , singleton , Max Iter , 100, Replicates , 10. |
| Experiment Setup | Yes | To compute W, we use the heat kernel:Wij = e ( xi xj 2)/σij, where xi Rd and xj Rd are the data points and σij is a tuning parameter; σij is determined using the self-tuning method described in (Zelnik-Manor & Perona, 2004). That is, for each data point i, xi is computed to be the Euclidean distance of the ℓth furthest neighbor from i; then σij is set to be xixj for every (i, j); in our experiments, we report the results for ℓ= 7. ... We used MATLAB s kmeans function with the options Empty Action , singleton , Max Iter , 100, Replicates , 10. ... we varied p from 0 to 10. |