Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Spectral Clustering via the Power Method - Provably

Authors: Christos Boutsidis, Prabhanjan Kambadur, Alex Gittens

ICML 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. To conduct our experiments, we developed high-quality MATLAB versions of the spectral clustering algorithms. In the remainder of this section, we refer to the clustering algorithm in (Shi & Malik, 2000a) as exact algorithm. We refer to the modified version that uses the power method as approximate algorithm. To measure clustering quality, we used normalized mutual information (Manning et al., 2008):
Researcher Affiliation | Collaboration | Christos Boutsidis EMAIL Yahoo, 229 West 43rd Street, New York, NY, USA. Alex Gittens EMAIL International Computer Science Institute, Berkeley, CA, USA. Prabhanjan Kambadur EMAIL Bloomberg L.P., 731 Lexington Avenue, New York, 10022, USA.
Pseudocode | No | The paper describes the spectral clustering algorithm and the power method using numbered steps within the text, but these are not presented as formally structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'To conduct our experiments, we developed high-quality MATLAB versions of the spectral clustering algorithms.' but does not provide concrete access to this code or state that it is open-source.
Open Datasets | Yes | We ran our experiments on four multi-class datasets from the libSVMTools webpage (Table 1). ... The libSVM multi-class classification datasets (Chang & Lin, 2011) used for our spectral clustering experiments. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Dataset Splits | No | The paper discusses the use of multi-class datasets for spectral clustering experiments but does not provide specific details on training, validation, or test dataset splits needed for reproducibility.
Hardware Specification | Yes | All our experiments were run using MATLAB 8.1.0.604 (R2013a) on a 1.4 GHz Intel Core i5 dual-core processor running OS X 10.9.5 with 8GB 1600 MHz DDR3 RAM.
Software Dependencies | Yes | All our experiments were run using MATLAB 8.1.0.604 (R2013a)... The exact algorithm uses MATLAB's svds function... The approximate algorithm exploits the tall-thin structure of B and computes Y using MATLAB's svd function. The approximate algorithm uses MATLAB's normrnd function to generate the random Gaussian matrix S. We used MATLAB's kmeans function with the options 'EmptyAction', 'singleton', 'MaxIter', 100, 'Replicates', 10.
Experiment Setup | Yes | To compute W, we use the heat kernel: $W_{ij} = e^{-\lVert x_i - x_j \rVert^2 / \sigma_{ij}}$, where $x_i \in \mathbb{R}^d$ and $x_j \in \mathbb{R}^d$ are the data points and $\sigma_{ij}$ is a tuning parameter; $\sigma_{ij}$ is determined using the self-tuning method described in (Zelnik-Manor & Perona, 2004). That is, for each data point i, $\sigma_i$ is computed to be the Euclidean distance of the ℓth furthest neighbor from i; then $\sigma_{ij}$ is set to be $\sigma_i \sigma_j$ for every (i, j); in our experiments, we report the results for ℓ = 7. ... We used MATLAB's kmeans function with the options 'EmptyAction', 'singleton', 'MaxIter', 100, 'Replicates', 10. ... we varied p from 0 to 10.
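
The heat-kernel construction quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB code. The function name `self_tuning_affinity` is hypothetical, and two details are assumptions: the per-point scale is taken as the distance to the ℓ-th nearest other point (the usual reading of the self-tuning method), and the diagonal of W is set to zero.

```python
import math


def self_tuning_affinity(points, ell=7):
    """Return W with W[i][j] = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # ASSUMPTION: sigma_i is the distance from point i to its ell-th nearest
    # other point (clamped when fewer than ell other points exist).
    sigma = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        sigma.append(others[min(ell, len(others)) - 1])

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ASSUMPTION: no self-similarity, W[i][i] = 0
            scale = sigma[i] * sigma[j]
            if scale == 0.0:
                # Degenerate case (duplicate points): treat coincident
                # points as fully similar and all others as dissimilar.
                W[i][j] = 1.0 if dist[i][j] == 0.0 else 0.0
            else:
                W[i][j] = math.exp(-dist[i][j] ** 2 / scale)
    return W


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for row in self_tuning_affinity(pts, ell=2):
        print([round(w, 4) for w in row])
```

Because the scale is the product of two per-point distances, the resulting W is symmetric, and points in sparse regions get a wider kernel than points in dense regions.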
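
The Research Type row names normalized mutual information as the quality measure, but the quoted passage stops before the formula. The sketch below assumes the definition in Manning et al. (2008), where the mutual information between the two labelings is divided by the arithmetic mean of their entropies. The function name is hypothetical, and returning 1.0 when both labelings have zero entropy is a convention chosen here, not something stated in the source.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI = I(true; pred) / ((H(true) + H(pred)) / 2)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum(
        (c / n) * math.log(n * c / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    denom = (entropy(count_true) + entropy(count_pred)) / 2.0
    # ASSUMPTION: both labelings constant (zero entropy) counts as agreement.
    return mutual_info / denom if denom > 0.0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    found = [1, 1, 0, 0, 2, 2]  # same partition, different label names
    print(normalized_mutual_information(truth, found))
```

The score depends only on the partition, not on the label names, which is why it suits clustering output where cluster indices are arbitrary.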
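
The Software Dependencies and Experiment Setup rows mention a random Gaussian matrix S, a tall-thin matrix B, an orthonormal Y, and a power parameter p, without quoting how they fit together. The sketch below assumes the common construction B = (D^{-1/2} W D^{-1/2})^p S, where D is the diagonal degree matrix of W; that formula is an assumption here, not taken from the text above. It also swaps the SVD for modified Gram-Schmidt to stay within the Python standard library. Both give an orthonormal basis of the same column space when B has full column rank, so the rows of Y differ only by a k-by-k rotation and their pairwise distances are unchanged. All function names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def orthonormalize(B):
    """Orthonormalize the columns of B with modified Gram-Schmidt."""
    basis = []
    for col in zip(*B):
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            raise ValueError("columns of B are numerically dependent")
        basis.append([a / norm for a in v])
    return [list(row) for row in zip(*basis)]


def power_method_embedding(W, k, p, seed=0):
    """Return an n-by-k orthonormal Y spanning the columns of W_norm^p S."""
    n = len(W)
    rng = random.Random(seed)
    # ASSUMPTION: every node has positive degree.
    deg = [sum(row) for row in W]
    W_norm = [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    # Random Gaussian start matrix S (n-by-k), then p multiplications.
    B = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(p):
        B = matmul(W_norm, B)
    return orthonormalize(B)


if __name__ == "__main__":
    # Two tightly connected triples joined by weak edges.
    W = [
        [0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01)
         for j in range(6)]
        for i in range(6)
    ]
    for row in power_method_embedding(W, k=2, p=3):
        print([round(y, 3) for y in row])
```

The rows of Y would then be passed to k-means. With p = 0 the embedding is just an orthonormalized random matrix, which is consistent with p = 0 being the lower end of the range the authors report.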