The Hidden Convexity of Spectral Clustering
Authors: James Voss, Mikhail Belkin, Luis Rademacher
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we propose a new class of algorithms for multiway spectral clustering based on optimization of a certain contrast function over the unit sphere. These algorithms, partly inspired by certain Indepenent Component Analysis techniques, are simple, easy to implement and efficient. Geometrically, the proposed algorithms can be interpreted as hidden basis recovery by means of function optimization. We give a complete characterization of the contrast functions admissible for provable basis recovery. We show how these conditions can be interpreted as a hidden convexity of our optimization problem on the sphere; interestingly, we use efficient convex maximization rather than the more common convex minimization. We also show encouraging experimental results on real and simulated data. |
| Researcher Affiliation | Academia | James Voss Ohio State University vossj@cse.ohio-state.edu Mikhail Belkin Ohio State University mbelkin@cse.ohio-state.edu Luis Rademacher Ohio State University lrademac@cse.ohio-state.edu |
| Pseudocode | Yes | 1: function HBROPT(X, η) 2: C {} 3: for i 1 to m do 4: Draw u from Sm 1 span(C) uniformly at random. 5: repeat 6: u u + η( Fg(u) uu T Fg(u)) (= u + ηPu Fg(u)) 7: u Pspan(C) u 8: u u u 9: until Convergence 10: Let C C {u} 11: return C |
| Open Source Code | No | The paper mentions a link to the 'long version of this paper' (http://arxiv.org/abs/1403.0667) which contains proofs and stability results, but it does not state that source code for the methodology is available at this link or elsewhere. |
| Open Datasets | Yes | Performance Evaluation on UCI datasets. In particular, the E. coli, Flags, Glass, Thyroid Disease, and Car Evaluation data sets which are part of the UCI machine learning repository (Bache and Lichman 2013) are used. We also use the standardized gene expression data set (Yeung et al. 2001a; 2001b), which is also referred to as Cell Cycle. For the Flags data set, we used religion as the ground truth labels, and for Thyroid Disease, we used the new-thyroid data. ... BSDS300 test set (Martin et al. 2001). |
| Dataset Splits | No | The paper mentions using specific datasets and 'ground truth labels' but does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). It only mentions 'The reported results consist of the mean performance over a set of 25 runs for each algorithm'. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'spherical k-means' but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The parameter α was chosen separately for each data set in order to create a good embedding. The choices of α were: 0.25 for E. Coli, 32 for Glass, 32 for Thyroid Disease, 128 for Flags, 0.25 for Car Evaluation, and 0.125 for Cell Cycle. The spectral embedding was performed using the symmetric normalized Laplacian Lsym. |