reproducibilityindex.ai

The Hidden Convexity of Spectral Clustering

Authors: James Voss, Mikhail Belkin, Luis Rademacher

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper we propose a new class of algorithms for multiway spectral clustering based on optimization of a certain contrast function over the unit sphere. These algorithms, partly inspired by certain Indepenent Component Analysis techniques, are simple, easy to implement and efﬁcient. Geometrically, the proposed algorithms can be interpreted as hidden basis recovery by means of function optimization. We give a complete characterization of the contrast functions admissible for provable basis recovery. We show how these conditions can be interpreted as a hidden convexity of our optimization problem on the sphere; interestingly, we use efﬁcient convex maximization rather than the more common convex minimization. We also show encouraging experimental results on real and simulated data.
Researcher Affiliation	Academia	James Voss Ohio State University vossj@cse.ohio-state.edu Mikhail Belkin Ohio State University mbelkin@cse.ohio-state.edu Luis Rademacher Ohio State University lrademac@cse.ohio-state.edu
Pseudocode	Yes	1: function HBROPT(X, η) 2: C {} 3: for i 1 to m do 4: Draw u from Sm 1 span(C) uniformly at random. 5: repeat 6: u u + η( Fg(u) uu T Fg(u)) (= u + ηPu Fg(u)) 7: u Pspan(C) u 8: u u u 9: until Convergence 10: Let C C {u} 11: return C
Open Source Code	No	The paper mentions a link to the 'long version of this paper' (http://arxiv.org/abs/1403.0667) which contains proofs and stability results, but it does not state that source code for the methodology is available at this link or elsewhere.
Open Datasets	Yes	Performance Evaluation on UCI datasets. In particular, the E. coli, Flags, Glass, Thyroid Disease, and Car Evaluation data sets which are part of the UCI machine learning repository (Bache and Lichman 2013) are used. We also use the standardized gene expression data set (Yeung et al. 2001a; 2001b), which is also referred to as Cell Cycle. For the Flags data set, we used religion as the ground truth labels, and for Thyroid Disease, we used the new-thyroid data. ... BSDS300 test set (Martin et al. 2001).
Dataset Splits	No	The paper mentions using specific datasets and 'ground truth labels' but does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). It only mentions 'The reported results consist of the mean performance over a set of 25 runs for each algorithm'.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies	No	The paper mentions software like 'spherical k-means' but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	The parameter α was chosen separately for each data set in order to create a good embedding. The choices of α were: 0.25 for E. Coli, 32 for Glass, 32 for Thyroid Disease, 128 for Flags, 0.25 for Car Evaluation, and 0.125 for Cell Cycle. The spectral embedding was performed using the symmetric normalized Laplacian Lsym.