Efficient Clustering for Stretched Mixtures: Landscape and Optimality

Authors: Kaizheng Wang, Yuling Yan, Mateo Díaz

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments on a real dataset. We randomly select N1 (resp. M1) T-shirts/tops and N2 (resp. M2) pullovers from the Fashion-MNIST (Xiao et al., 2017) training (resp. testing) dataset, each of which is a 28 × 28 grayscale image represented by a vector in [0, 1]^{28×28}. The goal is clustering, i.e., learning from those N = N1 + N2 unlabeled images to predict the class labels of both the N training samples and the M = M1 + M2 testing samples. Figure 2 shows the learning curves of CURE over 50 independent trials. Even when the classes are unbalanced, CURE still reliably achieves low misclassification rates. Figure 3 presents histograms of the testing data under the feature transform learned in the last (50th) trial of CURE, showing two separated clusters around ±1 corresponding to the two classes. To demonstrate the efficacy of CURE, we compare its misclassification rates with those of K-means and spectral methods on the training sets (see the baseline-comparison sketch after the table). We include the standard deviation over 50 independent trials for CURE due to its random initializations; the other methods use the default settings (in Python) and are thus regarded as deterministic algorithms. As shown in Table 1, CURE has the best performance under all settings.
Researcher Affiliation | Academia | Kaizheng Wang (Columbia University, kaizheng.wang@columbia.edu); Yuling Yan (Princeton University, yulingy@princeton.edu); Mateo Díaz (Cornell University, md825@cornell.edu)
Pseudocode | Yes | Algorithm 1: Clustering via Uncoupled REgression (meta-algorithm). Input: data {X_i}_{i=1}^n in a feature space X, an embedding space Y, a target distribution ν over Y, a discrepancy measure D, a function class F, and a classification rule g. Embedding: find an approximate solution f̂ of min_{f ∈ F} D(f♯μ̂_n, ν), where f♯μ̂_n is the push-forward of the empirical distribution μ̂_n under f. Output: Ŷ_i = g[f̂(X_i)] for i ∈ [n]. Algorithm 2: Perturbed gradient descent. Initialize θ_0 = 0. For t = 0, 1, ...: if the perturbation condition holds, perturb θ_t ← θ_t + ξ_t with ξ_t ~ Unif(B(0, r)); if the termination condition holds, return θ_t; otherwise update θ_{t+1} ← θ_t − η∇L_1(θ_t). (A minimal Python sketch of Algorithm 2 appears after the table.)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository or mention code being available in supplementary materials.
Open Datasets | Yes | We randomly select N1 (resp. M1) T-shirts/tops and N2 (resp. M2) pullovers from the Fashion-MNIST (Xiao et al., 2017) training (resp. testing) dataset, each of which is a 28 × 28 grayscale image represented by a vector in [0, 1]^{28×28}. (See the data-selection sketch after the table.)
Dataset Splits | No | The paper mentions using the 'Fashion-MNIST training (resp. testing) dataset' but does not specify any validation splits or methods such as cross-validation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions 'default settings (in Python)' for the comparison methods but does not provide version numbers for Python or any other software libraries used in its own implementation.
Experiment Setup | Yes | We use gradient descent with random initialization from the unit sphere and learning rate 10^{-3} (instead of perturbed gradient descent) to solve (7) as that requires less tuning. (See the gradient-descent sketch after the table.)
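
The following Python sketches are illustrative reconstructions based only on the material quoted above; they are not the authors' code. First, a minimal sketch of perturbed gradient descent (Algorithm 2). The function name perturbed_gd is hypothetical, grad_L stands for the gradient of the loss L_1, and the gradient-norm test plus the fixed iteration cap are stand-ins for the paper's unstated perturbation and termination conditions.

import numpy as np

def perturbed_gd(grad_L, dim, step=1e-3, radius=1e-2, gtol=1e-6, max_iters=10_000, seed=0):
    """Perturbed gradient descent in the spirit of Algorithm 2 (illustrative)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)                       # theta_0 = 0
    for _ in range(max_iters):                  # iteration cap replaces the termination condition
        g = grad_L(theta)
        if np.linalg.norm(g) < gtol:            # stand-in for the perturbation condition
            xi = rng.standard_normal(dim)       # draw xi_t uniformly from the ball B(0, radius):
            xi *= radius * rng.uniform() ** (1.0 / dim) / np.linalg.norm(xi)
            theta = theta + xi                  # theta_t <- theta_t + xi_t
            g = grad_L(theta)
        theta = theta - step * g                # theta_{t+1} <- theta_t - eta * grad L_1(theta_t)
    return theta

For example, perturbed_gd(lambda b: 4 * b * (b @ b - 1), dim=10) descends the quartic (||b||^2 - 1)^2, where the perturbation lets the iterate escape the strict saddle at the origin.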
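
Next, a sketch of the quoted data selection. Loading Fashion-MNIST through tensorflow.keras is an assumption of convenience (the paper does not say how the data were obtained); class indices 0 (T-shirt/top) and 2 (pullover) follow the standard Fashion-MNIST labeling, and sample_two_classes is a hypothetical helper.

import numpy as np
from tensorflow.keras.datasets import fashion_mnist

def sample_two_classes(n1, n2, train=True, seed=0):
    """Draw n1 T-shirts/tops and n2 pullovers, flattened to vectors in [0, 1]^784."""
    (X_tr, y_tr), (X_te, y_te) = fashion_mnist.load_data()
    X, y = (X_tr, y_tr) if train else (X_te, y_te)
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == 0), size=n1, replace=False),  # T-shirts/tops
        rng.choice(np.flatnonzero(y == 2), size=n2, replace=False),  # pullovers
    ])
    X_sub = X[idx].reshape(len(idx), -1).astype(np.float64) / 255.0  # 28 x 28 -> 784, scaled to [0, 1]
    y_sub = (y[idx] == 2).astype(int)                                # binary ground-truth labels
    return X_sub, y_sub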
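
A sketch of the baseline comparison with K-means and spectral clustering. scikit-learn with default settings is a plausible reading of 'default settings (in Python)', though the paper does not name the library; the error metric below is the usual two-cluster misclassification rate minimized over the global label swap.

import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def misclassification_rate(y_true, y_pred):
    """Clustering error up to swapping the two cluster labels (labels in {0, 1})."""
    err = np.mean(np.asarray(y_true) != np.asarray(y_pred))
    return min(err, 1.0 - err)

def run_baselines(X, y, seed=0):
    """Fit both baselines on the training features X and score against labels y."""
    km = KMeans(n_clusters=2, random_state=seed).fit_predict(X)
    sc = SpectralClustering(n_clusters=2, random_state=seed).fit_predict(X)
    return {"k-means": misclassification_rate(y, km),
            "spectral": misclassification_rate(y, sc)}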
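
Finally, a sketch of the stated experiment setup: plain gradient descent (instead of perturbed gradient descent) with a random initialization on the unit sphere and learning rate 10^{-3}. The objective (7) is not reproduced in the quoted text, so the quartic penalty below, which pushes the one-dimensional embeddings toward ±1 and classifies by sign, is an illustrative stand-in rather than the paper's exact loss.

import numpy as np

def cure_gd(X, step=1e-3, iters=5000, seed=0):
    """Gradient descent from a random unit vector; the objective is illustrative."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)                # random initialization on the unit sphere
    for _ in range(iters):
        z = X @ beta                            # embeddings <beta, X_i>
        # gradient of the stand-in objective (1/(4n)) * sum_i (z_i^2 - 1)^2
        beta -= step * (X.T @ (z * (z ** 2 - 1)) / n)
    return np.sign(X @ beta)                    # predicted cluster labels in {-1, +1}

Putting the pieces together, X, y = sample_two_classes(500, 500) followed by run_baselines(X, y) and cure_gd(X) reproduces the shape of the comparison in Table 1, though not the paper's reported numbers.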