Efficient Clustering for Stretched Mixtures: Landscape and Optimality
Authors: Kaizheng Wang, Yuling Yan, Mateo Díaz
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments on a real dataset. We randomly select N1 (resp. M1) T-shirts/tops and N2 (resp. M2) pullovers from the Fashion-MNIST (Xiao et al., 2017) training (resp. testing) dataset, each of which is a 28 × 28 grayscale image represented by a vector in $[0, 1]^{28 \times 28}$. The goal is clustering, i.e. learning from those N = N1 + N2 unlabeled images to predict the class labels of both the N training samples and the M = M1 + M2 testing samples. Figure 2 shows the learning curves of CURE over 50 independent trials. Even when the classes are unbalanced, CURE still reliably achieves low misclassification rates. Figure 3 presents histograms of testing data under the feature transform learned by the last (50th) trial of CURE, showing two separated clusters around ±1 corresponding to the two classes. To demonstrate the efficacy of CURE, we compare its misclassification rates with those of k-means and spectral methods on the training sets (a hypothetical baseline sketch appears after the table). We include the standard deviation over 50 independent trials for CURE due to its random initializations; the other methods use the default settings (in Python) and are thus regarded as deterministic algorithms. As shown in Table 1, CURE has the best performance under all settings. |
| Researcher Affiliation | Academia | Kaizheng Wang Columbia University kaizheng.wang@columbia.edu Yuling Yan Princeton University yulingy@princeton.edu Mateo Díaz Cornell University md825@cornell.edu |
| Pseudocode | Yes | Algorithm 1 Clustering via Uncoupled REgression (meta-algorithm). Input: data $\{X_i\}_{i=1}^n$ in a feature space $\mathcal{X}$, embedding space $\mathcal{Y}$, target distribution $\nu$ over $\mathcal{Y}$, discrepancy measure $D$, function class $\mathcal{F}$, classification rule $g$. Embedding: find an approximate solution $\hat{f}$ to $\min_{f \in \mathcal{F}} D(f_{\#}\hat{\rho}_n, \nu)$. Output: $\hat{Y}_i = g[\hat{f}(X_i)]$ for $i \in [n]$. Algorithm 2 Perturbed gradient descent. Initialize $\theta_0 = 0$. For $t = 0, 1, \ldots$: if the perturbation condition holds, perturb $\theta_t \leftarrow \theta_t + \xi_t$ with $\xi_t \sim U(B(0, r))$; if the termination condition holds, return $\theta_t$; otherwise update $\theta_{t+1} \leftarrow \theta_t - \eta \nabla L(\theta_t)$ (a generic sketch appears after the table). |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository or mention code being available in supplementary materials. |
| Open Datasets | Yes | We randomly select N1 (resp. M1) T-shirts/tops and N2 (resp. M2) pullovers from the Fashion-MNIST (Xiao et al., 2017) training (resp. testing) dataset, each of which is a 28 × 28 grayscale image represented by a vector in $[0, 1]^{28 \times 28}$. |
| Dataset Splits | No | The paper mentions using 'Fashion-MNIST training (resp. testing) dataset' but does not specify any validation dataset splits or methods like cross-validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'default settings (in Python)' for comparison methods but does not provide specific version numbers for Python or any other software libraries used in their own implementation. |
| Experiment Setup | Yes | We use gradient descent with random initialization from the unit sphere and learning rate $10^{-3}$ (instead of perturbed gradient descent) to solve (7), as that requires less tuning (a minimal sketch in this style follows below). |
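
The comparison with k-means and spectral methods is straightforward to reproduce in outline. The sketch below is a hypothetical reconstruction rather than the authors' script: it assumes scikit-learn's `KMeans` and `SpectralClustering` (the paper only says the baselines use "default settings (in Python)"), pulls Fashion-MNIST from OpenML, and scores a clustering by the misclassification rate up to label permutation. The class labels 0 (T-shirt/top) and 2 (pullover), the subsample size, and the seed are illustrative choices, since N1 and N2 are not given in the quoted text.

```python
# Hypothetical baseline sketch: k-means and spectral clustering on the two
# Fashion-MNIST classes from the paper's experiment. Not the authors' code.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans, SpectralClustering

def misclassification_rate(y_true, y_pred):
    """Binary clustering error, minimized over the two label permutations."""
    err = np.mean(y_true != y_pred)
    return min(err, 1.0 - err)

# Fashion-MNIST from OpenML; keep T-shirts/tops (label 0) and pullovers
# (label 2) as vectors in [0, 1]^784.
X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)
mask = (y == 0) | (y == 2)
X, y = X[mask] / 255.0, (y[mask] == 2).astype(int)

rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=1000, replace=False)  # illustrative subsample size
X, y = X[idx], y[idx]

km = KMeans(n_clusters=2, random_state=0).fit(X)
print("k-means error:", misclassification_rate(y, km.labels_))

sc = SpectralClustering(n_clusters=2, random_state=0).fit(X)
print("spectral error:", misclassification_rate(y, sc.labels_))
```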
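
Algorithm 2 leaves the perturbation and termination conditions abstract. The sketch below is one generic instantiation in the spirit of perturbed gradient descent: perturb when the gradient is small and no perturbation happened recently, and stop when the iterate stays near-stationary inside the escape window. The trigger itself and the parameters `eta`, `r`, `g_thresh`, and `escape_window` are assumptions for illustration, not the paper's conditions.

```python
# Generic perturbed-gradient-descent sketch following the reconstructed
# pseudocode of Algorithm 2; the conditions and constants are illustrative.
import numpy as np

def perturbed_gd(grad, dim, eta=1e-3, r=1e-2, g_thresh=1e-4,
                 escape_window=100, max_iters=10_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)                  # initialize theta_0 = 0
    t_last = -escape_window                # time of the most recent perturbation
    for t in range(max_iters):
        near_stationary = np.linalg.norm(grad(theta)) < g_thresh
        if near_stationary and t - t_last >= escape_window:
            # Perturbation condition: small gradient, no recent perturbation.
            xi = rng.standard_normal(dim)
            xi *= r * rng.uniform() ** (1.0 / dim) / np.linalg.norm(xi)  # xi ~ U(B(0, r))
            theta, t_last = theta + xi, t
        elif near_stationary:
            # Termination condition: still near-stationary within the escape
            # window, so the iterate is unlikely to sit at a strict saddle.
            return theta
        theta = theta - eta * grad(theta)  # gradient update
    return theta
```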
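
The quoted setup (gradient descent, random initialization on the unit sphere, learning rate $10^{-3}$) can be made concrete with a short sketch. The quartic penalty $((\theta^\top x)^2 - 1)^2$, which pushes the one-dimensional projections toward $\pm 1$, is an illustrative stand-in for the paper's objective (7), which is not reproduced in the quoted text; `n_iters` is likewise an assumption.

```python
# Minimal CURE-style embedding sketch under the assumptions stated above.
import numpy as np

def cure_like_embedding(X, lr=1e-3, n_iters=2000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)  # random initialization on the unit sphere
    for _ in range(n_iters):
        z = X @ theta               # one-dimensional projections theta^T X_i
        # Surrogate loss (1/4n) * sum_i (z_i^2 - 1)^2; its gradient in theta:
        theta -= lr * X.T @ ((z ** 2 - 1.0) * z) / n
    return theta

def predict(X, theta):
    # Classification rule g from Algorithm 1: threshold the learned feature at 0.
    return (X @ theta > 0).astype(int)
```

Thresholding the learned one-dimensional feature at 0 plays the role of the classification rule $g$ in Algorithm 1; under a well-separated fit the two classes correspond to projections clustered around $-1$ and $+1$, matching the histograms described for Figure 3.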