Scalable Laplacian K-modes
Authors: Imtiaz Ziko, Eric Granger, Ismail Ben Ayed
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report comprehensive experiments over various data sets, which show that our algorithm yields very competitive performances in term of optimization quality (i.e., the value of the discrete-variable objective at convergence) and clustering accuracy. |
| Researcher Affiliation | Academia | Imtiaz Masud Ziko ÉTS Montreal Eric Granger ÉTS Montreal Ismail Ben Ayed ÉTS Montreal |
| Pseudocode | Yes | Algorithm 1: SLK algorithm |
| Open Source Code | Yes | Code is available at: https://github.com/imtiazziko/SLK |
| Open Datasets | Yes | We used image datasets, except Shuttle and Reuters. The overall summary of the datasets is given in Table 1. For each dataset, imbalance is defined as the ratio of the size of the biggest cluster to the size of the smallest one. We use three versions of MNIST [17]. |
| Dataset Splits | Yes | We choose the best initial seed and regularization parameter λ empirically based on the accuracy over a validation set (10% of the total data). |
| Hardware Specification | Yes | All the experiments (our methods and the baselines) were conducted on a machine with Xeon E5-2620 CPU and a Titan X Pascal GPU. |
| Software Dependencies | No | The paper mentions libraries such as Flann but does not provide version numbers for its software dependencies. |
| Experiment Setup | Yes | In all of the datasets, we fixed ρ = 5. For the large datasets such as MNIST, Shuttle and Reuters, we used the Flann library [19] with the KD-tree algorithm, which finds approximate nearest neighbors. Mode estimation is based on the Gaussian kernel k(x, y) = exp(−‖x − y‖²/(2σ²)), with σ² estimated as: σ² = (1/(Nρ)) Σ_{x_q ∈ N_ρ^p} ‖x_p − x_q‖². Initial centers {m_l^0}_{l=1}^L are based on K-means++ seeds [1]. We choose the best initial seed and regularization parameter λ empirically based on the accuracy over a validation set (10% of the total data). The λ is determined from tuning in a small range from 1 to 4. |
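The bandwidth estimate in the Experiment Setup row (mean squared distance to each point's ρ nearest neighbors, with ρ = 5) can be sketched as below. This is a minimal illustration, not the authors' released implementation: it uses SciPy's exact KD-tree in place of Flann's approximate nearest-neighbor search, and the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def estimate_sigma2(X, rho=5):
    """Estimate the Gaussian kernel bandwidth sigma^2 as the mean
    squared distance from each point to its rho nearest neighbors
    (self excluded), as described in the experiment setup."""
    tree = cKDTree(X)  # exact KD-tree; the paper uses Flann's approximate search
    # Query rho + 1 neighbors: the closest one is the point itself (distance 0).
    dists, _ = tree.query(X, k=rho + 1)
    return float(np.mean(dists[:, 1:] ** 2))


def gaussian_kernel(x, y, sigma2):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                        / (2.0 * sigma2)))
```

With this bandwidth fixed once per dataset, all pairwise kernel evaluations during mode estimation reuse the same σ², so the estimate only needs one nearest-neighbor pass over the data.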