q-means: A quantum algorithm for unsupervised machine learning
Authors: Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, Anupam Prakash
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present extensive simulations for different datasets and found that the number of iterations is practically the same as in the k-means, and the δ-k-means algorithm achieves an accuracy similar to the k-means algorithm, see Section 4. From Section 4 (Simulations on real data): We would like to demonstrate that the quantum algorithm provides accurate classification results. However, since neither quantum simulators nor quantum computers large enough to test q-means are available currently, we tested the equivalent classical implementation of δ-k-means, knowing that our quantum algorithm provides results consistent with the δ-k-means algorithm. For implementing the δ-k-means, we changed the assignment step of the k-means algorithm to select a random centroid among those that are δ-close to the closest centroid and added δ/2 error to the updated clusters. We benchmarked our q-means algorithm on two datasets: the well-known MNIST dataset of handwritten digits and a synthetic dataset of Gaussian clusters. To measure and compare the accuracy of our clustering algorithm, we ran the k-means and the δ-k-means algorithms for different values of δ on a training dataset and then we compared the accuracy of the classification on a test set, containing data points on which the algorithms have not been trained, using a number of widely-used performance measures. (A classical sketch of this δ-k-means modification appears after the table.) |
| Researcher Affiliation | Collaboration | Iordanis Kerenidis, CNRS, IRIF, Université Paris Diderot, Paris, France, jkeren@irif.fr; Jonas Landman, CNRS, IRIF, Université Paris Diderot, Paris, France, and École Polytechnique, Palaiseau, France, landman@irif.fr; Alessandro Luongo, CNRS, IRIF, Université Paris Diderot, Paris, France, and Atos Quantum Lab, Les Clayes-sous-Bois, France, aluongo@irif.fr; Anupam Prakash, CNRS, IRIF, Université Paris Diderot, Paris, France, anupam.prakash@irif.fr |
| Pseudocode | Yes | Algorithm 1: q-means. Require: Data matrix V ∈ ℝ^(N×d) stored in QRAM data structure. Precision parameter δ for k-means; error parameter ϵ1 for distance estimation, ϵ2 and ϵ3 for matrix multiplication, and ϵ4 for tomography. Ensure: Outputs vectors c_1, c_2, …, c_k ∈ ℝ^d that correspond to the centroids at the final step of the δ-k-means algorithm. Select k initial centroids c_1^0, …, c_k^0 and store them in QRAM data structure. t = 0; repeat … |
| Open Source Code | No | The paper does not provide any statement regarding the open-sourcing of the code or a link to a code repository. |
| Open Datasets | Yes | We benchmarked our q-means algorithm on two datasets: the well-known MNIST dataset of handwritten digits and a synthetic dataset of Gaussian clusters. And reference [26]: 'Robert Layton. A demo of k-means clustering on the handwritten digits data, 1999. https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html.' |
| Dataset Splits | No | To measure and compare the accuracy of our clustering algorithm, we ran the k-means and the δ-k-means algorithms for different values of δ on a training dataset and then we compared the accuracy of the classification on a test set, containing data points on which the algorithms have not been trained, using a number of widely-used performance measures. No explicit mention of a validation set or specific split percentages for train/test/validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | As preprocessing, we first performed a Principal Component Analysis (PCA), retaining data projected in a subspace of dimension 40. After normalization, the value of η was 8.25 (maximum norm of 2.87), and the condition number for the data matrix was 4.53. And: For implementing the δ-k-means, we changed the assignment step of the k-means algorithm to select a random centroid among those that are δ-close to the closest centroid and added δ/2 error to the updated clusters. The paper discusses 'different values of δ' but does not list the specific values used or other run settings, such as the number of clusters per experiment or the initialization scheme. (A sketch reproducing this preprocessing appears after the table.) |
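
The δ-k-means modification quoted in the Research Type, Pseudocode, and Experiment Setup rows is concrete enough to sketch classically. Below is a minimal NumPy emulation of one possible reading: the assignment step picks uniformly at random among all centroids within δ of the closest one, and the update step perturbs each new centroid by noise of norm δ/2. All names (`delta_k_means`, etc.) are ours, and two details are assumptions: whether δ bounds plain or squared distances, and the exact shape of the "δ/2 error" (we use a norm-δ/2 perturbation); the quoted excerpts specify neither.

```python
import numpy as np

def delta_k_means(X, k, delta, n_iters=50, seed=0):
    """Classical emulation of the delta-k-means described in the paper:
    noisy assignment plus a delta/2 perturbation of each updated centroid.
    Function name and noise model are assumptions, not the authors' code."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points, as in standard Lloyd's k-means.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.empty(len(X), dtype=int)
    for _ in range(n_iters):
        # Pairwise point-centroid distances, shape (N, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        for i, d in enumerate(dists):
            # Assignment step: any centroid within delta of the closest one
            # is acceptable; choose among them uniformly at random.
            labels[i] = rng.choice(np.flatnonzero(d <= d.min() + delta))
        # Standard centroid update, keeping the old centroid for empty clusters.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Update step: add noise of norm delta/2 to each centroid, our reading
        # of "added delta/2 error to the updated clusters".
        noise = rng.normal(size=new.shape)
        noise *= (delta / 2) / np.linalg.norm(noise, axis=1, keepdims=True)
        centroids = new + noise
    return centroids, labels
```

With `delta = 0` the random choice collapses to the unique closest centroid and the noise vanishes, so the routine reduces exactly to Lloyd's k-means, matching the paper's framing of δ-k-means as a noisy variant of the classical algorithm.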
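
The Experiment Setup row also quotes concrete preprocessing statistics (PCA to 40 dimensions, η = 8.25, maximum norm 2.87, condition number 4.53). The short sketch below shows how such quantities could be recomputed. It assumes the q-means convention that the data is rescaled so the minimum row norm is 1 and that η is the maximum squared row norm (consistent with 2.87² ≈ 8.25), and it substitutes scikit-learn's small digits dataset for MNIST to stay self-contained, so the printed numbers will not match the paper's.

```python
import numpy as np
from sklearn.datasets import load_digits  # stand-in for MNIST (assumption)
from sklearn.decomposition import PCA

X = load_digits().data.astype(float)

# PCA preprocessing: project onto a 40-dimensional subspace, as in the paper.
X40 = PCA(n_components=40).fit_transform(X)

# Assumed normalization convention: rescale so the minimum row norm is 1,
# putting every norm in [1, sqrt(eta)].
X40 /= np.linalg.norm(X40, axis=1).min()

# eta = maximum squared row norm (paper reports 8.25, i.e. max norm 2.87).
eta = (np.linalg.norm(X40, axis=1) ** 2).max()

# Condition number of the preprocessed data matrix (paper reports 4.53).
kappa = np.linalg.cond(X40)

print(f"eta = {eta:.2f}, max norm = {eta**0.5:.2f}, kappa = {kappa:.2f}")
```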