q-means: A quantum algorithm for unsupervised machine learning

Authors: Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, Anupam Prakash

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

For each reproducibility variable below, the assessed result is listed together with the LLM response that supports it.

Research Type: Experimental
LLM Response: "We present extensive simulations for different datasets and found that the number of iterations is practically the same as in k-means, and the δ-k-means algorithm achieves an accuracy similar to the k-means algorithm; see Section 4." From Section 4, "Simulations on real data": "We would like to demonstrate that the quantum algorithm provides accurate classification results. However, since neither quantum simulators nor quantum computers large enough to test q-means are currently available, we tested the equivalent classical implementation of δ-k-means, knowing that our quantum algorithm provides results consistent with the δ-k-means algorithm. For implementing δ-k-means, we changed the assignment step of the k-means algorithm to select a random centroid among those that are δ-close to the closest centroid and added δ/2 error to the updated clusters. We benchmarked our q-means algorithm on two datasets: the well-known MNIST dataset of handwritten digits and a synthetic dataset of Gaussian clusters. To measure and compare the accuracy of our clustering algorithm, we ran the k-means and δ-k-means algorithms for different values of δ on a training dataset and then compared the accuracy of the classification on a test set, containing data points on which the algorithms had not been trained, using a number of widely used performance measures."

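The quoted modification fully specifies δ-k-means up to two conventions it leaves implicit. Below is a minimal sketch, assuming δ-closeness is measured on squared Euclidean distances and reading "δ/2 error" as a random perturbation of norm δ/2 added to each updated centroid; both are plausible interpretations, not the paper's verbatim definitions.

    import numpy as np

    def delta_kmeans_assign(X, centroids, delta, rng):
        # delta-k-means assignment: choose uniformly among all centroids whose
        # squared distance to the point is within delta of the closest one
        # (assumption: delta-closeness is taken on squared distances).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = np.empty(len(X), dtype=int)
        for i, row in enumerate(d2):
            candidates = np.flatnonzero(row <= row.min() + delta)
            labels[i] = rng.choice(candidates)
        return labels

    def delta_kmeans_update(X, labels, k, delta, rng):
        # Recompute centroids, then perturb each by a random vector of norm
        # delta/2, emulating the quantum estimation error (assumes no cluster
        # ends up empty; robust code should reseed empty clusters).
        centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
        noise = rng.standard_normal(centroids.shape)
        noise /= np.linalg.norm(noise, axis=1, keepdims=True)
        return centroids + (delta / 2) * noise
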
Researcher Affiliation: Collaboration
LLM Response: Iordanis Kerenidis, CNRS, IRIF, Université Paris Diderot, Paris, France (jkeren@irif.fr); Jonas Landman, CNRS, IRIF, Université Paris Diderot, Paris, France, and École Polytechnique, Palaiseau, France (landman@irif.fr); Alessandro Luongo, CNRS, IRIF, Université Paris Diderot, Paris, France, and Atos Quantum Lab, Les Clayes-sous-Bois, France (aluongo@irif.fr); Anupam Prakash, CNRS, IRIF, Université Paris Diderot, Paris, France (anupam.prakash@irif.fr)

Pseudocode: Yes
LLM Response: "Algorithm 1 q-means. Require: Data matrix V ∈ ℝ^(N×d) stored in QRAM data structure. Precision parameter δ for k-means; error parameters ε_1 for distance estimation, ε_2 and ε_3 for matrix multiplication, and ε_4 for tomography. Ensure: Outputs vectors c_1, c_2, …, c_k ∈ ℝ^d that correspond to the centroids at the final step of the δ-k-means algorithm. Select k initial centroids c_1^0, …, c_k^0 and store them in the QRAM data structure. t = 0. repeat …"

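Reusing the two helpers sketched above, the loop of the quoted pseudocode can be emulated classically, with QRAM storage replaced by ordinary arrays. The stopping rule and max_iters below are assumptions, since the excerpt cuts off at "repeat":

    def delta_kmeans(X, k, delta, max_iters=100, tol=1e-4, seed=0):
        # Classical emulation of Algorithm 1: pick k initial centroids, then
        # alternate noisy assignment and update steps until they stabilize.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for t in range(max_iters):
            labels = delta_kmeans_assign(X, centroids, delta, rng)
            new_centroids = delta_kmeans_update(X, labels, k, delta, rng)
            done = np.linalg.norm(new_centroids - centroids) < tol
            centroids = new_centroids
            if done:
                break
        return centroids, labels
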
Open Source Code: No
LLM Response: The paper does not provide any statement regarding the open-sourcing of the code or a link to a code repository.

Open Datasets: Yes
LLM Response: "We benchmarked our q-means algorithm on two datasets: the well-known MNIST dataset of handwritten digits and a synthetic dataset of Gaussian clusters." See also reference [26]: Robert Layton. A demo of k-means clustering on the handwritten digits data, 1999. https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html.

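A hedged sketch of assembling comparable benchmark data: scikit-learn's bundled digits set stands in for MNIST (it is the dataset used in the cited demo [26]), and make_blobs generates Gaussian clusters; the sample counts and dimensions are illustrative assumptions, not the paper's configuration.

    from sklearn.datasets import load_digits, make_blobs

    # Handwritten digits; the paper uses full MNIST, for which
    # fetch_openml("mnist_784") would be the scikit-learn route.
    digits_X, digits_y = load_digits(return_X_y=True)

    # Synthetic Gaussian clusters with assumed size and dimension.
    blobs_X, blobs_y = make_blobs(n_samples=10_000, centers=10,
                                  n_features=40, random_state=0)
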
Dataset Splits: No
LLM Response: "To measure and compare the accuracy of our clustering algorithm, we ran the k-means and δ-k-means algorithms for different values of δ on a training dataset and then compared the accuracy of the classification on a test set, containing data points on which the algorithms had not been trained, using a number of widely used performance measures." There is no explicit mention of a validation set or of specific split percentages for train/test/validation.

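Because neither the split ratio nor the performance measures are named in the excerpt, the sketch below supplies common defaults: an assumed 80/20 train/test split and the label-based scores from the cited demo [26], with plain k-means standing in for δ-k-means. It reuses digits_X and digits_y from the loading sketch above.

    from sklearn import metrics
    from sklearn.cluster import KMeans
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        digits_X, digits_y, test_size=0.2, random_state=0)  # ratio assumed

    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)
    test_labels = km.predict(X_test)

    # Clustering quality on points the algorithm was not trained on.
    print("V-measure:          ", metrics.v_measure_score(y_test, test_labels))
    print("Adjusted Rand index:", metrics.adjusted_rand_score(y_test, test_labels))
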
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.

Software Dependencies: No
LLM Response: The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.

Experiment Setup: No
LLM Response: "As preprocessing, we first performed a Principal Component Analysis (PCA), retaining data projected in a subspace of dimension 40. After normalization, the value of η was 8.25 (maximum norm of 2.87), and the condition number for the data matrix was 4.53." and "For implementing δ-k-means, we changed the assignment step of the k-means algorithm to select a random centroid among those that are δ-close to the closest centroid and added δ/2 error to the updated clusters." The paper discusses "different values of δ" but does not list specific hyperparameter values such as learning rates or batch sizes.

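The quoted numbers are internally consistent if η denotes the maximum squared norm of the data points, since 2.87² ≈ 8.25. A minimal sketch of the reported preprocessing on the digits data from above, assuming a normalization that rescales the smallest point norm to 1 (the excerpt does not state which normalization was used):

    import numpy as np
    from sklearn.decomposition import PCA

    # Project onto the top 40 principal components, as reported.
    X40 = PCA(n_components=40).fit_transform(digits_X)

    # Assumed normalization: smallest point norm becomes 1.
    X40 /= np.linalg.norm(X40, axis=1).min()

    norms = np.linalg.norm(X40, axis=1)
    eta = (norms ** 2).max()      # max squared norm (paper: 8.25 = 2.87^2)
    kappa = np.linalg.cond(X40)   # largest/smallest singular value ratio
    print(f"eta = {eta:.2f}, max norm = {norms.max():.2f}, kappa = {kappa:.2f}")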