Exact Recovery of Mangled Clusters with Same-Cluster Queries

Authors: Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on large synthetic datasets confirm that we can reconstruct clusterings exactly and efficiently.
Researcher Affiliation Collaboration Marco Bressan Dept. of CS, Univ. of Milan, Italy marco.bressan@unimi.it Nicolò Cesa-Bianchi DSRC & Dept. of CS, Univ. of Milan, Italy nicolo.cesa-bianchi@unimi.it Silvio Lattanzi Google silviol@google.com Andrea Paudice Dept. of CS, Univ. of Milan, Italy & Istituto Italiano di Tecnologia, Italy andrea.paudice@unimi.it
Pseudocode Yes Algorithm 1 Tessellation Learn(X, SC, γ) and Algorithm 2 RECUR(X, k, γ, ε)
Open Source Code No The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets No The paper states, "We generated four synthetic instances on n = 105 points with increasing dimension d = 2, 4, 6, 8." This describes the generation of synthetic data but does not provide access to a public dataset with a link, citation, or repository information.
Dataset Splits No The paper does not explicitly describe training, validation, or test dataset splits. It mentions using synthetic datasets for experiments but no partitioning details.
Hardware Specification No The paper does not specify any hardware used for running the experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies No The paper does not provide any specific software names with version numbers or list any software dependencies.
Experiment Setup Yes The latent clusterings consist of k = 5 ellipsoidal clusters of equal size, each one with margin γ = 1 w.r.t. a random center and a random PSD matrix with condition number κ = 100, making each cluster stretched by 10 in a random direction. To account for an imperfect knowledge of the data, we fed RECUR with a value of γ = 10 (...) We also adopted for RECUR the batch sampling of SCQ-k-means, i.e., we draw k 10 samples in each round;