Supervising Unsupervised Learning

Authors: Vikas Garg, Adam T. Kalai

Venue: NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted several experiments to substantiate the efficacy of the proposed framework under various unsupervised settings. We downloaded all classification datasets from OpenML (http://www.openml.org) that had at most 10,000 instances, 500 features, 10 classes, and no missing data to obtain a corpus of 339 datasets. We now describe in detail the results of our experiments.
Researcher Affiliation | Collaboration | Vikas K. Garg (CSAIL, MIT; vgarg@csail.mit.edu) and Adam Kalai (Microsoft Research; noreply@microsoft.com)
Pseudocode | No | The paper describes its algorithms verbally and provides theoretical results, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to or explicit statements about the availability of open-source code for the described methodology.
Open Datasets | Yes | We downloaded all classification datasets from OpenML (http://www.openml.org) that had at most 10,000 instances, 500 features, 10 classes, and no missing data to obtain a corpus of 339 datasets. (A retrieval sketch follows the table.)
Dataset Splits | No | The paper states that "We held out a fraction of the problems for test and used the remaining for training." and "The 250 problems were divided into train and test sets of varying sizes." but it does not provide specific percentages or counts for these splits, nor does it explicitly mention a separate validation split.
Hardware Specification | No | The paper does not specify any hardware used to run the experiments (e.g., GPU models, CPU types, or cloud infrastructure specifications).
Software Dependencies | No | The paper mentions using "scikit-learn" but does not provide specific version numbers for it or for any other software dependencies crucial to reproducibility.
Experiment Setup | Yes | We ran each of the algorithms on the repository to see which algorithm has the lowest average error. ... The baselines are chosen to be five clustering algorithms from scikit-learn [27]: K-Means, Spectral, Agglomerative Single Linkage, Complete Linkage, and Ward, together with a second version of each in which each attribute is normalized to have zero mean and unit variance. Each algorithm is run with the default scikit-learn parameters. ... for each k ∈ {2, ..., 9}, we fit ARI as a linear function of Silhouette scores ... we removed a p ∈ {0, 0.01, 0.02, ..., 0.05} fraction of examples with the highest Euclidean norm in X as outliers. (Illustrative sketches of this setup follow the table.)
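
The corpus-construction step quoted under Open Datasets can be illustrated with a short filtering script. The sketch below is a rough, hedged reconstruction: it assumes the `openml` Python package, its `list_datasets` call, and its metadata column names (`NumberOfInstances`, `NumberOfFeatures`, `NumberOfClasses`, `NumberOfMissingValues`), none of which appear in the paper, which does not release retrieval code.

```python
# Sketch: filter OpenML classification datasets by the criteria quoted in the
# paper (at most 10,000 instances, 500 features, 10 classes, no missing data).
# Assumes the `openml` package (pip install openml); NOT the authors' code.
import openml

# List all OpenML datasets together with their metadata ("qualities").
datasets = openml.datasets.list_datasets(output_format="dataframe")

mask = (
    (datasets["NumberOfInstances"] <= 10_000)
    & (datasets["NumberOfFeatures"] <= 500)
    & (datasets["NumberOfClasses"].between(2, 10))  # classification tasks only
    & (datasets["NumberOfMissingValues"] == 0)      # no missing data
)
corpus = datasets[mask]
print(f"{len(corpus)} candidate datasets")  # the paper reports a corpus of 339
```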
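The Experiment Setup row describes ten scikit-learn baselines: five clustering algorithms with default parameters, each run on raw features and on features standardized to zero mean and unit variance. Below is a minimal sketch of that loop, assuming a feature matrix `X` and ground-truth labels `y`; the function names are illustrative, and this is a reading of the quoted setup rather than the authors' released code.

```python
# Sketch of the clustering baselines described under Experiment Setup:
# five scikit-learn algorithms, each run on raw and on standardized features.
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score

def baseline_clusterers(n_clusters):
    # Default scikit-learn parameters, as stated in the paper.
    return {
        "kmeans": KMeans(n_clusters=n_clusters),
        "spectral": SpectralClustering(n_clusters=n_clusters),
        "single": AgglomerativeClustering(n_clusters=n_clusters, linkage="single"),
        "complete": AgglomerativeClustering(n_clusters=n_clusters, linkage="complete"),
        "ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
    }

def evaluate_baselines(X, y, n_clusters):
    """Return the ARI of each algorithm on raw and standardized features."""
    variants = {"raw": X, "scaled": StandardScaler().fit_transform(X)}
    scores = {}
    for variant, X_v in variants.items():
        for name, algo in baseline_clusterers(n_clusters).items():
            labels = algo.fit_predict(X_v)
            scores[f"{name}/{variant}"] = adjusted_rand_score(y, labels)
    return scores
```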
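The same row mentions two further steps: fitting ARI as a linear function of Silhouette scores for each k ∈ {2, ..., 9}, and removing a p fraction of the highest-norm points as outliers. The sketch below shows one plausible reading of each; the quoted text does not say which clusterer produced the Silhouette/ARI pairs, so K-Means appears here purely as a placeholder, and the helper names are assumptions.

```python
# Sketch: (a) remove the p fraction of points with the largest Euclidean norm,
# and (b) fit ARI as a linear function of Silhouette score for a given k.
# Helper names and fitting details are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score, silhouette_score

def remove_outliers(X, y, p):
    """Drop the p fraction of examples with the highest Euclidean norm in X."""
    norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(norms)[: int(round((1 - p) * len(X)))]
    return X[keep], y[keep]

def fit_ari_vs_silhouette(problems, k):
    """Fit ARI as a linear function of Silhouette score across problems."""
    sil, ari = [], []
    for X, y in problems:  # each problem: features X, ground-truth labels y
        labels = KMeans(n_clusters=k).fit_predict(X)  # placeholder clusterer
        sil.append(silhouette_score(X, labels))
        ari.append(adjusted_rand_score(y, labels))
    return LinearRegression().fit(np.array(sil).reshape(-1, 1), np.array(ari))

# One linear fit per k, following "for each k in {2, ..., 9}":
# models = {k: fit_ari_vs_silhouette(problems, k) for k in range(2, 10)}
```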