Supervising Unsupervised Learning
Authors: Vikas K. Garg, Adam T. Kalai
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted several experiments to substantiate the efficacy of the proposed framework under various unsupervised settings. We downloaded all classification datasets from OpenML (http://www.openml.org) that had at most 10,000 instances, 500 features, 10 classes, and no missing data to obtain a corpus of 339 datasets. We now describe in detail the results of our experiments. |
| Researcher Affiliation | Collaboration | Vikas K. Garg, CSAIL, MIT (vgarg@csail.mit.edu); Adam Kalai, Microsoft Research (noreply@microsoft.com) |
| Pseudocode | No | The paper describes algorithms verbally and provides theoretical results, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We downloaded all classification datasets from OpenML (http://www.openml.org) that had at most 10,000 instances, 500 features, 10 classes, and no missing data to obtain a corpus of 339 datasets. (A hedged code sketch of this corpus filter appears after the table.) |
| Dataset Splits | No | The paper states that "We held out a fraction of the problems for test and used the remaining for training." and that "The 250 problems were divided into train and test sets of varying sizes." but does not provide specific percentages or counts for these splits, nor does it explicitly mention a separate validation split. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., GPU models, CPU types, or cloud infrastructure specifications). |
| Software Dependencies | No | The paper mentions using "scikit-learn" but does not provide specific version numbers for it or any other software dependencies crucial for reproducibility. |
| Experiment Setup | Yes | We ran each of the algorithms on the repository to see which algorithm has the lowest average error. ... The baselines are chosen to be five clustering algorithms from scikit-learn [27]: K-Means, Spectral, Agglomerative Single Linkage, Complete Linkage, and Ward, together with a second version of each in which each attribute is normalized to have zero mean and unit variance. Each algorithm is run with the default scikit-learn parameters. ... for each k ∈ {2, ..., 9}, we fit ARI as a linear function of Silhouette scores ... we removed a p ∈ {0, 0.01, 0.02, ..., 0.05} fraction of examples with the highest Euclidean norm in X as outliers. (Illustrative sketches of these steps follow the table.) |
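
The corpus-construction step quoted in the Research Type and Open Datasets rows can be outlined in code. The following is a minimal sketch assuming the `openml` Python package, which the paper does not name; the column names are OpenML's standard meta-features, the thresholds are the ones quoted above, and the lower bound of 2 classes is an assumption made to exclude regression tasks.

```python
# Hedged sketch: filter OpenML datasets by the paper's quoted criteria.
# Assumes the `openml` Python package; the paper does not specify tooling.
import openml

# List metadata for all OpenML datasets as a pandas DataFrame.
datasets = openml.datasets.list_datasets(output_format="dataframe")

# Keep classification datasets with <= 10,000 instances, <= 500 features,
# <= 10 classes (>= 2 assumed), and no missing values.
corpus = datasets[
    (datasets["NumberOfInstances"] <= 10_000)
    & (datasets["NumberOfFeatures"] <= 500)
    & (datasets["NumberOfClasses"].between(2, 10))
    & (datasets["NumberOfMissingValues"] == 0)
]
print(len(corpus), "candidate datasets")  # the paper reports a corpus of 339
```

The exact count may differ from the paper's 339 because OpenML's catalog has grown since 2018.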
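The Experiment Setup row names ten baselines: five scikit-learn clusterers, each run with default parameters on raw and on standardized features, evaluated by adjusted Rand index (ARI) against the class labels. Below is a hedged sketch of that suite; the function names and the `fit_predict`-then-score plumbing are assumptions the paper leaves unspecified.

```python
# Hedged sketch of the baseline suite from the Experiment Setup row.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

def baseline_clusterers(n_clusters):
    """The five scikit-learn baselines, all with default parameters."""
    return {
        "kmeans": KMeans(n_clusters=n_clusters),
        "spectral": SpectralClustering(n_clusters=n_clusters),
        "single": AgglomerativeClustering(n_clusters=n_clusters, linkage="single"),
        "complete": AgglomerativeClustering(n_clusters=n_clusters, linkage="complete"),
        "ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
    }

def score_baselines(X, y, n_clusters):
    """ARI of each baseline on raw and on z-scored features (10 runs total)."""
    scores = {}
    for tag, feats in [("raw", X), ("scaled", StandardScaler().fit_transform(X))]:
        for name, algo in baseline_clusterers(n_clusters).items():
            labels = algo.fit_predict(feats)
            scores[f"{name}-{tag}"] = adjusted_rand_score(y, labels)
    return scores
```

The "second version of each" baseline is realized here by z-scoring every attribute with `StandardScaler`, matching the quoted "zero mean and unit variance" normalization.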
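The same row quotes two further steps: removing the p fraction of examples with the highest Euclidean norm as outliers, and fitting ARI as a linear function of Silhouette scores for each k ∈ {2, ..., 9}. The sketch below implements both under an assumed data layout (one Silhouette/ARI pair per dataset-and-algorithm run, grouped by k); these containers are illustrative, not the paper's.

```python
# Hedged sketch of outlier removal and the per-k ARI-vs-Silhouette fit.
import numpy as np
from sklearn.linear_model import LinearRegression

def remove_outliers(X, p):
    """Drop the p fraction of rows of X with the largest Euclidean norm."""
    norms = np.linalg.norm(X, axis=1)
    n_keep = int(round((1 - p) * len(X)))
    keep = norms.argsort()[:n_keep]          # indices of the smallest norms
    return X[np.sort(keep)]                  # preserve original row order

def fit_ari_on_silhouette(silhouette_by_k, ari_by_k):
    """One linear model ARI ~ a * silhouette + b for each k in {2, ..., 9}.

    Both arguments are assumed to map k to a sequence of per-run scores.
    """
    models = {}
    for k in range(2, 10):
        s = np.asarray(silhouette_by_k[k]).reshape(-1, 1)
        a = np.asarray(ari_by_k[k])
        models[k] = LinearRegression().fit(s, a)
    return models
```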