The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning

Authors: Artyom Gadetsky, Maria Brbić

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of HUME on three commonly used clustering benchmarks, including the STL-10 [32], CIFAR-10 and CIFAR-100-20 [33] datasets. In addition, we also compare HUME to large-scale unsupervised baselines on the fine-grained ImageNet-1000 dataset [34].
Researcher Affiliation | Academia | Artyom Gadetsky (EPFL, artem.gadetskii@epfl.ch); Maria Brbić (EPFL, mbrbic@epfl.ch)
Pseudocode | Yes | The pseudocode of the algorithm is shown in Algorithm 1.
Open Source Code | Yes | Code is publicly available at https://github.com/mlbio-epfl/hume.
Open Datasets | Yes | We evaluate the performance of HUME on three commonly used clustering benchmarks, including the STL-10 [32], CIFAR-10 and CIFAR-100-20 [33] datasets. In addition, we also compare HUME to large-scale unsupervised baselines on the fine-grained ImageNet-1000 dataset [34]. (A hedged dataset-loading sketch follows the table.)
Dataset Splits | No | The paper describes random sampling of disjoint train (Xtr) and test (Xte) subsets at each iteration, with performance evaluated on Xte. Although 'cross-validation accuracy' is mentioned, it refers to evaluation on the Xte split within this procedure, not on a distinct validation set. (See the split-sampling sketch after this table.)
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or cluster specifications) are mentioned for running the experiments.
Software Dependencies | No | The paper mentions software such as PyTorch [3] and the Adam optimizer [4], but does not provide version numbers for these or other key software dependencies.
Experiment Setup | Yes | In all experiments, we use the following hyperparameters: number of iterations T = 1000, the Adam optimizer [4] with step size α = 0.001, and sparsemax temperature γ = 0.1. We anneal the temperature and step size by a factor of 10 after 100 and 200 iterations. We set the regularization parameter η to 10 in all experiments and show an ablation for this hyperparameter in Appendix B. To solve the inner optimization problem, we run gradient descent for 300 iterations with step size 0.001. At each iteration we sample 10000 examples without replacement from the dataset to construct the subsets (Xtr, Xte), |Xtr| = 9000, |Xte| = 1000. (A hedged sketch of this optimization schedule follows the table.)
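
For the benchmarks listed under Open Datasets, a minimal loading sketch is given below, assuming torchvision is used for data access (the paper does not state its data-loading code). The root path and transform are placeholders, and the 20-superclass CIFAR-100-20 variant requires a fine-to-coarse label mapping that is not shown here.

```python
import torchvision
import torchvision.transforms as T

# Hypothetical data root and transform; the paper does not specify these.
root, transform = "./data", T.ToTensor()

stl10 = torchvision.datasets.STL10(root, split="train", download=True, transform=transform)
cifar10 = torchvision.datasets.CIFAR10(root, train=True, download=True, transform=transform)
# CIFAR-100-20 groups the 100 fine classes into 20 superclasses; torchvision
# exposes only the fine labels, so the fine-to-coarse mapping must be applied separately.
cifar100 = torchvision.datasets.CIFAR100(root, train=True, download=True, transform=transform)
```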
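
The Dataset Splits row describes per-iteration sampling of disjoint (Xtr, Xte) subsets rather than a fixed validation split. A small helper consistent with that description might look as follows; the function name and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def sample_disjoint_split(n_examples, subset_size=10_000, n_train=9_000, seed=None):
    """Sample `subset_size` indices without replacement, then split them into
    disjoint train/test index sets, mirroring the (Xtr, Xte) construction
    described in the Experiment Setup row."""
    rng = np.random.default_rng(seed)
    subset = rng.choice(n_examples, size=subset_size, replace=False)
    return subset[:n_train], subset[n_train:]   # |Xtr| = 9000, |Xte| = 1000

# Example: a fresh disjoint split for one outer iteration.
train_idx, test_idx = sample_disjoint_split(50_000, seed=0)
```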
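
The Experiment Setup row fixes an optimization schedule (T = 1000 outer Adam steps at α = 0.001, sparsemax temperature γ = 0.1 annealed together with the step size by 10× after 100 and 200 iterations, η = 10, and a 300-step inner gradient descent at step size 0.001) but not the surrounding code. The skeleton below is a minimal sketch of a bilevel loop consistent with those numbers, reusing `sample_disjoint_split` from the previous snippet. The random stand-in representations, the softmax used in place of sparsemax, the agreement-style cross-entropy losses, and the entropy balance term used as the η-weighted regularizer are all illustrative assumptions; the authors' actual objective and implementation are given in the paper and at https://github.com/mlbio-epfl/hume.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in frozen representations (random here; the paper uses pretrained
# self-supervised encoders). Dimensions and the number of classes are hypothetical.
n, d1, d2, k = 50_000, 512, 384, 10
z1, z2 = torch.randn(n, d1), torch.randn(n, d2)

def labeling(theta, idx, gamma):
    # Softmax with temperature gamma stands in for the sparsemax activation.
    return F.softmax(z1[idx] @ theta / gamma, dim=-1)

T_outer, inner_steps = 1000, 300
gamma, eta = 0.1, 10.0
theta = torch.zeros(d1, k, requires_grad=True)
outer_opt = torch.optim.Adam([theta], lr=1e-3)

for t in range(T_outer):
    if t in (100, 200):                       # anneal temperature and step size by 10
        gamma /= 10
        for group in outer_opt.param_groups:
            group["lr"] /= 10

    tr, te = map(torch.as_tensor, sample_disjoint_split(n, seed=t))

    # Inner problem: fit a linear head on Xtr with 300 plain gradient-descent
    # steps (step size 0.001), unrolled so gradients can flow back to theta.
    w = torch.zeros(d2, k, requires_grad=True)
    for _ in range(inner_steps):
        # Probabilistic cross-entropy targets require PyTorch >= 1.10.
        inner_loss = F.cross_entropy(z2[tr] @ w, labeling(theta, tr, gamma))
        (grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - 1e-3 * grad

    # Outer step: generalization of the inner model on Xte plus an eta-weighted
    # balance term (an assumed stand-in for the paper's regularizer).
    probs_te = labeling(theta, te, gamma)
    marginal = probs_te.mean(0).clamp_min(1e-8)
    balance = -(marginal * marginal.log()).sum()      # entropy of the class marginal
    outer_loss = F.cross_entropy(z2[te] @ w, probs_te) - eta * balance
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```

The fully unrolled inner loop keeps the outer gradient exact but is memory-hungry; the released code may compute or approximate this gradient differently.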