Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits
Authors: Mo Tiwari, Martin J. Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our results on several large real-world datasets, including a coding exercise submissions dataset from Code.org, the 10x Genomics 68k PBMC single-cell RNA sequencing dataset, and the MNIST handwritten digits dataset. |
| Researcher Affiliation | Academia | Mo Tiwari, Department of Computer Science, Stanford University; Martin Jinye Zhang, Department of Epidemiology, Harvard T.H. Chan School of Public Health; James Mayclin, Department of Computer Science, Stanford University; Sebastian Thrun, Department of Computer Science, Stanford University; Chris Piech, Department of Computer Science, Stanford University; Ilan Shomorony, Electrical and Computer Engineering, University of Illinois at Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1: Adaptive-Search($S_{\text{tar}}$, $S_{\text{ref}}$, $g_x(\cdot)$, $B$, $\delta$, $\sigma_x$) |
| Open Source Code | Yes | We also release highly optimized Python and C++ implementations of our algorithm (footnote 1: https://github.com/ThrunGroup/BanditPAM). |
| Open Datasets | Yes | The MNIST dataset [26] consists of 70,000 black-and-white images of handwritten digits... The HOC4 dataset from Code.org [11] consists of 3,360 unique solutions to a block-based programming exercise. [11] Code.org. Research at code.org. In https://code.org/research, 2013. |
| Dataset Splits | No | The paper discusses the datasets used but does not provide specific details on how they were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions that the algorithm is implemented in Python and C++ but does not provide specific version numbers for these languages or any other software dependencies. |
| Experiment Setup | Yes | In all experiments, the batch size $B$ is set to 100 and the error probability $\delta$ is set to $\delta = \frac{1}{1000\,\lvert S_{\text{tar}} \rvert}$. |
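
The experiment setup above fixes two hyperparameters: the batch size B = 100 and an error probability δ that shrinks with the size of the target set. As a minimal sketch of that arithmetic (the names `BATCH_SIZE` and `error_probability` are hypothetical and are not taken from the released BanditPAM code, and the target-set size of 1,000 is an arbitrary illustrative value, not a figure from the paper):

```python
# Illustrative sketch only: names here are assumptions, not the BanditPAM API.

BATCH_SIZE = 100  # B, as reported in the experiment setup


def error_probability(target_set_size: int) -> float:
    """Return delta = 1 / (1000 * |S_tar|) for a target set of the given size."""
    if target_set_size <= 0:
        raise ValueError("target_set_size must be a positive integer")
    return 1.0 / (1000 * target_set_size)


if __name__ == "__main__":
    # Arbitrary example size, chosen only to show how delta scales.
    example_size = 1000
    delta = error_probability(example_size)
    print(f"B = {BATCH_SIZE}, |S_tar| = {example_size}, delta = {delta:.1e}")
```

Because δ is inversely proportional to the target-set size, doubling the target set halves δ under this setting.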