Building a stable classifier with the inflated argmax

Authors: Jake Soloff, Rina Barber, Rebecca Willett

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 4 (Experiments): "In this section, we evaluate our proposed pipeline, combining subbagging with the inflated argmax, with deep learning models and on a common benchmark data set. Data and models. We use Fashion-MNIST [XRV17], which consists of n = 60,000 training pairs (Xi, Yi), N = 10,000 test pairs (Xj, Yj), and L = 10 classes. For each data point (X, Y), X is a 28 × 28 grayscale image that pictures a clothing item, and Y ∈ [L] indicates the type of item, e.g., a dress, a coat, etc. The base model we use is a variant of LeNet-5, implemented in PyTorch [PGML19] tutorials as GarmentClassifier(). The base algorithm A trains this classifier using 5 epochs of stochastic gradient descent. Methods and evaluation. We compare four methods: ..." (See the training sketch after the table.)
Researcher Affiliation | Academia | Jake A. Soloff, Department of Statistics, University of Chicago, Chicago, IL 60637, soloff@uchicago.edu; Rina Foygel Barber, Department of Statistics, University of Chicago, Chicago, IL 60637, rina@uchicago.edu; Rebecca Willett, Departments of Statistics and Computer Science, NSF-Simons National Institute for Theory and Mathematics in Biology, University of Chicago, Chicago, IL 60637, willett@uchicago.edu
Pseudocode | No | No explicit pseudocode or algorithm blocks were found; the paper describes its methods verbally and mathematically.
Open Source Code | Yes | "Code to fully reproduce the experiment is available at https://github.com/jake-soloff/stable-argmax-experiments. ... We attach our code in our submission to OpenReview, and we will deanonymize the link to the GitHub repository after the review process."
Open Datasets | Yes | "We use Fashion-MNIST [XRV17], which consists of n = 60,000 training pairs (Xi, Yi), N = 10,000 test pairs (Xj, Yj), and L = 10 classes."
Dataset Splits | No | The paper reports 60,000 training pairs and 10,000 test pairs, but gives no explicit validation split or methodology for creating one.
Hardware Specification | No | "Training all of the models for this experiment took a total of four hours on 10 CPUs running in parallel on a single computing cluster." The statement mentions 10 CPUs and a single computing cluster, but gives no specific processor model (e.g., a particular Intel Xeon series), memory, or clock speed, so it does not qualify as a detailed specification.
Software Dependencies | No | "The base model we use is a variant of LeNet-5, implemented in PyTorch [PGML19] tutorials as GarmentClassifier()." PyTorch is mentioned, but no specific version number is given.
Experiment Setup | Yes | "The base algorithm A trains this classifier using 5 epochs of stochastic gradient descent. ... The ε-inflated argmax of the base learning algorithm A with tolerance ε = .05. ... The argmax of the subbagged algorithm Ãm, with B = 1,000 bags of size m = n/2. ... and tolerance ε = .05." (See the subbagging sketch after the table.)
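For context on the Research Type and Experiment Setup rows, here is a minimal sketch of the base algorithm A: a LeNet-5-style GarmentClassifier, as in the PyTorch Fashion-MNIST tutorial, trained for 5 epochs of plain SGD. The layer sizes, learning rate, batch size, and the helper name base_algorithm_A are assumptions for illustration, not values or names confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class GarmentClassifier(nn.Module):
    """LeNet-5-style CNN as in the PyTorch Fashion-MNIST tutorial (architecture assumed)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 28x28 -> 12x12
        x = self.pool(F.relu(self.conv2(x)))   # 12x12 -> 4x4
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

def base_algorithm_A(train_subset, epochs=5, lr=0.01, batch_size=64):
    """Base algorithm A: train the classifier with SGD for `epochs` epochs
    (learning rate and batch size are assumed, not taken from the paper)."""
    model = GarmentClassifier()
    loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

if __name__ == "__main__":
    train_set = datasets.FashionMNIST("data", train=True, download=True,
                                      transform=transforms.ToTensor())
    model = base_algorithm_A(train_set)
```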
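The Experiment Setup row combines two components: subbagging (B = 1,000 bags of size m = n/2 drawn without replacement, with predicted class probabilities averaged across bags) and the ε-inflated argmax with tolerance ε = .05, which returns a set of candidate labels rather than a single label. The sketch below wires these together; note that the set-valued rule shown is a simplified margin-style stand-in (keep every class whose averaged score is within ε of the top score), not the paper's exact inflated-argmax operator, and the function names are ours.

```python
import numpy as np
import torch
from torch.utils.data import Subset

def inflated_argmax_margin(scores, eps=0.05):
    # Simplified stand-in for the eps-inflated argmax: keep every class whose
    # score is within eps of the maximum. The paper's operator is defined
    # differently; this margin rule only illustrates the set-valued output.
    scores = np.asarray(scores)
    return set(np.flatnonzero(scores >= scores.max() - eps).tolist())

def subbagged_probs(train_set, x_test, base_fit, B=1000, seed=0):
    # Subbagging: train the base algorithm on B bags of size m = n/2 sampled
    # without replacement, then average the predicted class probabilities.
    # `base_fit` is assumed to be a trainer like base_algorithm_A above.
    rng = np.random.default_rng(seed)
    n = len(train_set)
    m = n // 2
    avg = None
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=False).tolist()
        model = base_fit(Subset(train_set, idx))
        model.eval()
        with torch.no_grad():
            probs = torch.softmax(model(x_test), dim=1).numpy()
        avg = probs if avg is None else avg + probs
    return avg / B

# Usage (hypothetical): set-valued predictions for a batch of test images
# x_test of shape [N, 1, 28, 28].
# probs = subbagged_probs(train_set, x_test, base_algorithm_A, B=1000)
# prediction_sets = [inflated_argmax_margin(p, eps=0.05) for p in probs]
```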