reproducibilityindex.ai

Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

Authors: Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah | Pages: 11423-11432

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We use BAYES-TREX to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set.
Researcher Affiliation	Academia	Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah *Equal Contribution CSAIL, Massachusetts Institute of Technology {serenabooth, yilun, ajshah, julie_a_shah}@csail.mit.edu
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code and supplemental material are available at https://github.com/serenabooth/Bayes-TrEx.
Open Datasets	Yes	We use BAYES-TREX to study classifiers trained on CLEVR (Johnson et al. 2017), MNIST (LeCun and Cortes 2010), and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017)...
Dataset Splits	No	The paper refers to the 'test set' and implicitly uses validation (e.g., in hyperparameter tuning through the mention of 'CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples.'). However, it does not explicitly provide the specific percentages or counts for training, validation, and test dataset splits needed for full reproducibility of the data partitioning.
Hardware Specification	Yes	Empirically, we find each sampling step takes 3.75 seconds for CLEVR, 1.18s for MNIST, and 1.96s for Fashion-MNIST, all on a single NVIDIA GeForce 1080 GPU.
Software Dependencies	No	We use the No-U-Turn sampler (Hoffman and Gelman 2014; Neal et al. 2011) implemented in the probabilistic programming language Pyro (Bingham et al. 2018). The paper mentions 'Pyro' but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup	Yes	We choose σ = 0.05 for all experiments. CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples.
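
The per-step timings and sample counts quoted in the table can be combined into a rough wall-clock estimate for one sampling run. The sketch below is illustrative only and is not taken from the paper or its repository: it assumes one sampling step per retained sample and ignores any warm-up or rendering overhead, and the names `SECONDS_PER_STEP`, `SAMPLES`, and `estimated_seconds` are hypothetical.

```python
# Per-step timings (seconds) and stopping points quoted in the table above.
SECONDS_PER_STEP = {"CLEVR": 3.75, "MNIST": 1.18, "Fashion-MNIST": 1.96}
SAMPLES = {"CLEVR": 500, "MNIST": 2000, "Fashion-MNIST": 2000}


def estimated_seconds(dataset: str) -> float:
    """Rough run time for one dataset.

    Assumption (not stated in the source): each retained sample costs
    exactly one sampling step, with no warm-up steps counted.
    """
    return SECONDS_PER_STEP[dataset] * SAMPLES[dataset]


if __name__ == "__main__":
    for name in SAMPLES:
        minutes = estimated_seconds(name) / 60
        print(f"{name}: about {minutes:.0f} minutes per sampling run")
```

Under these assumptions a single run would take on the order of half an hour for CLEVR and roughly 40 to 65 minutes for MNIST and Fashion-MNIST; actual times depend on warm-up settings that the quoted text does not specify.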