Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example

Authors: Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah (pages 11423-11432)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set.
Researcher Affiliation | Academia | Serena Booth*, Yilun Zhou*, Ankit Shah, Julie Shah (*equal contribution); CSAIL, Massachusetts Institute of Technology; {serenabooth, yilun, ajshah, julie_a_shah}@csail.mit.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and supplemental material are available at https://github.com/serenabooth/Bayes-TrEx.
Open Datasets | Yes | We use Bayes-TrEx to study classifiers trained on CLEVR (Johnson et al. 2017), MNIST (LeCun and Cortes 2010), and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017)...
Dataset Splits | No | The paper refers to the 'test set' but does not describe a validation split. The stopping criteria it reports ('CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples.') govern the sampler, not the data partition. The paper does not give the percentages or counts for the training, validation, and test splits needed to reproduce the data partitioning.
Hardware Specification | Yes | Empirically, we find each sampling step takes 3.75 seconds for CLEVR, 1.18s for MNIST, and 1.96s for Fashion-MNIST, all on a single NVIDIA GeForce 1080 GPU.
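Combining these per-step timings with the reported stopping points (500 samples for CLEVR, 2,000 for MNIST and Fashion-MNIST) yields a rough wall-clock budget for a single sampling run. The sketch below is a back-of-the-envelope estimate. It assumes one sampling step per retained sample and ignores warmup or adaptation steps, which the paper's figures may or may not include. The dictionary names and the helper `estimated_runtime_seconds` are hypothetical.

```python
# Back-of-the-envelope runtime estimate for one Bayes-TrEx sampling run.
# ASSUMPTION: one sampling step per retained sample; warmup steps are ignored.
SECONDS_PER_STEP = {"CLEVR": 3.75, "MNIST": 1.18, "Fashion-MNIST": 1.96}
NUM_SAMPLES = {"CLEVR": 500, "MNIST": 2000, "Fashion-MNIST": 2000}


def estimated_runtime_seconds(dataset: str) -> float:
    """Hypothetical helper: seconds per step times number of samples."""
    return SECONDS_PER_STEP[dataset] * NUM_SAMPLES[dataset]


if __name__ == "__main__":
    for name in SECONDS_PER_STEP:
        secs = estimated_runtime_seconds(name)
        print(f"{name}: {secs:.0f} s (~{secs / 60:.1f} min)")
```

Under these assumptions, a run takes roughly 1,875 s (about 31 minutes) for CLEVR, 2,360 s (about 39 minutes) for MNIST, and 3,920 s (about 65 minutes) for Fashion-MNIST. The cost is similar across the three datasets because the slower per-step rendering for CLEVR is offset by its smaller sample budget.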
Software Dependencies | No | 'We use the No-U-Turn sampler (Hoffman and Gelman 2014; Neal et al. 2011) implemented in the probabilistic programming language Pyro (Bingham et al. 2018).' The paper names Pyro but gives no version number for it and lists no other software dependencies with version numbers.
Experiment Setup | Yes | We choose σ = 0.05 for all experiments. CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples.
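As we read the paper, σ is the standard deviation of a Gaussian that relaxes the hard constraint f(x) = p*, which has zero probability mass, into a soft likelihood. The sampler then targets a density proportional to p(x) · N(p*; f(x), σ²). The Python sketch below shows that relaxation on a hypothetical one-dimensional toy problem. `toy_classifier`, the standard-normal prior, and the target p* = 0.9 are invented for illustration. A plain random-walk Metropolis sampler stands in for the No-U-Turn sampler that the paper runs in Pyro, so this is not the authors' implementation.

```python
import math
import random

SIGMA = 0.05     # relaxation width; the paper reports sigma = 0.05
TARGET_P = 0.9   # hypothetical target confidence p* (a "high-confidence" query)


def toy_classifier(z: float) -> float:
    """Hypothetical stand-in for f: sigmoid confidence for one class."""
    return 1.0 / (1.0 + math.exp(-3.0 * z))


def log_target(z: float) -> float:
    """Unnormalised log density: log N(z; 0, 1) + log N(p*; f(z), sigma^2)."""
    log_prior = -0.5 * z * z
    resid = (TARGET_P - toy_classifier(z)) / SIGMA
    return log_prior - 0.5 * resid * resid


def metropolis(n_samples: int, step: float = 0.2, seed: int = 0) -> list:
    """Random-walk Metropolis, used here as a simple stand-in for NUTS."""
    rng = random.Random(seed)
    z, logp = 0.0, log_target(0.0)
    samples = []
    for _ in range(n_samples):
        proposal = z + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, exp(logp_new - logp)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            z, logp = proposal, logp_new
        samples.append(z)
    return samples


if __name__ == "__main__":
    kept = metropolis(5000)[500:]  # discard early samples as burn-in
    mean_conf = sum(toy_classifier(z) for z in kept) / len(kept)
    print(f"mean classifier confidence over kept samples: {mean_conf:.3f}")
```

Because σ = 0.05 is small relative to the [0, 1] confidence range, the kept samples concentrate where f(z) is close to p*. A larger σ would admit samples whose confidence strays further from the target, while a much smaller σ makes the target sharply peaked and harder for the sampler to explore.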