Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
Authors: Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah
Pages: 11423-11432
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use BAYES-TREX to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and we show that this framework enables more flexible holistic model analysis than just inspecting the test set. |
| Researcher Affiliation | Academia | Serena Booth*, Yilun Zhou*, Ankit Shah, Julie Shah (*equal contribution); CSAIL, Massachusetts Institute of Technology; {serenabooth, yilun, ajshah, julie_a_shah}@csail.mit.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and supplemental material are available at https://github.com/serenabooth/Bayes-TrEx. |
| Open Datasets | Yes | We use BAYES-TREX to study classifiers trained on CLEVR (Johnson et al. 2017), MNIST (LeCun and Cortes 2010), and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017)... |
| Dataset Splits | No | The paper refers to the 'test set' but does not explicitly give the percentages or counts for the training, validation, and test splits needed to reproduce the data partitioning. The stopping rule it does state ('CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples.') sets the number of sampler iterations and says nothing about a validation split. |
| Hardware Specification | Yes | Empirically, we find each sampling step takes 3.75 seconds for CLEVR, 1.18 s for MNIST, and 1.96 s for Fashion-MNIST, all on a single NVIDIA GeForce 1080 GPU. |
| Software Dependencies | No | We use the No-U-Turn sampler (Hoffman and Gelman 2014; Neal et al. 2011) implemented in the probabilistic programming language Pyro (Bingham et al. 2018). The paper names Pyro but specifies neither its version nor the versions of any other software dependencies. |
| Experiment Setup | Yes | We choose σ = 0.05 for all experiments. CLEVR requires GPU-intensive rendering, so we stop after 500 samples. (Fashion-)MNIST samples are cheaper to generate, so we stop after 2,000 samples. |
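
The Software Dependencies and Experiment Setup rows mention NUTS in Pyro and σ = 0.05. As a rough illustration of the kind of target such a sampler explores, the sketch below draws inputs whose classifier confidence lies near a chosen value by treating that value as a noisy observation of the classifier output with standard deviation σ. This reading of σ is our assumption, and everything in the sketch is hypothetical: the 2-D standard-normal prior standing in for the data distribution, the logistic `classifier_confidence`, and the helpers `log_posterior` and `sample_level_set`. It also swaps in random-walk Metropolis for NUTS so that it runs on the Python standard library alone; it is not the authors' implementation.

```python
import math
import random

# Hypothetical toy classifier: logistic confidence for "class 1" on a 2-D input.
WEIGHTS = (3.0, 0.0)
BIAS = 0.0


def classifier_confidence(x):
    """Return the toy classifier's probability for class 1 at input x = (x0, x1)."""
    logit = WEIGHTS[0] * x[0] + WEIGHTS[1] * x[1] + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def log_posterior(x, target, sigma):
    """Unnormalized log density of x given that target ~ Normal(f(x), sigma)."""
    # Standard-normal prior: a stand-in (assumption) for the real data distribution.
    log_prior = -0.5 * (x[0] ** 2 + x[1] ** 2)
    # Gaussian relaxation of the constraint f(x) == target, with width sigma.
    residual = (target - classifier_confidence(x)) / sigma
    return log_prior - 0.5 * residual ** 2


def sample_level_set(target, sigma=0.05, n_samples=2000, burn_in=1000,
                     step=0.5, seed=0):
    """Random-walk Metropolis (a stand-in for NUTS) over the 2-D input."""
    rng = random.Random(seed)
    x = (0.0, 0.0)
    current_lp = log_posterior(x, target, sigma)
    samples = []
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian proposal around the current point.
        proposal = (x[0] + rng.gauss(0.0, step), x[1] + rng.gauss(0.0, step))
        proposal_lp = log_posterior(proposal, target, sigma)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            x, current_lp = proposal, proposal_lp
        # Keep only post-burn-in states.
        if i >= burn_in:
            samples.append(x)
    return samples


# Draw inputs that the toy classifier labels as class 1 with high confidence.
samples = sample_level_set(target=1.0)
mean_conf = sum(classifier_confidence(s) for s in samples) / len(samples)
print(f"mean confidence over {len(samples)} samples: {mean_conf:.3f}")
```

With a target of 1.0 the retained samples concentrate where the toy classifier is confident. Moving the target toward 0.5 pulls samples toward the decision boundary instead, which is loosely analogous to searching for ambiguous inputs; the paper's exact formulation for such cases may differ. A smaller σ tightens the constraint but makes the posterior harder for a sampler to explore.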
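
The Hardware Specification and Experiment Setup rows together support a back-of-envelope estimate of sampling time per run. The sketch below multiplies the reported per-step time by the stopping point. It assumes one sampling step per retained sample and ignores warm-up and any other overhead, so it gives a lower bound rather than a measured figure. The dictionary names are illustrative.

```python
# Per-step times (seconds) and stopping points, as reported in the table above.
SECONDS_PER_STEP = {"CLEVR": 3.75, "MNIST": 1.18, "Fashion-MNIST": 1.96}
NUM_SAMPLES = {"CLEVR": 500, "MNIST": 2000, "Fashion-MNIST": 2000}


def estimated_minutes(dataset: str) -> float:
    """Rough sampling time for one run, assuming one step per retained sample
    and no warm-up or other overhead beyond the reported per-step time."""
    return SECONDS_PER_STEP[dataset] * NUM_SAMPLES[dataset] / 60.0


for name in SECONDS_PER_STEP:
    print(f"{name}: ~{estimated_minutes(name):.1f} min")
```

Under these assumptions a run takes roughly 31 minutes for CLEVR (500 × 3.75 s = 1,875 s), 39 minutes for MNIST (2,000 × 1.18 s = 2,360 s), and 65 minutes for Fashion-MNIST (2,000 × 1.96 s = 3,920 s) on the reported GPU.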