Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Authors: Aaron Jiaxun Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, Bin Yu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that our R3 framework consistently improves both the interpretability and the predictive accuracy of ProtoPNet and its variants. The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. 4. Experiments; 4.1. Bird Species Identification; 4.1.1. Data Preprocessing; 4.1.2. Implementation; 4.1.3. Evaluation Metrics; 4.1.4. Results |
| Researcher Affiliation | Academia | Aaron J. Li¹, Robin Netzorg², Zhihan Cheng², Zhuoqin Zhang², Bin Yu²; ¹Harvard University, ²University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Reward Reweighed, Reselected, and Retrained Prototypical Part Network (R3-ProtoPNet) |
| Open Source Code | Yes | The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. |
| Open Datasets | Yes | With limited human feedback data on the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset (Welinder et al., 2010), we are able to train a high-quality reward model that achieves 90.1% test accuracy when ranking human preferences. The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. In addition to bird species classification, we also conduct experiments on the Stanford Cars dataset (Krause et al., 2013). |
| Dataset Splits | No | The paper mentions a "train/test split" for the Stanford Cars dataset and states that for the CUB dataset, the initial training used the same dataset as Chen et al. (2019). While these imply splits, no explicit validation split percentages or counts are stated within this paper for the main model, nor is there a direct citation for the specific split used. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory details) used to run the experiments. |
| Software Dependencies | No | The paper refers to common deep learning components (e.g., CNNs, and PyTorch for the implementation), but it does not provide specific versions for any software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | To offer better comparison against the original ProtoPNet, we use the same dataset for initial training that was used in Chen et al. (2019), the CUB-200-2011 dataset (Wah et al., 2011). The ProtoPNet injects interpretability into these convolutional architectures with the prototype layer $g_p$, consisting of $m$ prototypes $P = \{p_j\}_{j=1}^{m}$, typically of size $1 \times 1 \times D$, where $D$ is the depth of the convolutional output $f(x)$. Assigning $m_k$ prototypes for all $K$ classes, such that $\sum_{k=1}^{K} m_k = m$... In order to ensure that the prototypes match specific parts of training images, during training the prototype vectors are projected onto the closest patch in the training set. We note that the objective function $L_{\text{reweigh}}$ is a sum of the inverse distances weighted by the reward of the prototype on that image. ...$\lambda_{\text{dist}}$ is a fixed hyperparameter. We find best performance with $\lambda_{\text{dist}} = 100$. ...if $\frac{1}{n_k} \sum_{i \in I(p_j)} r(x_i, p_j) < \alpha$, where $\alpha$ is a predetermined threshold... If $\frac{1}{n_k} \sum_{i \in I(p_j)} r(x_i, p_j) > \beta$, where $\beta$ is an acceptance threshold... We found that varying the $\alpha$ and $\beta$ values per base architecture led to the best performance (see Appendix B for threshold choices). At most 50 epochs are needed in this initial training step. The reward model $r(x_i, h_i)$ is similar to the base architecture of the ProtoPNet. Two ResNet-50 base CNNs take in the input image $x_i$ and the associated activation pattern $h_i$ separately, and both have two additional convolutional layers. ...We train the reward model for 5 epochs on a synthetic comparison dataset of 49K paired images and preference labels derived from 500 human ratings, and evaluate on 14K testing pairs. |
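
The reweighing objective and the reselection thresholds quoted in the Experiment Setup row can be summarized in a short sketch. The snippet below is a minimal, hypothetical PyTorch reading of that description only; the function and variable names (`reweigh_loss`, `needs_reselection`, `accept_reselected`, `distances`, `rewards`) and the exact placement of $\lambda_{\text{dist}}$ in the loss are assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
import torch

# Hedged sketch of the quoted R3 quantities for a single prototype p_j.
# Names and the role of lambda_dist are assumptions, not the authors' code.

def reweigh_loss(distances: torch.Tensor,
                 rewards: torch.Tensor,
                 lambda_dist: float = 100.0) -> torch.Tensor:
    """One reading of L_reweigh: a sum of inverse prototype-to-patch
    distances, each weighted by the reward r(x_i, p_j). The paper reports
    best performance with lambda_dist = 100; treating it as a distance
    offset in the denominator is an assumption."""
    return torch.sum(rewards / (distances + lambda_dist))

def mean_prototype_reward(rewards: torch.Tensor) -> float:
    """(1 / n_k) * sum over i in I(p_j) of r(x_i, p_j)."""
    return rewards.mean().item()

def needs_reselection(rewards: torch.Tensor, alpha: float) -> bool:
    """Flag prototype p_j for reselection when its mean reward is below alpha."""
    return mean_prototype_reward(rewards) < alpha

def accept_reselected(rewards: torch.Tensor, beta: float) -> bool:
    """Accept a reselected prototype only when its mean reward exceeds beta."""
    return mean_prototype_reward(rewards) > beta

# Example usage with dummy values for one prototype.
if __name__ == "__main__":
    distances = torch.tensor([0.8, 1.2, 0.5])  # closest-patch distances
    rewards = torch.tensor([0.9, 0.4, 0.7])    # reward-model scores r(x_i, p_j)
    print(reweigh_loss(distances, rewards))
    print(needs_reselection(rewards, alpha=0.6))
    print(accept_reselected(rewards, beta=0.8))
```

As in the paper's description, the thresholds $\alpha$ and $\beta$ would be tuned per base architecture (Appendix B of the paper lists the choices used).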