Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Authors: Aaron Jiaxun Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, Bin Yu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that our R3 framework consistently improves both the interpretability and the predictive accuracy of ProtoPNet and its variants. The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. 4. Experiments; 4.1. Bird Species Identification; 4.1.1. Data Preprocessing; 4.1.2. Implementation; 4.1.3. Evaluation Metrics; 4.1.4. Results |
| Researcher Affiliation | Academia | Aaron J. Li¹, Robin Netzorg², Zhihan Cheng², Zhuoqin Zhang², Bin Yu²; ¹Harvard University, ²University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Reward Reweighed, Reselected, and Retrained Prototypical Part Network (R3-ProtoPNet) |
| Open Source Code | Yes | The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. |
| Open Datasets | Yes | With limited human feedback data on the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset (Welinder et al., 2010), we are able to train a high-quality reward model that achieves 90.1% test accuracy when ranking human preferences. The source code of this work is available at https://github.com/aaron-jx-li/R3-ProtoPNet. In addition to bird species classification, we also conduct experiments on the Stanford Cars dataset (Krause et al., 2013). |
| Dataset Splits | No | The paper mentions a "train/test split" for the Stanford Cars dataset and states that for the CUB dataset, the initial training used the same dataset as Chen et al. (2019). While these imply splits, no explicit validation split percentages or counts are stated within this paper for the main model, nor is there a direct citation for the specific split used. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory details) used to run the experiments. |
| Software Dependencies | No | The paper refers to common deep learning components (e.g., CNNs, and PyTorch for the implementation), but it does not provide specific versions for any software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | To offer better comparison against the original ProtoPNet, we use the same dataset for initial training that was used in Chen et al. (2019), the CUB-200-2011 dataset (Wah et al., 2011). The ProtoPNet injects interpretability into these convolutional architectures with the prototype layer $g_p$, consisting of $m$ prototypes $P = \{p_j\}_{j=1}^{m}$, typically of size $1 \times 1 \times D$, where $D$ is the depth of the convolutional output $f(x)$. Assigning $m_k$ prototypes for all $K$ classes, such that $\sum_{k=1}^{K} m_k = m$... In order to ensure that the prototypes match specific parts of training images, during training the prototype vectors are projected onto the closest patch in the training set. We note that the objective function $L_{\text{reweigh}}$ is a sum of the inverse distances weighted by the reward of the prototype on that image. ...$\lambda_{\text{dist}}$ is a fixed hyperparameter. We find best performance with $\lambda_{\text{dist}} = 100$. ...if $\frac{1}{n_k} \sum_{i \in I(p_j)} r(x_i, p_j) < \alpha$, where $\alpha$ is a predetermined threshold... If $\frac{1}{n_k} \sum_{i \in I(p_j)} r(x_i, p_j) > \beta$, where $\beta$ is an acceptance threshold... We found that varying the $\alpha$ and $\beta$ values per base architecture led to the best performance (see Appendix B for threshold choices). At most 50 epochs are needed in this initial training step. The reward model $r(x_i, h_i)$ is similar to the base architecture of the ProtoPNet. Two ResNet-50 base CNNs take in the input image $x_i$ and the associated activation pattern $h_i$ separately, and both have two additional convolutional layers. ...We train the reward model for 5 epochs on a synthetic comparison dataset of 49K paired images and preference labels derived from 500 human ratings, and evaluate on 14K testing pairs. |
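
The reweighing objective and the reselection thresholds quoted in the Experiment Setup row can be summarized in a short sketch. The snippet below is a minimal, hypothetical PyTorch reading of that description only; the function and variable names (`reweigh_loss`, `needs_reselection`, `accept_reselected`, `distances`, `rewards`) and the exact placement of $\lambda_{\text{dist}}$ in the loss are assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
import torch

# Hedged sketch of the quoted R3 quantities for a single prototype p_j.
# Names and the role of lambda_dist are assumptions, not the authors' code.

def reweigh_loss(distances: torch.Tensor,
                 rewards: torch.Tensor,
                 lambda_dist: float = 100.0) -> torch.Tensor:
    """One reading of L_reweigh: a sum of inverse prototype-to-patch
    distances, each weighted by the reward r(x_i, p_j). The paper reports
    best performance with lambda_dist = 100; treating it as a distance
    offset in the denominator is an assumption."""
    return torch.sum(rewards / (distances + lambda_dist))

def mean_prototype_reward(rewards: torch.Tensor) -> float:
    """(1 / n_k) * sum over i in I(p_j) of r(x_i, p_j)."""
    return rewards.mean().item()

def needs_reselection(rewards: torch.Tensor, alpha: float) -> bool:
    """Flag prototype p_j for reselection when its mean reward is below alpha."""
    return mean_prototype_reward(rewards) < alpha

def accept_reselected(rewards: torch.Tensor, beta: float) -> bool:
    """Accept a reselected prototype only when its mean reward exceeds beta."""
    return mean_prototype_reward(rewards) > beta

# Example usage with dummy values for one prototype.
if __name__ == "__main__":
    distances = torch.tensor([0.8, 1.2, 0.5])  # closest-patch distances
    rewards = torch.tensor([0.9, 0.4, 0.7])    # reward-model scores r(x_i, p_j)
    print(reweigh_loss(distances, rewards))
    print(needs_reselection(rewards, alpha=0.6))
    print(accept_reselected(rewards, beta=0.8))
```

As in the paper's description, the thresholds $\alpha$ and $\beta$ would be tuned per base architecture (Appendix B of the paper lists the choices used).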