TVE: Learning Meta-attribution for Transferable Vision Explainer
Authors: Guanchu Wang, Yu-Neng Chuang, Fan Yang, Mengnan Du, Chia-Yuan Chang, Shaochen Zhong, Zirui Liu, Zhaozhuo Xu, Kaixiong Zhou, Xuanting Cai, Xia Hu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies involve explaining three different architectures of vision models across three diverse downstream datasets. The experimental results indicate TVE is effective in explaining these tasks without the need for additional training on downstream data. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Rice University 2Wake Forest University 3New Jersey Institute of Technology 4Texas A&M University 5Stevens Institute of Technology 6North Carolina State University 7Meta Platforms, Inc. |
| Pseudocode | Yes | Algorithm 1 summarizes one epoch of pre-training the transferable explainer E(·\|θ). |
| Open Source Code | Yes | The source code is available at https://github.com/guanchuwang/TVE. |
| Open Datasets | Yes | We consider the large-scale ImageNet dataset (Deng et al., 2009) for TVE pre-training; and the Cats-vs-dogs (Elson et al., 2007), CIFAR-10 (Krizhevsky et al., 2009), and Imagenette (Howard, 2019) datasets for the downstream explaining tasks. Cats-vs-dogs (Elson et al., 2007): a dataset of cat and dog images with 25,000 training instances and 12,500 testing instances. |
| Dataset Splits | No | The paper gives training/testing counts for Cats-vs-dogs (25,000 training and 12,500 testing instances) and uses the standard CIFAR-10 and Imagenette benchmarks. However, it does not explicitly provide numerical splits (percentages or counts) for a separate validation set on any dataset, nor does it describe how such splits would be created where they are not standard. |
| Hardware Specification | Yes | The computational infrastructure is given in Table 9 (Computing infrastructure for the experiments): GPU model NVIDIA A5000; GPU memory 24564 MB; number of GPUs 8; CUDA version 12.1; CPU memory 512 GB. |
| Software Dependencies | No | The paper mentions software such as Masked Autoencoder, the Hugging Face library, and Captum, and reports CUDA version 12.1. While CUDA is versioned, the other critical software components (e.g., PyTorch, Hugging Face, Captum) are not given explicit version numbers, which full reproducibility of software dependencies requires. |
| Experiment Setup | Yes | Table 2 (hyper-parameters of fine-tuning the target model on downstream datasets): optimizer Adam; learning rate 2e-4; mini-batch size 256; linear scheduler; warm-up ratio 0.05; weight decay 0.05; 5 epochs. Table 4 (hyper-parameters of TVE pre-training on the ImageNet dataset): optimizer Adam; learning rate 1e-3; mini-batch size 64 per GPU on 4 GPUs; cosine-annealing LR scheduler; warm-up ratio 0.05; weight decay 0.05; 2e5 training steps. |
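The fine-tuning schedule in Table 2 (base learning rate 2e-4, linear scheduler, warm-up ratio 0.05) can be sketched as a plain function. This is not the authors' code; the total step count is a placeholder, and the linear-warmup-then-linear-decay shape is an assumption about what "Linear" scheduler with a warm-up ratio means here.

```python
def lr_at_step(step, base_lr=2e-4, total_steps=1000, warmup_ratio=0.05):
    """Learning rate at a given optimizer step under linear warm-up
    followed by linear decay to zero (a common reading of a 'Linear'
    scheduler with warm-up ratio 0.05)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # linear warm-up from 0 to base_lr over the first 5% of steps
        return base_lr * step / max(1, warmup_steps)
    # linear decay from base_lr down to 0 over the remaining steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With `total_steps=1000`, the rate climbs to 2e-4 by step 50 and reaches zero at step 1000; the same function with `base_lr=1e-3` and a cosine term swapped in would approximate the pre-training schedule of Table 4.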