Less is More: Fewer Interpretable Region via Submodular Subset Selection

Authors: Ruoyu Chen, Hua Zhang, Siyuan Liang, Jingzhi Li, Xiaochun Cao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed method outperforms SOTA methods on two face datasets (Celeb-A and VGG-Face2) and one fine-grained dataset (CUB-200-2011).
Researcher Affiliation | Academia | Ruoyu Chen (1,2), Hua Zhang (1,2), Siyuan Liang (3), Jingzhi Li (1,2), Xiaochun Cao (4). 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China ({chenruoyu,zhanghua,lijingzhi}@iie.ac.cn); 3 School of Computing, National University of Singapore, 119077, Singapore (pandaliang521@gmail.com); 4 School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China (caoxiaochun@mail.sysu.edu.cn)
Pseudocode | Yes | Algorithm 1: A greedy search based algorithm for interpretable region discovery (a minimal greedy-selection sketch follows the table).
Open Source Code | Yes | The code is released at https://github.com/RuoyuChen10/SMDL-Attribution.
Open Datasets | Yes | We evaluate the proposed method on two face datasets, Celeb-A (Liu et al., 2015) and VGG-Face2 (Cao et al., 2018), and a fine-grained dataset, CUB-200-2011 (Welinder et al., 2010).
Dataset Splits | Yes | The Celeb-A dataset includes 10,177 IDs; we randomly select 2,000 identities from Celeb-A's validation set... The VGG-Face2 dataset includes 8,631 IDs; we randomly select 2,000 identities from VGG-Face2's validation set... For the CUB-200-2011 dataset... we select 3 samples per class that are correctly predicted by the model from the CUB-200-2011 validation set, covering 200 classes... (A split-construction sketch follows the table.)
Hardware Specification | Yes | These experiments were performed on an NVIDIA 3090 GPU.
Software Dependencies | No | The paper mentions using 'Xplique' but does not provide a specific version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | For the two face datasets, we set N = 28 and m = 98. For the CUB-200-2011 dataset, we set N = 10 and m = 25. For the face datasets, we evaluated recognition models trained using the ResNet-101 (He et al., 2016) architecture and the ArcFace (Deng et al., 2019) loss function, with an input size of 112x112 pixels. For the CUB-200-2011 dataset, we evaluated a recognition model trained on the ResNet-101 architecture with a cross-entropy loss function and an input size of 224x224 pixels. To simplify parameter adjustment, all weighting coefficients are set to 1 by default. (A hyperparameter summary follows the table.)
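
The Pseudocode row points to Algorithm 1, a greedy search over candidate image sub-regions. Below is a minimal sketch of such a greedy submodular-selection loop, assuming the image has already been divided into a list of sub-region indices and that a set-valued objective score_fn is available; all names and the structure of score_fn are illustrative assumptions, not the authors' released implementation. In the paper the objective aggregates several weighted terms (the weighting coefficients mentioned in the Experiment Setup row); score_fn stands in for that aggregate.

    # Hedged sketch of a generic greedy subset-selection loop in the spirit of
    # Algorithm 1 (interpretable region discovery). score_fn, the sub-region
    # representation, and all names are assumptions for illustration.
    from typing import Callable, List, Sequence, Set

    def greedy_region_selection(
        sub_regions: Sequence[int],
        score_fn: Callable[[Set[int]], float],
        budget: int,
    ) -> List[int]:
        """Greedily add the sub-region with the largest marginal gain in score_fn."""
        selected: List[int] = []
        selected_set: Set[int] = set()
        for _ in range(budget):
            current_score = score_fn(selected_set)
            best_gain, best_region = float("-inf"), None
            for r in sub_regions:
                if r in selected_set:
                    continue
                gain = score_fn(selected_set | {r}) - current_score
                if gain > best_gain:
                    best_gain, best_region = gain, r
            if best_region is None:
                break
            selected.append(best_region)
            selected_set.add(best_region)
        return selected  # ordered by marginal importance

The ordering returned by the loop is what makes the selection interpretable: earlier sub-regions contribute larger marginal gains to the objective.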
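
The Dataset Splits row describes randomly sampling 2,000 identities from the Celeb-A and VGG-Face2 validation sets and keeping 3 correctly predicted samples per CUB-200-2011 class. A minimal sketch of that selection logic follows; the function names, data structures, and fixed seed are assumptions, as the paper does not specify selection code or a seed.

    # Hedged sketch of the evaluation-subset construction from the Dataset Splits row.
    import random

    def sample_identities(all_ids, n_ids=2000, seed=0):
        """Randomly pick n_ids identities (e.g. from the Celeb-A or VGG-Face2 validation set)."""
        rng = random.Random(seed)
        return rng.sample(sorted(all_ids), n_ids)

    def select_correct_samples(samples_by_class, predict_fn, per_class=3):
        """For CUB-200-2011: keep up to per_class correctly predicted samples per class."""
        selected = {}
        for cls, samples in samples_by_class.items():
            correct = [s for s in samples if predict_fn(s) == cls]
            selected[cls] = correct[:per_class]
        return selected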
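
Finally, the Experiment Setup row lists per-dataset hyperparameters. The sketch below simply collects the quoted values in one place; the key names are assumptions for illustration, and the semantics of N and m follow the paper's notation.

    # Hedged summary of the hyperparameters quoted in the Experiment Setup row.
    EXPERIMENT_CONFIG = {
        "celeba_vggface2": {
            "N": 28,
            "m": 98,
            "backbone": "ResNet-101",
            "loss": "ArcFace",
            "input_size": (112, 112),
        },
        "cub_200_2011": {
            "N": 10,
            "m": 25,
            "backbone": "ResNet-101",
            "loss": "cross-entropy",
            "input_size": (224, 224),
        },
        "weighting_coefficients": 1.0,  # all terms weighted equally by default
    }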