LICO: Explainable Models with Language-Image COnsistency
Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on eight benchmark datasets demonstrate that the proposed LICO achieves a significant improvement in generating more explainable attention maps in conjunction with existing interpretation methods such as Grad-CAM. Remarkably, LICO improves the classification performance of existing models without introducing any computational overhead during inference. |
| Researcher Affiliation | Academia | (1) Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; (2) Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (3) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; (4) Shanghai Center for Brain Science and Brain-inspired Technology |
| Pseudocode | Yes | Algorithm 1 Training Algorithm of LICO. |
| Open Source Code | Yes | Source code is made available at https://github.com/ymLeiFDU/LICO. |
| Open Datasets | Yes | This paper focuses on the image classification task and evaluates the proposed LICO on well-known datasets, including ImageNet-1k [32], CIFAR-10/100 [33], and SVHN [34]. |
| Dataset Splits | Yes | We conduct the classification experiments under the setting of limited training data in which the splits of labeled data follow the previous works for fair comparison [35, 36]. |
| Hardware Specification | Yes | The experiments were trained on four NVIDIA A100 GPUs for ImageNet-1k and one GPU for other datasets. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or other software dependencies. |
| Experiment Setup | Yes | The learning rates for ImageNet, CIFAR-10/100, and SVHN are 0.03 with a cosine rate decay schedule, i.e., η = η₀ cos(7πk / (16K)), where η₀ denotes the initial learning rate and k is the index of the training step [46]. We use a standard stochastic gradient descent (SGD) optimizer with a momentum of 0.9 [47, 48], and the weight decay is 0.0001. The training batch sizes are 128 and 64 for ImageNet and the other datasets, respectively. Specifically, the mapping net is hψ[512, 49] for ResNet-50, hψ[512, 64] for WRN, and hψ[512, 49] for PARN-18. The total number of training epochs is 90 for ImageNet and 200 for the other datasets. |
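
The optimizer and learning-rate schedule quoted in the Experiment Setup row can be sketched in PyTorch as follows. This is a minimal illustration of the reported settings (initial LR 0.03, momentum 0.9, weight decay 0.0001, and the cosine decay η = η₀ cos(7πk / (16K))); the backbone, total step count `K`, and training loop body are placeholders, and LICO's language-image consistency losses are omitted.

```python
import math
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Placeholder backbone; the paper uses ResNet-50, WRN, and PARN-18.
model = torch.nn.Linear(512, 10)

# Reported settings: initial LR 0.03, SGD with momentum 0.9, weight decay 1e-4.
eta0 = 0.03
optimizer = SGD(model.parameters(), lr=eta0, momentum=0.9, weight_decay=1e-4)

# Reported schedule: eta = eta0 * cos(7 * pi * k / (16 * K)), where k is the
# training-step index and K is the total number of steps.
K = 100_000  # hypothetical total step count, not given per dataset in the paper
scheduler = LambdaLR(
    optimizer,
    lr_lambda=lambda k: math.cos(7 * math.pi * k / (16 * K)),
)

for k in range(K):
    # ... forward pass, LICO losses, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # advances k and rescales the learning rate per the schedule
```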