LICO: Explainable Models with Language-Image COnsistency

Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on eight benchmark datasets demonstrate that the proposed LICO achieves a significant improvement in generating more explainable attention maps in conjunction with existing interpretation methods such as Grad-CAM. Remarkably, LICO improves the classification performance of existing models without introducing any computational overhead during inference.
Researcher Affiliation | Academia | 1 Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; 2 Academy of Mathematics and Systems Science, Chinese Academy of Sciences; 3 Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; 4 Shanghai Center for Brain Science and Brain-inspired Technology
Pseudocode | Yes | Algorithm 1: Training Algorithm of LICO.
Open Source Code | Yes | Source code is made available at https://github.com/ymLeiFDU/LICO.
Open Datasets | Yes | This paper focuses on the image classification task and evaluates the proposed LICO on well-known datasets, including ImageNet-1k [32], CIFAR-10/100 [33], and SVHN [34].
Dataset Splits | Yes | We conduct the classification experiments under the setting of limited training data, in which the splits of labeled data follow previous works for fair comparison [35, 36].
Hardware Specification | Yes | The experiments were trained on four NVIDIA A100 GPUs for ImageNet-1k and one GPU for the other datasets.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or other software dependencies.
Experiment Setup | Yes | The learning rates for ImageNet, CIFAR-10/100, and SVHN are 0.03 with a cosine learning rate decay schedule, i.e., η = η0 cos(7πk / (16K)), where η0 denotes the initial learning rate and k is the index of the training step [46]. We use a standard stochastic gradient descent (SGD) optimizer with a momentum of 0.9 [47, 48], and the weight decay is 0.0001. The training batch sizes are 128 and 64 for ImageNet and the other datasets, respectively. Specifically, the mapping net for ResNet-50 is hψ[512, 49], hψ[512, 64] for WRN, and hψ[512, 49] for PARN-18. The total number of training epochs is 90 for ImageNet and 200 for the other datasets.
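For concreteness, the optimizer and learning-rate schedule quoted in the Experiment Setup row can be sketched in PyTorch, which the paper names as its framework. This is a minimal illustration under stated assumptions, not the authors' released code: the ResNet-50 instantiation, the total_steps value, and the use of LambdaLR are placeholders chosen for the example.

```python
# Minimal sketch of the reported training setup: SGD (momentum 0.9, weight decay 1e-4)
# with the step-wise cosine decay eta = eta0 * cos(7*pi*k / (16*K)).
# The backbone and total_steps below are placeholders, not taken from the paper's code.
import math
import torch
import torchvision

model = torchvision.models.resnet50(num_classes=1000)  # one of the backbones used (ResNet-50)

eta0 = 0.03            # initial learning rate reported for ImageNet, CIFAR-10/100, and SVHN
total_steps = 100_000  # K: total number of training steps (placeholder value)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=eta0,
    momentum=0.9,       # momentum reported in the paper
    weight_decay=1e-4,  # weight decay reported in the paper
)

# LambdaLR multiplies the base learning rate by the returned factor at step k,
# giving eta_k = eta0 * cos(7*pi*k / (16*K)).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda k: math.cos(7 * math.pi * k / (16 * total_steps)),
)

for k in range(total_steps):
    # ... forward pass, LICO losses, and loss.backward() would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # advance the cosine schedule once per training step
```

Note that this schedule decays the learning rate monotonically from η0 to η0·cos(7π/16) ≈ 0.195·η0 at the final step (about 0.0059 for η0 = 0.03), rather than all the way to zero.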