Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability

Authors: Zhiyu Zhu, Zhibo Jin, Jiayu Zhang, Nan Yang, Jiahao Huang, Jianlong Zhou, Fang Chen

ICLR 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In our experiments, compared to state-of-the-art methods, our approach enhances image interpretability by an average of 9%, text interpretability by an average of 58.83%, and accelerates processing speed by 63.95%. Our code is publicly accessible at https://github.com/LMBTough/NIB." |
| Researcher Affiliation | Collaboration | "Zhiyu Zhu¹, Zhibo Jin¹, Jiayu Zhang², Nan Yang³, Jiahao Huang³, Jianlong Zhou¹ & Fang Chen¹ — University of Technology Sydney¹, Suzhou Yierqi², University of Sydney³" |
| Pseudocode | No | The paper describes the NIB method and presents theoretical derivations, but does not provide a structured pseudocode block or algorithm. |
| Open Source Code | Yes | "Our code is publicly accessible at https://github.com/LMBTough/NIB." ... "Furthermore, the code for implementing our proposed Narrowing Information Bottleneck Theory (NIBT) and the associated datasets are available in the Anonymous Repository² (https://anonymous.4open.science/r/NIB-DBCD/). These resources should enable the community to reproduce our findings and apply the methods to their own work." |
| Open Datasets | Yes | "We conduct experiments on three different datasets: Conceptual Captions (Sharma et al., 2018), ImageNet (Deng et al., 2009), and Flickr8k (Hodosh et al., 2013)." |
| Dataset Splits | No | The paper uses the Conceptual Captions, ImageNet, and Flickr8k datasets, but does not explicitly specify the training, validation, and test splits; it states only that it follows the experimental setup of M2IB. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory used to run the experiments. |
| Software Dependencies | No | The paper mentions using a pretrained CLIP model with a Vision Transformer (ViT-B/32) backbone, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | "We reduced the number of parameters required by the IBP-based method while retaining the core hyperparameters used during the generation of saliency maps, including the number of iterations (num_steps) and the layer number. ... In our experiments, num_steps is set to 10 ... In this study, we selected the 9th layer (layer number = 9) ..." |
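The hyperparameters quoted in the Experiment Setup row (num_steps = 10, layer number = 9, and the pretrained CLIP ViT-B/32 backbone) can be collected into a small configuration sketch for a reproduction attempt. The `NIBConfig` class and its field names below are illustrative assumptions, not taken from the NIB repository; only the three values themselves come from the paper:

```python
from dataclasses import dataclass, asdict

# Hypothetical configuration holder for the setup reported in the paper.
# Field names are illustrative; the actual NIB repository may differ.
@dataclass(frozen=True)
class NIBConfig:
    model_name: str = "ViT-B/32"  # pretrained CLIP backbone named in the paper
    num_steps: int = 10           # iterations for saliency-map generation (paper value)
    layer_number: int = 9         # CLIP layer used for the bottleneck (paper value)

def validate(cfg: NIBConfig) -> dict:
    """Sanity-check the quoted values before launching a run."""
    assert cfg.num_steps > 0, "num_steps must be positive"
    assert 0 <= cfg.layer_number < 12, "ViT-B/32 has 12 transformer layers"
    return asdict(cfg)

if __name__ == "__main__":
    print(validate(NIBConfig()))
```

Freezing the dataclass keeps a run's configuration immutable, so the logged values match what was actually executed.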