Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
Authors: Zhiyu Zhu, Zhibo Jin, Jiayu Zhang, Nan Yang, Jiahao Huang, Jianlong Zhou, Fang Chen
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, compared to state-of-the-art methods, our approach enhances image interpretability by an average of 9%, text interpretability by an average of 58.83%, and accelerates processing speed by 63.95%. Our code is publicly accessible at https://github.com/LMBTough/NIB. |
| Researcher Affiliation | Collaboration | Zhiyu Zhu1, Zhibo Jin1, Jiayu Zhang2, Nan Yang3, Jiahao Huang3, Jianlong Zhou1 & Fang Chen1 University of Technology Sydney1, SuZhou Yierqi2, University of Sydney3 |
| Pseudocode | No | The paper describes the NIB method and presents theoretical derivations but does not provide a structured pseudocode block or algorithm. |
| Open Source Code | Yes | Our code is publicly accessible at https://github.com/LMBTough/NIB. ... Furthermore, the code for implementing our proposed Narrowing Information Bottleneck Theory (NIBT) and the associated datasets are available in the Anonymous Repository2. These resources should enable the community to reproduce our findings and apply the methods to their own work. (2https://anonymous.4open.science/r/NIB-DBCD/) |
| Open Datasets | Yes | We conduct experiments on three different datasets: Conceptual Captions (Sharma et al., 2018), ImageNet (Deng et al., 2009), and Flickr8k (Hodosh et al., 2013). |
| Dataset Splits | No | The paper uses the Conceptual Captions, ImageNet, and Flickr8k datasets but does not explicitly specify the training, validation, and test splits used; it only states that it follows the experimental setup of M2IB. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using the pretrained CLIP model with a Vision Transformer (ViT-B/32) but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We reduced the number of parameters required by the IBP-based method while retaining the core hyperparameters used during the generation of saliency maps, including the number of iterations (num_steps) and the layer number. ... In our experiments, num_steps is set to 10... In this study, we selected the 9th layer (layer number = 9)... |
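For readers attempting a rerun, the hyperparameters the paper does disclose can be collected in one place. The sketch below is illustrative only: the dictionary keys and the `describe` helper are hypothetical, not the NIB repository's API; only the values (a ViT-B/32 CLIP backbone, num_steps = 10, layer number = 9) come from the paper's reported setup.

```python
# Hypothetical config sketch: the key names and helper are illustrative,
# NOT the NIB repository's API. Only the values are taken from the paper.
nib_config = {
    "backbone": "ViT-B/32",  # pretrained CLIP vision encoder (as reported)
    "num_steps": 10,         # iterations for saliency-map generation (as reported)
    "layer_number": 9,       # layer selected in the paper (layer number = 9)
}

def describe(config: dict) -> str:
    """Render the config as a one-line summary, e.g. for logging a rerun."""
    return ", ".join(f"{k}={v}" for k, v in sorted(config.items()))

print(describe(nib_config))
# → backbone=ViT-B/32, layer_number=9, num_steps=10
```

Hardware details and software versions are not reported, so a faithful reproduction would still require choosing those independently.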