Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Neuron Attributions in Multi-Modal Large Language Models
Authors: Junfeng Fang, Zac Bi, Ruipeng Wang, Houcheng Jiang, Yuan Gao, Kun Wang, An Zhang, Jie Shi, Xiang Wang, Tat-Seng Chua
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through theoretical analysis and empirical validation, we demonstrate the efficacy of NAM and the valuable insights it offers. |
| Researcher Affiliation | Collaboration | Junfeng Fang, Zongze Bi, Ruipeng Wang, Houcheng Jiang, Yuan Gao, Kun Wang (University of Science and Technology of China); An Zhang (National University of Singapore); Jie Shi (Huawei); Xiang Wang (University of Science and Technology of China); Tat-Seng Chua (National University of Singapore) |
| Pseudocode | No | The paper describes methods and provides mathematical formulations (e.g., Equation 8) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/littlelittlenine/NAM_1. |
| Open Datasets | Yes | All experiments are conducted on Common Objects in Context (COCO) [36], a large-scale object detection, segmentation, and captioning dataset spanning 80 object categories with five captions per image. For our experiments, we sourced the training and testing data for the COCO dataset directly from its website3. |
| Dataset Splits | Yes | All hyperparameter settings, such as the division of training and testing datasets, learning rate, and optimizer, are consistent with the original configurations of the above link unless otherwise stated. |
| Hardware Specification | Yes | Furthermore, we use Quadro RTX6000 GPUs with 24GB of memory as a representative example of consumer-level GPUs; 40GB A100s and 80GB H100s to provide datacenter-level benchmarks. |
| Software Dependencies | No | The paper mentions sourcing code for models such as GILL, NExT-GPT, EVA02, and Diffuser Interpreter, and states that hyperparameter settings are consistent with their original configurations. However, it does not list specific version numbers for its own implementation's software dependencies (e.g., Python or PyTorch versions). |
| Experiment Setup | Yes | All hyperparameter settings, such as the division of training and testing datasets, learning rate, and optimizer, are consistent with the original configurations of the above link unless otherwise stated. Additionally, it is important to note that, unless explicitly mentioned, the samples used in the experiments were 500 images randomly selected from the COCO dataset. |