Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Neuron Attributions in Multi-Modal Large Language Models

Authors: Junfeng Fang, Zac Bi, Ruipeng Wang, Houcheng Jiang, Yuan Gao, Kun Wang, An Zhang, Jie Shi, Xiang Wang, Tat-Seng Chua

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through theoretical analysis and empirical validation, we demonstrate the efficacy of NAM and the valuable insights it offers.
Researcher Affiliation | Collaboration | Junfeng Fang, Zongze Bi, Ruipeng Wang, Houcheng Jiang, Yuan Gao, Kun Wang (University of Science and Technology of China); An Zhang (National University of Singapore); Jie Shi (Huawei); Xiang Wang (University of Science and Technology of China); Tat-Seng Chua (National University of Singapore)
Pseudocode | No | The paper describes methods and provides mathematical formulations (e.g., Equation 8) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/littlelittlenine/NAM_1.
Open Datasets | Yes | All experiments are conducted on Common Objects in Context (COCO) [36], a large-scale object detection, segmentation, and captioning dataset with 80 object categories and five captions per image. For our experiments, we sourced the training and testing data for COCO directly from its website.
Dataset Splits | Yes | All hyperparameter settings, such as the division of training and testing datasets, learning rate, and optimizer, are consistent with the original configurations of the above link unless otherwise stated.
Hardware Specification | Yes | Furthermore, we use Quadro RTX 6000 GPUs with 24GB of memory as a representative example of consumer-level GPUs, and 40GB A100s and 80GB H100s to provide datacenter-level benchmarks.
Software Dependencies | No | The paper mentions sourcing code for models such as GILL, NExT-GPT, EVA02, and Diffuser Interpreter, and states that hyperparameter settings follow their original configurations. However, it does not list specific version numbers for its own implementation's software dependencies (e.g., Python or PyTorch versions).
Experiment Setup | Yes | All hyperparameter settings, such as the division of training and testing datasets, learning rate, and optimizer, are consistent with the original configurations of the above link unless otherwise stated. Additionally, unless explicitly mentioned, the samples used in the experiments were 500 images randomly selected from the COCO dataset.