Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation

Authors: Yaoyuan Liang, Zhuojun Cai, Jian Xu, Guanbo Huang, Yiran Wang, Xiao Liang, Jiahao Liu, Ziran Li, Jingang Wang, Shao-Lun Huang

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics." |
| Researcher Affiliation | Collaboration | ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²Meituan Inc. |
| Pseudocode | Yes | Algorithm 1: Layer Prior Importance Calculation |
| Open Source Code | Yes | "Code will be made available at https://github.com/Glupayy/unleash-eliminate." |
| Open Datasets | Yes | "Extensive experiments conducted on the RefCOCOg [33] and PHD [28] benchmarks" |
| Dataset Splits | Yes | "we randomly extracted K = 2000 samples from the RefCOCOg training set to form the triplets (I, M, Y)" |
| Hardware Specification | No | The paper does not specify GPU or CPU models, memory, or other hardware details used for the experiments; it only mentions "GPUs" in the NeurIPS checklist response. |
| Software Dependencies | No | The paper mentions models such as Osprey-7b and GLaMM, but it does not list software dependencies (e.g., Python, PyTorch, or specific library versions) with version numbers. |
| Experiment Setup | Yes | The baseline region-level MLLM, Osprey-7b, was evaluated at both lower (t = 0.2) and higher (t = 0.9) temperature settings; α = 0.1 in the implementation; K = 2000 samples were randomly extracted; the first 32 layers of Osprey-7b (where layer 0 is the embedding layer) were organized into four groups: [0, 7], [8, 15], [16, 23], [24, 31]. |
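The layer grouping described in the experiment setup can be sketched as below. This is a minimal illustration of partitioning 32 layer indices into four contiguous, equal-size groups; the function name and uniform-partition logic are assumptions for illustration, not the paper's actual implementation.

```python
def partition_layers(num_layers=32, num_groups=4):
    """Partition layer indices [0, num_layers) into contiguous equal-size groups.

    With the defaults this reproduces the grouping reported in the setup:
    [0, 7], [8, 15], [16, 23], [24, 31] (layer 0 being the embedding layer).
    NOTE: a hypothetical helper, not the paper's code.
    """
    size = num_layers // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]

groups = partition_layers()
# groups[0] spans the embedding layer through layer 7; groups[3] spans 24-31.
```

Each group could then be scored separately (e.g., by the paper's Layer Prior Importance Calculation) to decide which intermediate layers to draw region descriptions from.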