Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
Authors: Yaoyuan Liang, Zhuojun Cai, Jian Xu, Guanbo Huang, Yiran Wang, Xiao Liang, Jiahao Liu, Ziran Li, Jingang Wang, Shao-Lun Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics. |
| Researcher Affiliation | Collaboration | 1Tsinghua Shenzhen International Graduate School, Tsinghua University 2Meituan Inc. |
| Pseudocode | Yes | Algorithm 1 Layer Prior Importance Calculation |
| Open Source Code | Yes | Code will be made available in https://github.com/Glupayy/unleash-eliminate. |
| Open Datasets | Yes | Extensive experiments conducted on the RefCOCOg [33] and PHD [28] benchmarks |
| Dataset Splits | Yes | we randomly extracted K = 2000 samples from the RefCOCOg training set to form the triplets (I, M, Y)... Extensive experiments conducted on the RefCOCOg [33] and PHD [28] benchmarks |
| Hardware Specification | No | The paper does not specify particular GPU or CPU models, memory, or other detailed hardware components used for the experiments. It only mentions 'GPUs' in the NeurIPS checklist response. |
| Software Dependencies | No | The paper mentions models like Osprey-7b and GLaMM, but it does not specify any software dependencies (e.g., Python, PyTorch, or specific library versions) with version numbers. |
| Experiment Setup | Yes | We included an analysis of the baseline region-level MLLM model, Osprey-7b, performing at both lower (t = 0.2) and higher (t = 0.9) temperature settings... we set α = 0.1 in the implementation... we randomly extracted K = 2000 samples... The first 32 layers (where layer 0 is the embedding layer) of the Osprey-7b model were organized into four groups: [0, 7], [8, 15], [16, 23], and [24, 31]. |
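The layer grouping quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it simply partitions the 32 layers of Osprey-7b (where layer 0 is the embedding layer) into four equal contiguous groups, and the function name `group_layers` is hypothetical.

```python
def group_layers(num_layers: int = 32, num_groups: int = 4) -> list[list[int]]:
    """Partition layer indices 0..num_layers-1 into equal contiguous groups."""
    size = num_layers // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]

groups = group_layers()
# Matches the paper's groups: [0, 7], [8, 15], [16, 23], [24, 31]
print([(g[0], g[-1]) for g in groups])  # → [(0, 7), (8, 15), (16, 23), (24, 31)]
```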