Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation

Authors: Yaoyuan Liang, Zhuojun Cai, Jian Xu, Guanbo Huang, Yiran Wang, Xiao Liang, Jiahao Liu, Ziran Li, Jingang Wang, Shao-Lun Huang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics."
Researcher Affiliation | Collaboration | Tsinghua Shenzhen International Graduate School, Tsinghua University; Meituan Inc.
Pseudocode | Yes | "Algorithm 1: Layer Prior Importance Calculation"
Open Source Code | Yes | "Code will be made available in https://github.com/Glupayy/unleash-eliminate."
Open Datasets | Yes | "Extensive experiments conducted on the RefCOCOg [33] and PHD [28] benchmark"
Dataset Splits | Yes | "we randomly extracted K = 2000 samples from the RefCOCOg training set to form the triplets (I, M, Y)... Extensive experiments conducted on the RefCOCOg [33] and PHD [28] benchmark"
Hardware Specification | No | The paper does not specify the GPU or CPU models, memory, or other hardware used for the experiments; it only mentions "GPUs" in the NeurIPS checklist response.
Software Dependencies | No | The paper mentions models such as Osprey-7b and GLaMM, but it does not list software dependencies (e.g., Python, PyTorch, or specific library versions) with version numbers.
Experiment Setup | Yes | "We included an analysis of the baseline region-level MLLM model, Osprey-7b, performing at both lower (t = 0.2) and higher (t = 0.9) temperature settings... we set α = 0.1 in the implementation... we randomly extracted K = 2000 samples... The first 32 layers (where layer 0 is the embedding layer) of the Osprey-7b model were organized into four groups: [0, 7], [8, 15], [16, 23], and [24, 31]."
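The layer grouping quoted in the experiment setup can be sketched as follows. This is a hypothetical illustration, not the authors' code: the constants `NUM_LAYERS` and `GROUP_SIZE` are assumptions inferred from the quoted grouping [0, 7], [8, 15], [16, 23], [24, 31].

```python
# Hypothetical sketch: partition the 32 layers of Osprey-7b into four
# contiguous groups of eight, as described in the experiment setup.
NUM_LAYERS = 32  # layers 0-31, where layer 0 is the embedding layer
GROUP_SIZE = 8   # assumed group width, matching the quoted ranges

groups = [list(range(start, start + GROUP_SIZE))
          for start in range(0, NUM_LAYERS, GROUP_SIZE)]

# Each group is then a candidate set of intermediate layers, e.g.
# groups[0] covers layers 0-7 and groups[3] covers layers 24-31.
print([(g[0], g[-1]) for g in groups])  # → [(0, 7), (8, 15), (16, 23), (24, 31)]
```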