OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Authors: Linhui Xiao, Xiaoshan Yang, Fang Peng, Yaowei Wang, Changsheng Xu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is validated in the REC, RES, and PG tasks with five widely used datasets, namely three REC/RES datasets (RefCOCO/+/g [101, 62]), as well as two PG datasets (ReferItGame [34] and Flickr30k Entities [68]). |
| Researcher Affiliation | Academia | Linhui Xiao1,2,3, Xiaoshan Yang1,2,3, Fang Peng1,2,3, Yaowei Wang2,4, Changsheng Xu1,2,3 1MAIS, Institute of Automation, Chinese Academy of Sciences 2Pengcheng Laboratory 3School of Artificial Intelligence, University of Chinese Academy of Sciences 4Harbin Institute of Technology (Shenzhen) |
| Pseudocode | Yes | Algorithm 1 Referring-aware Dynamic Masking |
| Open Source Code | Yes | Our code and models are available at https://github.com/linhuixiao/OneRef. |
| Open Datasets | Yes | Our method is validated in the REC, RES, and PG tasks with five widely used datasets, namely three REC/RES datasets (RefCOCO/+/g [101, 62]), as well as two PG datasets (ReferItGame [34] and Flickr30k Entities [68]). |
| Dataset Splits | Yes | Table 1: Comparison with latest SoTA methods on the five datasets for REC/PG tasks with single-dataset fine-tuning setting. We highlight best result of base model in red and bold for large model. Columns: Methods, Venue, Visual Backbone, Language Backbone, RefCOCO (val / testA / testB), RefCOCO+ (val / testA / testB), RefCOCOg (val / test), ReferIt (test), Flickr (test). |
| Hardware Specification | Yes | For MRefM pre-training, the base model took 15 hours on 32 NVIDIA A100 GPUs, while the large model took 50 hours on the same number of GPUs. As for REC/RES transfer fine-tuning training, it took an average of 3 hours for the base model and 8 hours for the large model to process one dataset on 8 A100 GPUs. |
| Software Dependencies | No | The framework and experiments in our study were conducted using PyTorch (no version specified). For NLP parsing, the paper mentions using spaCy, but without a version number. |
| Experiment Setup | Yes | The batch sizes for pre-training the base model and large model are (32, 8), while they are (32, 8) and (16, 6) for transferring to the REC and RES tasks, respectively. Our model is optimized end-to-end by using the AdamW optimizer and a cosine learning scheduler with an initial learning rate of 0.5 × 10⁻⁴ for 110 epochs during the pre-training stage. During the REC/RES transfer stage, the learning rate is 0.3 × 10⁻⁴ with 20 epochs. |
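
The Experiment Setup row quotes the optimizer, learning-rate schedule, and epoch counts used in pre-training. Below is a minimal PyTorch sketch of that configuration (AdamW, cosine schedule, initial LR 0.5 × 10⁻⁴, 110 epochs); the `model` placeholder and the training loop skeleton are hypothetical stand-ins, not the authors' released implementation (which lives at https://github.com/linhuixiao/OneRef).

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical placeholder for the grounding model; the real OneRef
# architecture is defined in the authors' repository.
model = nn.Linear(768, 4)

# Pre-training hyperparameters quoted in the table: initial LR 0.5e-4,
# cosine learning-rate schedule, 110 epochs (batch size 32 base / 8 large).
num_epochs = 110
optimizer = AdamW(model.parameters(), lr=0.5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... forward/backward passes over the pre-training batches would go here ...
    optimizer.step()   # placeholder step; real code steps once per batch
    scheduler.step()   # decay the learning rate along the cosine curve per epoch
```

For the REC/RES transfer stage described in the same row, the equivalent sketch would simply swap in `lr=0.3e-4` and `num_epochs = 20`.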