GL-RG: Global-Local Representation Granularity for Video Captioning
Authors: Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the challenging MSR-VTT and MSVD datasets show that our GL-RG outperforms recent state-of-the-art methods by a significant margin. |
| Researcher Affiliation | Collaboration | Liqi Yan (Fudan University, Westlake University, Rochester Institute of Technology), Qifan Wang (Meta AI), Yiming Cui (University of Florida), Fuli Feng (University of Science and Technology of China), Xiaojun Quan (Sun Yat-sen University), Xiangyu Zhang (Purdue University), Dongfang Liu (Rochester Institute of Technology) |
| Pseudocode | No | The paper includes equations and architectural diagrams, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ylqi/GL-RG. |
| Open Datasets | Yes | We evaluate our GL-RG on MSR-VTT dataset [Xu et al., 2016]. We also evaluate our GL-RG on the MSVD dataset [Chen and Dolan, 2011]. |
| Dataset Splits | Yes | For MSR-VTT, we follow the data split of 6,513 videos for training, 497 for validation, and 2,990 for testing. For MSVD, we split the dataset into a 1,200-video training set, a 100-video validation set, and a 670-video testing set by contiguous index number (see the split sketch below the table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory, or processing units. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the implementation, only mentioning the use of pre-trained models on certain datasets. |
| Experiment Setup | Yes | Our decoder is trained with a learning rate of 0.0003 in the seeding phase and 0.0001 in the boosting phase. For each video, training uses 20 ground-truth captions on MSR-VTT and 17 on MSVD (see the configuration sketch below the table). |
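
The dataset-split figures reported above can be reproduced with a few lines of code. The sketch below is not taken from the GL-RG repository; it only assumes index-ordered lists of video IDs (the names `msrvtt_ids` and `msvd_ids` are hypothetical) and applies the split sizes quoted from the paper.

```python
# Minimal sketch: rebuild the reported splits from index-ordered video ID lists.
# Assumes MSR-VTT's standard index-ordered split and MSVD's contiguous-index split.

def split_msrvtt(msrvtt_ids):
    """MSR-VTT: 6,513 train / 497 validation / 2,990 test videos."""
    assert len(msrvtt_ids) == 10_000, "MSR-VTT has 10,000 clips"
    return msrvtt_ids[:6513], msrvtt_ids[6513:7010], msrvtt_ids[7010:]

def split_msvd(msvd_ids):
    """MSVD: 1,200 train / 100 validation / 670 test videos, by contiguous index."""
    assert len(msvd_ids) == 1_970, "MSVD has 1,970 clips"
    return msvd_ids[:1200], msvd_ids[1200:1300], msvd_ids[1300:]
```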
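
The two-phase learning-rate schedule from the Experiment Setup row can likewise be expressed as a small configuration helper. This is a sketch under stated assumptions, not the authors' code: the paper gives only the learning rates, so the use of PyTorch, the Adam optimizer, and the helper name `make_decoder_optimizer` are illustrative choices.

```python
import torch

def make_decoder_optimizer(decoder, phase):
    """Optimizer for the caption decoder with the phase-specific learning rate.

    phase: "seeding" (lr = 3e-4) or "boosting" (lr = 1e-4), as reported in the paper.
    Adam is an assumption; the paper does not name the optimizer.
    """
    lr = 3e-4 if phase == "seeding" else 1e-4
    return torch.optim.Adam(decoder.parameters(), lr=lr)
```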