Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video
Authors: Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can have a significant improvement on small datasets or even datasets with limited training data. |
| Researcher Affiliation | Collaboration | Zenan Xu1 , Xiaojun Meng2 , Yasheng Wang2 , Qinliang Su1,4 , Zexuan Qiu3 , Xin Jiang2 and Qun Liu2 1School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China 2Noah's Ark Lab, Huawei Technologies 3The Chinese University of Hong Kong, Hong Kong SAR 4Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China {xuzn@mail2, suqliang@mail}.sysu.edu.cn, EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the proposed model on three public datasets, including How2, How2-300 [Sanabria et al., 2018], and MM-AVS [Fu et al., 2021] dataset. The statistic of datasets is shown in Table 1. |
| Dataset Splits | Yes | The statistic of datasets is shown in Table 1. (Table 1 provides 'Train Dev Test' splits for How2, How2-300, and MM-AVS datasets with specific counts, e.g., How2: 68336 Train, 2520 Dev, 2127 Test). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'BART-base model' as the backbone and 'Adam' as the optimizer, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages. |
| Experiment Setup | Yes | The BART-base model is adopted as the backbone of our model, in which L = 6 for both the encoder and decoder. For the introduced auxiliary visual encoder, we use a 6-layer encoder with 8 attention heads and a 768 feed-forward dimension. Following previous work [Yu et al., 2021a], we set the max length of the generated summary to be 64 tokens; the decoding process can be stopped early if an End-of-Sequence (EOS) token is emitted. The Adam [Kingma and Ba, 2014] with β1 = 0.9, β2 = 0.999, and a weight decay of 1e-5 is employed as the optimizer. |
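The settings in the Experiment Setup row can be collected into a single configuration object, together with the stopping rule for decoding (at most 64 tokens, with an early stop on EOS). The Python below is a minimal sketch under stated assumptions, not the authors' code, which is not released. `SetupConfig`, `greedy_decode`, and all field names are hypothetical. Greedy search is swapped in for illustration because the row does not state the search strategy. The weight decay is read as 1e-5, and the EOS token is assumed to count toward the 64-token budget.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SetupConfig:
    # Hypothetical field names; values follow the Experiment Setup row.
    backbone: str = "bart-base"
    encoder_layers: int = 6           # L = 6 for the BART-base encoder
    decoder_layers: int = 6           # L = 6 for the BART-base decoder
    visual_encoder_layers: int = 6    # auxiliary visual encoder depth
    visual_attention_heads: int = 8
    visual_ffn_dim: int = 768
    max_summary_tokens: int = 64
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    weight_decay: float = 1e-5        # assumed reading of the reported value


def greedy_decode(next_token: Callable[[List[int]], int],
                  bos_id: int, eos_id: int, max_len: int) -> List[int]:
    """Generate at most max_len tokens, stopping early once EOS is emitted.

    Assumption: the EOS token itself counts toward max_len.
    """
    prefix: List[int] = [bos_id]
    generated: List[int] = []
    for _ in range(max_len):
        tok = next_token(prefix)      # stand-in for argmax over the decoder's logits
        generated.append(tok)
        prefix.append(tok)
        if tok == eos_id:             # early stop on End-of-Sequence
            break
    return generated


if __name__ == "__main__":
    cfg = SetupConfig()

    # Toy step function: emits token 5 until the prefix has 4 tokens, then EOS (id 2).
    def toy_step(prefix: List[int]) -> int:
        return 2 if len(prefix) > 3 else 5

    print(greedy_decode(toy_step, bos_id=0, eos_id=2,
                        max_len=cfg.max_summary_tokens))  # prints [5, 5, 5, 2]
```

If no EOS is ever produced, the loop exits after exactly `max_summary_tokens` steps, matching the 64-token cap described in the row.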