Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

Authors: Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei (pp. 8518-8526)

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the compelling generalizability of our pretrained encoder-decoder by fine-tuning on four VL understanding and generation downstream tasks. Through an extensive set of experiments on four VL understanding and generation downstream tasks, we demonstrate that our pre-trained TDEN achieves new state-of-the-art performances for each task.
Researcher Affiliation Collaboration 1 JD AI Research, Beijing, China 2 Sun Yat-sen University, Guangzhou, China
Pseudocode No The paper describes the architecture and processes, but does not include any explicitly labeled pseudocode blocks or algorithms formatted as code.
Open Source Code Yes Source code is available at https://github.com/YehLi/TDEN.
Open Datasets Yes We conduct the experiments for pretraining over the large-scale image captioning benchmark Conceptual Captions (Sharma et al. 2018). VQA 2.0 (Antol et al. 2015) is adopted for finetuning our TDEN, which consists of 1.1 million questions about images in COCO (Chen et al. 2015). We utilize Flickr30k (Plummer et al. 2015) in this task and each image is equipped with five human-annotated sentences. The Visual Commonsense Reasoning (VCR) benchmark (Zellers et al. 2019) is utilized for evaluation. COCO (Chen et al. 2015) is utilized for fine-tuning and evaluating TDEN.
Dataset Splits Yes During finetuning, we follow the official split (Anderson et al. 2018) and formulate this task as a multi-label classification problem. We follow the commonly adopted split in (Lee et al. 2018) and formulate this task as a ranking problem that sorts images according to the image-sentence similarities, which are measured as in ISM. We utilize the widely adopted Karpathy split (Karpathy and Fei-Fei 2015; Yao et al. 2017b, 2018, 2019) for evaluation.
Hardware Specification Yes We implement the whole architecture with PyTorch (Paszke et al. 2019), optimized with Adam (Kingma and Ba 2015) on 16 Tesla P40 GPUs.
Software Dependencies No The paper mentions 'PyTorch' but does not specify a version number. Other software or libraries are mentioned without versions.
Experiment Setup Yes During pretraining, ... The mini-batch size is 1,024 and the learning rate is set as 0.0001. The maximum iteration is 10 epochs. Finetuning Data and Details on Downstream Tasks. ... cross-entropy loss (mini-batch size: 96, learning rate: 0.00005, maximum iteration: 20 epochs). ... triplet ranking loss (mini-batch size: 512, learning rate: 0.00002, maximum iteration: 30 epochs). ... cross-entropy loss (mini-batch size: 64, learning rate: 0.00002, maximum iteration: 20 epochs). ... mini-batch size is 16 and the learning rate is 0.00003. We set the maximum iteration as 10 epochs. The learning rate is 0.000005 and the maximum iteration is 30 epochs.
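The quoted hyperparameters can be collected into a single lookup for reference. This is a minimal sketch: the numeric values come from the excerpt above, but the task keys and the mapping of each setting to a specific downstream task (VQA, image-sentence ranking, VCR, captioning) are my own inference from the order in which the paper lists its tasks, and are not stated explicitly in the quote.

```python
# Hyperparameters quoted in the paper's experiment-setup section.
# NOTE: the task names below are assumptions inferred from the paper's
# task ordering; only the numeric values are taken from the excerpt.
PRETRAIN = {"batch_size": 1024, "lr": 1e-4, "epochs": 10}

FINETUNE = {
    # cross-entropy loss (assumed: VQA)
    "vqa":        {"batch_size": 96,  "lr": 5e-5, "epochs": 20},
    # triplet ranking loss (assumed: image-sentence ranking)
    "ranking":    {"batch_size": 512, "lr": 2e-5, "epochs": 30},
    # cross-entropy loss (assumed: VCR)
    "vcr":        {"batch_size": 64,  "lr": 2e-5, "epochs": 20},
    # assumed: captioning, first (cross-entropy) training stage
    "caption_xe": {"batch_size": 16,  "lr": 3e-5, "epochs": 10},
    # assumed: captioning, second training stage
    "caption_2nd": {"batch_size": 16, "lr": 5e-6, "epochs": 30},
}

def config_for(task: str) -> dict:
    """Return the finetuning configuration for a downstream task."""
    return FINETUNE[task]
```

Keeping the settings in one table like this makes it easy to spot that every downstream task uses a smaller batch size and learning rate than pretraining.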