TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

Authors: Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Venue: AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate that TiMix achieves comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods.
Researcher Affiliation | Collaboration | National Engineering Research Center for Software Engineering, Peking University, Beijing, China; Alibaba Group, Hangzhou, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/chaoyajiang/TiMiX/tree/main.
Open Datasets | Yes | Following the previous works (Li et al. 2021) and (Li et al. 2022a), we use the same pre-training dataset of 14M image-text pairs, which includes two in-domain datasets (MS COCO (Lin et al. 2014) and Visual Genome (Krishna et al. 2016)) and three web out-of-domain datasets (Conceptual Captions (Sharma et al. 2018a), Conceptual 12M (Changpinyo et al. 2021a), SBU Captions (Ordonez, Kulkarni, and Berg 2011)).
Dataset Splits | Yes | We evaluated our models by submitting the results to the evaluation server and report the test-dev and test-std scores in Table 1. The fine-tuning hyper-parameters and the details of downstream tasks are described in Appendix D. Tables 1, 2, and 3 use standard splits such as 'dev', 'test-dev', 'test-std', and the 'COCO Karpathy test split'.
Hardware Specification | Yes | Pre-training is run on 8 A100 GPUs (80 GB each).
Software Dependencies | No | The paper mentions specific models and loss functions but does not provide version numbers for any software dependencies such as programming languages, frameworks, or libraries.
Experiment Setup | No | The paper states that 'The fine-tuning hyper-parameters and the details of downstream tasks are described in Appendix D' and directs readers to 'Appendix C to see more detail about the pre-training dataset and pre-training setting'; however, these specific details are not present in the main body of the text provided.
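
Since the paper itself provides no pseudocode (see the Pseudocode row above), the following is a minimal, hypothetical sketch of what a mixup-style, text-aware image-mixing step could look like. The function names, the softmax weighting of the mixing ratio, and the similarity scores are all illustrative assumptions, not the authors' TiMix formulation; consult the paper and the released code for the real method.

```python
import torch

def mixup_image_pair(img_a, img_b, lam):
    # Standard mixup: convex combination of two image tensors (C, H, W).
    return lam * img_a + (1.0 - lam) * img_b

def text_aware_lambda(sim_a, sim_b):
    # Hypothetical text-aware mixing ratio: softmax over image-text
    # similarity scores, so the image whose caption matches it better
    # dominates the mix. An illustrative assumption, not the paper's rule.
    weights = torch.softmax(torch.tensor([sim_a, sim_b]), dim=0)
    return weights[0].item()

# Usage with dummy data: two random "images" and made-up similarity scores.
img_a = torch.rand(3, 224, 224)
img_b = torch.rand(3, 224, 224)
lam = text_aware_lambda(sim_a=0.72, sim_b=0.41)  # hypothetical CLIP-style scores
mixed = mixup_image_pair(img_a, img_b, lam)
print(mixed.shape, round(lam, 3))
```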