TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
Authors: Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that TiMix achieves comparable performance on downstream tasks, even with less training data and shorter training time, when benchmarked against existing methods. |
| Researcher Affiliation | Collaboration | 1National Engineering Research Center for Software Engineering, Peking University, Beijing, China 2Alibaba Group, Hangzhou, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/chaoyajiang/TiMiX/tree/main. |
| Open Datasets | Yes | Following the previous works (Li et al. 2021) and (Li et al. 2022a), we use the same pre-training dataset with 14M images with texts, which includes two in-domain datasets (MS COCO (Lin et al. 2014) and Visual Genome (Krishna et al. 2016)), and three web out-domain datasets (Conceptual Captions (Sharma et al. 2018a), Conceptual 12M (Changpinyo et al. 2021a), SBU Captions (Ordonez, Kulkarni, and Berg 2011)). |
| Dataset Splits | Yes | We evaluated our models by submitting the results to the evaluation server and report the test-dev and test-std scores in Table 1. The fine-tuning hyper-parameters and the details of downstream tasks are described in Appendix D. Tables 1, 2, and 3 use standard splits such as 'dev', 'test-dev', 'test-std', and the 'COCO Karpathy test split'. |
| Hardware Specification | Yes | Pre-training was run on 8 80GB A100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and loss functions, but does not provide version numbers for any software dependencies like programming languages, frameworks, or libraries. |
| Experiment Setup | No | The paper states that 'The fine-tuning hyper-parameters and the details of downstream tasks are described in Appendix D' and 'Please refer to Appendix C to see more detail about the pre-training dataset and pre-training setting.' However, these specific details are not present in the main body of the text provided. |
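Since the paper provides no pseudocode, the sketch below illustrates only the generic mixup-style image blending named in the title (mixup, Zhang et al. 2018), not TiMix's text-aware variant; the function and parameter names are illustrative assumptions, and the actual mixing strategy should be checked against the linked repository.

```python
import torch

def mixup_images(img_a: torch.Tensor, img_b: torch.Tensor, alpha: float = 0.5):
    """Blend two image tensors with a Beta-sampled coefficient.

    Generic mixup, shown only to illustrate the image-mixing idea;
    TiMix's text-aware weighting is not reproduced here.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * img_a + (1.0 - lam) * img_b
    return mixed, lam  # lam would also weight losses for the paired captions
```

In standard mixup training, the loss for a mixed image is the lam-weighted sum of the losses against each source image's target; whether TiMix applies an analogous weighting at the caption or patch level is not stated in the main text and must be verified against the released code.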