FILIP: Fine-grained Interactive Language-Image Pre-Training
Authors: Lewei Yao, Runhui Huang, Lu Hou, Guansong Lu, Minzhe Niu, Hang Xu, Xiaodan Liang, Zhenguo Li, Xin Jiang, Chunjing Xu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): 'Experiments show that FILIP achieves state-of-the-art performance on multiple downstream vision-language tasks including zero-shot image classification and image-text retrieval.' |
| Researcher Affiliation | Collaboration | 1Huawei Noah's Ark Lab, 2Hong Kong University of Science and Technology, 3Sun Yat-sen University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'the LAMB optimizer implemented by cybertronai's open-source repository (https://github.com/cybertronai/pytorch-lamb)' but does not state that the code for FILIP itself is open-source or provide a link to it. |
| Open Datasets | Yes | We also use 3 public datasets, including Conceptual Captions 3M (CC3M) (Sharma et al., 2018), Conceptual 12M (CC12M) (Changpinyo et al., 2021) and Yahoo Flickr Creative Commons 100M (YFCC100M) (Thomee et al., 2016). |
| Dataset Splits | No | The paper describes training and test sets for evaluation but does not explicitly provide details about a dedicated validation dataset split for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | The training is mainly conducted on Nvidia V100 GPUs and Ascend Cards. |
| Software Dependencies | No | The paper mentions software such as the LAMB optimizer, scikit-learn, and a PyTorch-based codebase, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Table 8 summarizes the common hyperparameters and Table 9 shows the model- and dataset-specific hyperparameters for FILIP pre-training. Table 10 shows the hyperparameters for image-text retrieval fine-tuning. Table 13 shows the hyperparameters used in the linear probe on ImageNet. |