K-LITE: Learning Transferable Visual Models with External Knowledge

Authors: Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods.
Researcher Affiliation | Collaboration | Microsoft and University of California, Berkeley
Pseudocode | No | The paper describes methods and processes but does not include any formally structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/microsoft/klite.
Open Datasets | Yes | We pre-train on ImageNet-21K [15] and GCC [76, 11]/YFCC [84] datasets... Following GLIP [50], we pre-train on Objects365 [75]...
Dataset Splits | No | The paper mentions training and testing datasets (Table 1) and evaluates performance metrics, but does not explicitly specify validation dataset splits or their sizes in the main text.
Hardware Specification | No | The main text states that hardware specifications are provided in the Appendix, which is not available for analysis.
Software Dependencies | No | The paper mentions 'Spacy [32]' without a version number and refers to other models/frameworks such as CLIP, ALIGN, UniCL, and GLIP, but does not provide specific version numbers for the software dependencies or libraries used in the implementation.
Experiment Setup | No | The main text states that training details, including hyperparameters, are provided in the Appendix, which is not available for analysis.