Unicom: Universal and Compact Representation Learning for Image Retrieval
Authors: Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks. The code and pre-trained models are released to facilitate future research https://github.com/deepglint/unicom. |
| Researcher Affiliation | Collaboration | Xiang An1, Jiankang Deng2, Kaicheng Yang1, Jiawei Li1, Ziyong Feng1, Jia Guo3, Jing Yang4, Tongliang Liu5. 1Deep Glint, 2Huawei, 3InsightFace, 4University of Cambridge, 5University of Sydney |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code and pre-trained models are released to facilitate future research https://github.com/deepglint/unicom. |
| Open Datasets | Yes | We first cluster the large-scale LAION 400M dataset into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model. Table 10: List of linear probe datasets with the data distribution and evaluation metrics. Table 11: Dataset composition for training and evaluation in the image retrieval task. |
| Dataset Splits | Yes | For supervised retrieval, we follow the data-split settings of the baseline methods (Patel et al., 2022; Ermolov et al., 2022) to fine-tune models. Table 10: List of linear probe datasets with the data distribution and evaluation metrics. Table 11: Dataset composition for training and evaluation in the image retrieval task. |
| Hardware Specification | Yes | The training is conducted on 128 NVIDIA V100 GPUs across 16 nodes. |
| Software Dependencies | No | The paper mentions 'AdamW (Loshchilov & Hutter, 2018) as the optimizer' and 'ArcFace (Deng et al., 2019; 2020) for both pre-training and image retrieval tasks' but does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | Unless otherwise specified, all ViT models in our experiments follow the same architecture designs in CLIP, and are trained from scratch for 32 epochs on the automatically clustered LAION 400M dataset (Section 3.2) with cluster number k = 1M. During training, we randomly crop and horizontally flip each image to get the input image with 224×224 resolution. We set the random class sampling ratio r1 as 0.1 in the pre-training step. We use AdamW (Loshchilov & Hutter, 2018) as the optimizer with an initial learning rate of 0.001, and a weight decay of 0.05. We employ margin-based softmax loss, ArcFace (Deng et al., 2019; 2020), for both pre-training and image retrieval tasks. The margin value is set to 0.3 and the feature scale is set to 64. |
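The margin-based softmax loss named in the setup row (ArcFace, margin 0.3, feature scale 64) can be sketched as follows. This is a minimal NumPy illustration, not the authors' released implementation: the function names are hypothetical, and details of their training pipeline (distributed partial-FC class sampling, mixed precision, numerical easing when θ + m exceeds π) are omitted.

```python
import numpy as np

def arcface_logits(features, class_weights, labels, margin=0.3, scale=64.0):
    """Cosine logits with an additive angular margin on the target class.

    features: (B, D) raw embeddings; class_weights: (C, D) class centers.
    The paper's reported hyperparameters are margin=0.3, scale=64.
    """
    # L2-normalize embeddings and class centers so logits are cosines.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)          # (B, C) cosine similarities
    theta = np.arccos(cos)                      # angles between f and centers
    # Add the angular margin m only to each sample's ground-truth class.
    target = np.zeros(cos.shape, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    cos_margin = np.where(target, np.cos(theta + margin), cos)
    return scale * cos_margin                   # re-scaled logits for softmax

def softmax_cross_entropy(logits, labels):
    """Numerically stable mean cross-entropy over integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Because the margin shrinks the target-class cosine, the loss with margin 0.3 is strictly harder than plain normalized softmax, which is what forces intra-class compactness during pre-training.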