Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval

Authors: Binjie Zhang, Yixiao Ge, Yantao Shen, Yu Li, Chun Yuan, Xuyuan Xu, Yexin Wang, Ying Shan

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on large-scale retrieval benchmarks (e.g., Google Landmark) demonstrate that our RACT effectively alleviates the model regression for one more step towards seamless model upgrades.
Researcher Affiliation | Collaboration | Binjie Zhang (1,2), Yixiao Ge (2), Yantao Shen (4), Yu Li (5), Chun Yuan (1), Xuyuan Xu (3), Yexin Wang (3), Ying Shan (2); affiliations: 1 Tsinghua University, 2 ARC Lab, Tencent PCG, 3 AI Technology Center of Tencent Video, 4 AWS/Amazon AI, 5 International Digital Economy Academy
Pseudocode | Yes | We demonstrate the core of the algorithm in the form of pseudocode, as shown in Alg. 1 and 2.
Open Source Code | Yes | The code is available at https://github.com/binjiezhang/RACT_ICLR2022.
Open Datasets | Yes | Training Data. Google Landmark v2 (Weyand et al., 2020), GLDv2 in short, is a large-scale public dataset for landmark retrieval. [...] The training data details can be found in Table 1. Table 1: Three different allocations for the training data, where all the images are sampled from GLDv2-clean.
Dataset Splits | No | The paper describes "Training Data" and "Testing Data" with detailed breakdowns in Table 1 and Section 5.1, but it does not define a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification | Yes | We adopt 6 Tesla V100 GPUs for training.
Software Dependencies | No | The paper mentions the Adam optimizer and refers to a "PyTorch-like style" for its pseudocode, but it does not specify version numbers for PyTorch, Python, or any other software dependencies needed for reproducibility.
Experiment Setup | Yes | The images are resized to 224x224 for both training and testing. During training, random data augmentation is applied to each image before it is fed into the network, including randomly flipping and cropping. [...] the batch size per GPU is set to 80 for ResNet-50 and 64 for ResNet-101. For all the experiments, the Adam optimizer is adopted to optimize the training model with a weight decay of 0.0001. The initial learning rate is set to 0.01 and is decreased to 1/10 of its previous value every 30 epochs in the total 90 epochs. [...] The temperature τ in Eq. (6) is empirically set as 0.05, and the loss weight λ in Eq. (8) is set as 1.0. (A hedged configuration sketch based on these values follows the table.)
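
To make the quoted "Experiment Setup" values concrete, below is a minimal sketch assuming a standard PyTorch/torchvision setup. Only the hyperparameters come from the row above (224x224 inputs with random flipping and cropping, per-GPU batch size 80 for ResNet-50 / 64 for ResNet-101, Adam with weight decay 0.0001, learning rate 0.01 divided by 10 every 30 of 90 epochs, τ = 0.05, λ = 1.0). The class count, the dummy batch, and the cross-entropy and cosine-similarity terms are placeholders; they do not reproduce the paper's actual losses in Eq. (6) and Eq. (8).

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn.functional as F
from torchvision import transforms
from torchvision.models import resnet50

# Data augmentation: images resized to 224x224 with random flipping and cropping.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random cropping (exact crop policy is an assumption)
    transforms.RandomHorizontalFlip(),   # random flipping
    transforms.ToTensor(),
])

NUM_CLASSES = 1000        # placeholder; the real class count follows Table 1's GLDv2 allocation
BATCH_PER_GPU = 80        # 80 for ResNet-50, 64 for ResNet-101
EPOCHS = 90
TEMPERATURE = 0.05        # tau in Eq. (6)
LOSS_WEIGHT = 1.0         # lambda in Eq. (8)

new_model = resnet50(num_classes=NUM_CLASSES)
old_model = resnet50(num_classes=NUM_CLASSES)   # frozen old model providing the old embedding space
old_model.eval()

# Adam with weight decay 1e-4; LR starts at 0.01 and is divided by 10 every 30 epochs.
optimizer = torch.optim.Adam(new_model.parameters(), lr=0.01, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

def training_step(images, labels):
    """One hedged training step: base classification term plus a weighted compatibility term."""
    logits = new_model(images)
    cls_loss = F.cross_entropy(logits, labels)

    with torch.no_grad():
        old_logits = old_model(images)
    # Placeholder compatibility term: temperature-scaled agreement between new and old outputs.
    # The paper's regression-alleviating compatible losses are defined in Eq. (6)-(8).
    compat_loss = (1 - F.cosine_similarity(logits, old_logits, dim=1)).mean() / TEMPERATURE

    loss = cls_loss + LOSS_WEIGHT * compat_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for one per-GPU batch.
images = torch.randn(BATCH_PER_GPU, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (BATCH_PER_GPU,))
print(training_step(images, labels))
scheduler.step()  # called once per epoch in a full 90-epoch run
```

In the full run reported by the authors, this step would be wrapped in a 90-epoch loop distributed over the 6 Tesla V100 GPUs noted in the Hardware Specification row, with scheduler.step() invoked once per epoch.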