Darwinian Model Upgrades: Model Evolving with Selective Compatibility

Authors: Binjie Zhang, Shupeng Su, Yixiao Ge, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the superiority of DMU through comprehensive experiments on large-scale landmark retrieval and face recognition benchmarks. DMU effectively alleviates new-to-new degradation and improves new-to-old compatibility, rendering a more proper model upgrading paradigm in large-scale retrieval systems. (A hedged compatibility-evaluation sketch appears below the table.)
Researcher Affiliation | Collaboration | Binjie Zhang (1,2,4), Shupeng Su (1), Yixiao Ge (1), Xuyuan Xu (3), Yexin Wang (3), Chun Yuan (4), Mike Zheng Shou (2), Ying Shan (1); 1 ARC Lab, Tencent PCG; 2 National University of Singapore; 3 AI Technology Center of Tencent Video; 4 Tsinghua University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/TencentARC/OpenCompatible
Open Datasets | Yes | We adopt the GLDv2-train-clean version (Weyand et al. 2020) as training set which is comprised of 1,580,470 images in 81,313 landmarks. All models are trained on MS1Mv3 (Deng et al. 2019) dataset, which contains 5,179,510 training images with 93,431 labels.
Dataset Splits | No | The paper mentions the 'GLDv2-train-clean version' as the training set and 'GLDv2-test', 'ROxford', 'RParis', and 'IJB-C' for evaluation. It describes different training-data allocations for simulating model upgrades (e.g., 30% data → 100% data), but it does not provide details of a dedicated validation split (e.g., percentages or counts for a validation set). (A hedged sketch of the upgrade-simulation split appears below the table.)
Hardware Specification | Yes | With 6 Tesla V100 for training, the batch size per GPU is set as 256 for the embedding model and 512 for the feature upgrade module.
Software Dependencies | No | The paper mentions the 'SGD optimizer' but does not provide specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | The input image is resized to 224 × 224 for training and inference. Random image augmentation is applied which includes the random resized cropping and horizontal flipping. With 6 Tesla V100 for training, the batch size per GPU is set as 256 for the embedding model and 512 for the feature upgrade module. SGD optimizer with 0.9 momentum and 10^-4 weight decay is adopted. Besides, we uniformly use the cosine lr scheduler with 1 warm-up epoch in the total running of 30 epochs. The initial learning rate is set as 0.1. For the training objective, we set hyper-parameters in ArcFace loss as s = 30, m = 0.3 in landmark retrieval, s = 64, m = 0.5 in face recognition.
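
To make the reported setup concrete, below is a minimal PyTorch-style sketch of the training configuration quoted in the Experiment Setup row: SGD with momentum 0.9 and 1e-4 weight decay, a cosine schedule with one warm-up epoch over 30 epochs, initial learning rate 0.1, 224 × 224 inputs with random resized cropping and horizontal flipping, and an ArcFace objective with s = 30, m = 0.3 for landmark retrieval. The backbone, embedding dimension, normalization statistics, and the ArcFaceHead implementation are illustrative assumptions, not details taken from the paper.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms, models

# Input size and augmentations reported in the paper (224 x 224, random
# resized crop, horizontal flip); normalization stats are an assumption.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

class ArcFaceHead(nn.Module):
    """Additive angular margin classifier (ArcFace); s and m follow the paper."""
    def __init__(self, embed_dim, num_classes, s=30.0, m=0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = torch.cos(theta + self.m)            # margin added to the target class only
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (onehot * target + (1 - onehot) * cos)
        return F.cross_entropy(logits, labels)

# Backbone and embedding dimension are illustrative; 81,313 is the GLDv2 class count.
backbone = models.resnet50(num_classes=512)
head = ArcFaceHead(embed_dim=512, num_classes=81313, s=30.0, m=0.3)

optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()),
    lr=0.1, momentum=0.9, weight_decay=1e-4,
)

# Cosine schedule with one warm-up epoch over 30 epochs, stepped per iteration.
iters_per_epoch = 1000                     # placeholder: len(train_loader) in practice
warmup_iters, total_iters = 1 * iters_per_epoch, 30 * iters_per_epoch

def lr_lambda(it):
    if it < warmup_iters:
        return (it + 1) / warmup_iters     # linear warm-up during the first epoch
    progress = (it - warmup_iters) / (total_iters - warmup_iters)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```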
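
The Dataset Splits row notes that upgrades are simulated by training the old model on a fraction of the data (e.g., 30% data → 100% data). A minimal sketch of such an allocation is given below; the random-subsampling strategy and the function name are assumptions, since the paper does not state how the 30% subset is drawn.

```python
import numpy as np

def upgrade_splits(num_samples, old_fraction=0.3, seed=0):
    """Simulate an old-to-new model upgrade: the old model sees only a
    fraction of the training data, the new model sees all of it."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    old_indices = perm[: int(old_fraction * num_samples)]  # e.g., 30% for the old model
    new_indices = perm                                      # 100% for the new model
    return old_indices, new_indices

# GLDv2-train-clean has 1,580,470 training images.
old_idx, new_idx = upgrade_splits(1_580_470, old_fraction=0.3)
```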
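
The Research Type row refers to new-to-new performance and new-to-old compatibility. As a reading aid, here is a hedged sketch of how cross-model retrieval is commonly scored in compatible-training work: queries are embedded with the new model, the gallery with the old model, and retrieval uses cosine similarity. This is a generic illustration, not the paper's evaluation code; the function name and the top-1 metric (the paper's benchmarks report mAP and verification metrics) are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieval_top1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Top-1 retrieval accuracy with cosine similarity."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    nearest = (q @ g.T).argmax(dim=1)
    return (gallery_labels[nearest] == query_labels).float().mean().item()

# new-to-new: queries and gallery both embedded by the new model.
# new-to-old: queries from the new model searched against the old model's
#             gallery features, which only works if the new embedding is
#             backward compatible with the old one.
# acc_n2n = retrieval_top1(q_new, q_labels, g_new, g_labels)
# acc_n2o = retrieval_top1(q_new, q_labels, g_old, g_labels)
```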