Compressed Self-Attention for Deep Metric Learning

Authors: Ziye Chen, Mingming Gong, Yanwu Xu, Chaohui Wang, Kun Zhang, Bo Du (pp. 3561-3568)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of CSA via extensive experiments on two metric learning tasks: person re-identification and local descriptor learning. Qualitative and quantitative comparisons with latest methods demonstrate the significance of CSA in this topic.
Researcher Affiliation Academia 1School of Computer Science, Wuhan University; 2School of Mathematics and Statistics, University of Melbourne; 3Department of Biomedical Informatics, University of Pittsburgh; 4Université Paris-Est, LIGM (UMR 8049), CNRS, ENPC, ESIEE Paris, UPEM, F-77455 Marne-la-Vallée, France; 5Department of Philosophy, Carnegie Mellon University
Pseudocode No The paper describes the implementation steps of the CSA module in a numbered list within paragraph text, but it is not formatted as pseudocode or a clearly labeled algorithm block with code-like syntax (e.g., 'if', 'for' loops).
Open Source Code No The paper does not provide a specific link to source code or explicitly state that the code for the described methodology is publicly available.
Open Datasets Yes Market-1501 has 12,936 images of 751 identities for training, and has 3,368 query images and 19,732 gallery images of 750 other identities for testing. DukeMTMC-ReID has 16,522 images of 702 identities for training, and has 2,228 query images and 17,661 gallery images of 702 other identities for testing. CUHK03-NP is re-formulated from the old CUHK03 dataset (Li et al. 2014) with the new training/testing protocol proposed in (Zhong et al. 2017). It contains 7,368 images of 767 identities for training and 5,328 images of 700 other identities for testing. The Brown dataset (Brown and Lowe 2007) consists of three subsets: Liberty, Notredame, and Yosemite, with about 400k patches in each subset. The HPatches dataset (Balntas et al. 2017) consists of 116 sequences of 6 images.
Dataset Splits No The paper provides detailed training and testing split information for the datasets but does not explicitly mention or quantify a separate validation dataset split with percentages or sample counts.
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies No The paper mentions general software components like 'ResNet50' and 'SGD' but does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python 3.x, PyTorch 1.x, CUDA 10.x).
Experiment Setup Yes The input images are resized to 384×128 and augmented with random horizontal flipping. We use ResNet50 with the pre-trained weights from ImageNet as the backbone. Optimization is done by SGD with momentum of 0.9 and weight decay of 0.0001. The batch size is set to 64. We train the model for 100 epochs. The base learning rate is initialized at 0.1 and multiplied by 0.1 after every 40 epochs. The learning rate for the backbone is set to 0.1 times the base learning rate. For the hyper-parameters of CSA, we set the number of base attention maps in each group to 32 and the number of groups to 2. The balance factor λ is set to 1.0. For local descriptor learning, the input images are resized to 32×32, and per-batch normalized. We apply data augmentation by random flipping and 90° rotation. We train the model for 10 epochs with the learning rate initialized at 10 and linearly decayed to 0.
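The step schedule quoted above (base LR 0.1, decayed by 0.1 after every 40 epochs, backbone at 0.1× the base rate) can be sketched as plain functions; the paper gives only the hyper-parameter values, so the function names here are illustrative, not from the authors' code.

```python
def base_lr(epoch, init_lr=0.1, gamma=0.1, step=40):
    """Base learning rate: initialized at 0.1, multiplied by 0.1
    after every 40 epochs (per the paper's re-ID training setup)."""
    return init_lr * (gamma ** (epoch // step))

def backbone_lr(epoch):
    """Backbone learning rate is 0.1 times the base learning rate."""
    return 0.1 * base_lr(epoch)

# Over the 100 training epochs this yields three phases:
#   epochs 0-39:  base 0.1,   backbone 0.01
#   epochs 40-79: base 0.01,  backbone 0.001
#   epochs 80-99: base 0.001, backbone 0.0001
```

In a framework such as PyTorch this corresponds to two parameter groups (backbone vs. head) driven by a single step scheduler.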