Learning Dense Correspondences between Photos and Sketches

Authors: Xuanchen Lu, Xiaolong Wang, Judith E. Fan

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here we empirically evaluate our method and compare it to existing approaches in dense correspondence learning on the photo-sketch correspondence benchmark. We analyze the difference between human annotations and predictions from existing methods. We show that our method establishes the state-of-the-art in the photo-sketch correspondence benchmark and learns a more human-like representation from the photo-sketch contrastive learning objectives. We conducted additional experiments to evaluate generalization to unseen categories in Appendix C."
Researcher Affiliation | Academia | "¹University of California, San Diego; ²Stanford University. Correspondence to: Judith Fan <jefan@stanford.edu>."
Pseudocode | No | The paper describes algorithmic steps and functions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We will publicly release PSC6k with extensive documentation and code to enhance its usability to the research community. Project page: https://photo-sketch-correspondence.github.io"
Open Datasets | Yes | "First, we establish a new benchmark for photo-sketch dense correspondence learning: PSC6k. This benchmark consists of 150,000 pairs of keypoint annotations for 6250 photo-sketch pairs spanning 125 object categories. ... We will publicly release PSC6k with extensive documentation and code to enhance its usability to the research community."
Dataset Splits | No | "Since there is no validation split, we do not select the best checkpoint and evaluate with the last checkpoint after training."
Hardware Specification | No | The paper mentions "the native mixed precision from PyTorch," which implies GPU usage, but no specific GPU models, CPU details, or other hardware specifications are provided.
Software Dependencies | No | The paper mentions PyTorch as a software dependency but does not specify its version number. No other software dependencies are listed with specific versions.
Experiment Setup | Yes | "The encoder is initialized with pretrained weights from MoCo training (He et al., 2020) on ImageNet-2012 (Deng et al., 2009). We then train our encoder on the training split of Sketchy for 1300 epochs. ... with dim = 128, m = 0.999, t = 0.07, lr = 0.03 and a two-layer MLP head. Noticeably, we set the size of the memory queue to K = 8192. ... We then train the estimator for 1200 epochs with a learning rate of 0.003, leading to 2500 epochs of training in total. We set the weights of the objectives to λ_sim = 0.1, λ_con = 1.0. We compute L_sim using the features after ResNet stages 2 and 3, and the temperature is set to τ = 0.001. ... We train the network with the SGD optimizer, a weight decay of 1e-4, a batch size of 256, and the native mixed precision from PyTorch. We adopt a cosine learning rate decay schedule (Loshchilov & Hutter, 2016)."
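
To make the quoted setup concrete, the sketch below wires the reported hyperparameters into a minimal PyTorch training skeleton. Only the numeric values come from the excerpt above; the toy encoder, the SGD momentum of 0.9, and the loss_fn placeholder (standing in for the weighted objective λ_sim · L_sim + λ_con · L_con) are illustrative assumptions, not the authors' released code.

    import torch
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Hyperparameters quoted in the paper's experiment setup;
    # everything below them is a placeholder for illustration.
    MOCO_DIM, MOCO_M, MOCO_T = 128, 0.999, 0.07    # MoCo head dim, momentum, temperature
    QUEUE_K = 8192                                 # memory queue size
    BATCH_SIZE, WEIGHT_DECAY = 256, 1e-4
    LR_ENCODER, LR_ESTIMATOR = 0.03, 0.003
    EPOCHS_ENCODER, EPOCHS_ESTIMATOR = 1300, 1200  # 2500 epochs in total
    LAMBDA_SIM, LAMBDA_CON = 0.1, 1.0              # objective weights
    TAU_SIM = 0.001                                # temperature for L_sim

    # Toy stand-in for the ResNet encoder with its two-layer MLP head.
    model = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(3 * 32 * 32, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, MOCO_DIM),
    )

    # SGD momentum is not stated in the excerpt; 0.9 is the common MoCo default.
    optimizer = torch.optim.SGD(model.parameters(), lr=LR_ENCODER,
                                momentum=0.9, weight_decay=WEIGHT_DECAY)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS_ENCODER)
    scaler = torch.cuda.amp.GradScaler()  # PyTorch's native mixed precision

    def train_step(images, loss_fn):
        # One mixed-precision step; loss_fn stands in for the weighted
        # combination LAMBDA_SIM * L_sim + LAMBDA_CON * L_con.
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda"):
            loss = loss_fn(model(images))
        scaler.scale(loss).backward()  # scale loss to avoid fp16 underflow
        scaler.step(optimizer)         # unscales gradients, then steps
        scaler.update()
        return loss.item()

The GradScaler/autocast pair is what "the native mixed precision from PyTorch" conventionally refers to, and CosineAnnealingLR implements the cosine learning rate decay of Loshchilov & Hutter (2016) cited in the excerpt.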