Learning Dense Correspondences between Photos and Sketches
Authors: Xuanchen Lu, Xiaolong Wang, Judith E Fan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we empirically evaluate our method and compare it to existing approaches in dense correspondence learning on the photo-sketch correspondence benchmark. We analyze the difference between human annotations and predictions from existing methods. We show that our method establishes the state-of-the-art in the photo-sketch correspondence benchmark and learns a more human-like representation from the photo-sketch contrastive learning objectives. We conducted additional experiments to evaluate generalization to unseen categories in Appendix C. |
| Researcher Affiliation | Academia | 1University of California, San Diego 2Stanford University. Correspondence to: Judith Fan <jefan@stanford.edu>. |
| Pseudocode | No | The paper describes algorithmic steps and functions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will publicly release PSC6k with extensive documentation and code to enhance its usability to the research community. Project page: https://photo-sketch-correspondence.github.io |
| Open Datasets | Yes | first, we establish a new benchmark for photo-sketch dense correspondence learning: PSC6k. This benchmark consists of 150,000 pairs of keypoint annotations for 6250 photo-sketch pairs spanning 125 object categories. ...We will publicly release PSC6k with extensive documentation and code to enhance its usability to the research community. |
| Dataset Splits | No | Since there is no validation split, we do not select the best checkpoint and evaluate with the last checkpoint after training. |
| Hardware Specification | No | The paper mentions "the native mixed precision from PyTorch" which implies GPU usage, but no specific GPU models, CPU details, or other hardware specifications are provided. |
| Software Dependencies | No | The paper mentions "PyTorch" as a software component but does not specify its version number. No other software dependencies are listed with specific versions. |
| Experiment Setup | Yes | The encoder is initialized with pretrained weights from MoCo training (He et al., 2020) on ImageNet-2012 (Deng et al., 2009). We then train our encoder on the training split of Sketchy for 1300 epochs. ...with dim = 128, m = 0.999, t = 0.07, lr = 0.03 and a two-layer MLP head. Noticeably, we set the size of the memory queue to K = 8192...We then train the estimator for 1200 epochs with a learning rate of 0.003, leading to 2500 epochs of training in total. We set the weights of the objectives to λsim = 0.1, λcon = 1.0. We compute Lsim using the features after ResNet stages 2 and 3, and the temperature is set to τ = 0.001. ...We train the network with the SGD optimizer, a weight decay of 1e-4, a batch size of 256, and the native mixed precision from PyTorch. We adopt a cosine learning rate decay schedule (Loshchilov & Hutter, 2016). |
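The quoted setup mentions a cosine learning-rate decay schedule (Loshchilov & Hutter, 2016) with lr = 0.03 for the 1300-epoch encoder stage and 0.003 for the 1200-epoch estimator stage. A minimal sketch of such a schedule is below; the half-cosine form and the per-stage boundaries are assumptions for illustration, not the authors' released code.

```python
import math

def cosine_lr(epoch: int, total_epochs: int, base_lr: float) -> float:
    """Decay base_lr toward 0 over total_epochs along a half-cosine curve.

    base_lr and stage lengths here mirror the values quoted in the
    Experiment Setup row; the exact schedule implementation is assumed.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Encoder stage (quoted: 1300 epochs, lr = 0.03).
lr_start = cosine_lr(0, 1300, 0.03)    # 0.03 at epoch 0
lr_mid = cosine_lr(650, 1300, 0.03)    # 0.015 at the halfway point

# Estimator stage (quoted: 1200 epochs, lr = 0.003).
lr_est_start = cosine_lr(0, 1200, 0.003)
```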