Building Effective Representations for Sketch Recognition
Authors: Jun Guo, Changhu Wang, Hongyang Chao
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed representations are highly discriminative and lead to large improvements over the state of the arts. |
| Researcher Affiliation | Collaboration | Jun Guo (Sun Yat-sen University, Guangzhou, P.R. China); Changhu Wang (Microsoft Research, Beijing, P.R. China); Hongyang Chao (Sun Yat-sen University, Guangzhou, P.R. China) |
| Pseudocode | No | The paper describes its methods in text and uses flow diagrams (Figure 2, Figure 4) but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code or provide links to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we evaluate the proposed representations on the largest sketch dataset collected by Eitz (Eitz, Hays, and Alexa 2012), which contains 20,000 sketches in 250 categories. |
| Dataset Splits | Yes | Following Eitz's evaluation protocol, we partition the dataset into three parts and perform three-fold cross-test: each time two parts are used for training and the remaining part for testing. The mean classification accuracy of three folds is reported. |
| Hardware Specification | Yes | Experiments were performed on a laptop equipped with an Intel Core i7. |
| Software Dependencies | No | The paper mentions: "We use the Liblinear package (Fan et al. 2008) to learn a linear SVM for classification." However, it does not specify a version number for the Liblinear package. |
| Experiment Setup | Yes | The other parameters of Gabor filters, i.e., the scale, the wavelength of the sinusoidal factor, and the spatial aspect ratio are set to 5, 9, 1 respectively. In this work R is set to 9. We sample 32×32 points and utilize square patches with sizes of 64 and 92 plus circular patches with radii of 32 and 46. We use a 4×4 square grid of pooling centers. A circular channel is first divided into 2 distance intervals and further uniformly split into 8 polar sectors. In our work, all one-layer MSCs learn dictionaries of 2000 codewords. For a two-layer MSC, the first layer learns a 1000-codeword dictionary and its output is max-pooled with the group size of 4×4...Then comes the second layer which learns a dictionary of 2000 codewords. The final stage of each MSC applies a three-level Spatial Pyramid Max-Pooling with each level generating 1×1, 2×2 and 3×3 pooled codes respectively. |
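The Gabor parameters quoted in the Experiment Setup row (scale 5, sinusoid wavelength 9, spatial aspect ratio 1, R = 9 orientations) can be sketched as a small filter bank. This is a minimal NumPy illustration of a real-valued Gabor kernel under those settings, not the authors' implementation; the kernel size of 9 and the function name are assumptions for the example.

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, sigma=5.0, lambd=9.0, gamma=1.0):
    """Real part of a Gabor filter.

    sigma (scale), lambd (sinusoid wavelength) and gamma (spatial aspect
    ratio) follow the 5 / 9 / 1 setting quoted above; size is an assumption.
    """
    half = size // 2
    # Coordinate grid centered on the kernel.
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the filter orientation theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope modulated by a cosine carrier.
    envelope = np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_t / lambd)

# A bank of R = 9 equally spaced orientations, as in the quoted setting.
bank = [gabor_kernel(theta=k * np.pi / 9) for k in range(9)]
```

Each kernel would then be convolved with the sketch image to produce one orientation channel.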
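The three-level Spatial Pyramid Max-Pooling described in the final sentence (1×1, 2×2 and 3×3 pooled codes per level) can be sketched as follows. This is an illustrative NumPy version under the assumption that codes lie on a regular H×W spatial grid; the function name and grid layout are not from the paper.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, levels=(1, 2, 3)):
    """Three-level spatial pyramid max-pooling.

    codes: (H, W, D) array of D-dimensional codes on a spatial grid.
    For levels (1, 2, 3) the output concatenates 1 + 4 + 9 = 14 max-pooled
    cells, giving a 14 * D dimensional vector.
    """
    h, w, _ = codes.shape
    pooled = []
    for n in levels:
        # Split the grid into an n x n set of roughly equal cells.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = codes[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                # Max-pool each code dimension over the cell.
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# With the 2000-codeword dictionary quoted above, each channel yields
# a 14 * 2000 = 28000-dimensional pooled descriptor.
feat = spatial_pyramid_max_pool(np.random.rand(6, 6, 2000))
```

The concatenated pyramid features would then be fed to the linear SVM mentioned in the Software Dependencies row.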