Building Effective Representations for Sketch Recognition

Authors: Jun Guo, Changhu Wang, Hongyang Chao

AAAI 2015

Reproducibility

| Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments show that the proposed representations are highly discriminative and lead to large improvements over the state of the arts." |
| Researcher Affiliation | Collaboration | Jun Guo (Sun Yat-sen University, Guangzhou, P.R. China); Changhu Wang (Microsoft Research, Beijing, P.R. China); Hongyang Chao (Sun Yat-sen University, Guangzhou, P.R. China) |
| Pseudocode | No | The paper describes its methods in text and uses flow diagrams (Figure 2, Figure 4), but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper makes no explicit statement about releasing source code and provides no link to a code repository for the described methodology. |
| Open Datasets | Yes | "In this section, we evaluate the proposed representations on the largest sketch dataset collected by Eitz (Eitz, Hays, and Alexa 2012), which contains 20,000 sketches in 250 categories." |
| Dataset Splits | Yes | "Following Eitz's evaluation protocol, we partition the dataset into three parts and perform three-fold cross-test: each time two parts are used for training and the remaining part for testing. The mean classification accuracy of three folds is reported." |
| Hardware Specification | Yes | "Experiments were performed on a laptop equipped with an Intel Core i7." |
| Software Dependencies | No | The paper states, "We use the Liblinear package (Fan et al. 2008) to learn a linear SVM for classification," but does not specify a version of the Liblinear package. |
| Experiment Setup | Yes | "The other parameters of Gabor filters, i.e., the scale, the wavelength of the sinusoidal factor, and the spatial aspect ratio, are set to 5, 9, and 1, respectively. In this work R is set to 9. We sample 32×32 points and utilize square patches with sizes of 64 and 92 plus circular patches with radii of 32 and 46. We use a 4×4 square grid of pooling centers. A circular channel is first divided into 2 distance intervals and further uniformly split into 8 polar sectors. In our work, all one-layer MSCs learn dictionaries of 2000 codewords. For a two-layer MSC, the first layer learns a 1000-codeword dictionary and its output is max-pooled with a group size of 4×4... Then comes the second layer, which learns a dictionary of 2000 codewords. The final stage of each MSC applies a three-level Spatial Pyramid Max-Pooling, with each level generating 1×1, 2×2, and 3×3 pooled codes, respectively." |
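The three-fold cross-test quoted in the Dataset Splits row can be sketched as follows. This is a minimal, hedged illustration: the data and the majority-vote "classifier" are placeholders, not the paper's sketch representations; the paper trains a linear SVM with the Liblinear package.

```python
# Hedged sketch of the evaluation protocol: partition the data into three
# parts and run a three-fold cross-test (two parts train, one tests),
# reporting the mean accuracy over the folds. The classifier below is a
# trivial placeholder; the paper uses a linear SVM trained with Liblinear.
import random

def three_fold_cross_test(samples, labels, train_and_eval, seed=0):
    """Split into 3 parts; each part serves once as the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::3] for i in range(3)]          # three disjoint parts
    accuracies = []
    for k in range(3):
        test = folds[k]
        train = [i for j in range(3) if j != k for i in folds[j]]
        acc = train_and_eval([samples[i] for i in train],
                             [labels[i] for i in train],
                             [samples[i] for i in test],
                             [labels[i] for i in test])
        accuracies.append(acc)
    return sum(accuracies) / 3.0                   # mean over the 3 folds

# Placeholder "classifier": predicts the majority training label.
def majority_baseline(Xtr, ytr, Xte, yte):
    majority = max(set(ytr), key=ytr.count)
    return sum(y == majority for y in yte) / len(yte)
```

Any real classifier can be plugged in via `train_and_eval`, keeping the fold partitioning and the mean-accuracy aggregation identical to the quoted protocol.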
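The Gabor-filter parameters quoted in the Experiment Setup row (scale 5, sinusoidal wavelength 9, spatial aspect ratio 1) can be sketched as a small filter bank. The kernel size and the number of orientations below are illustrative assumptions, not values stated in the paper.

```python
# Hedged sketch of a Gabor filter bank using the stated parameters:
# sigma (scale) = 5, wavelength = 9, aspect ratio gamma = 1.
# Kernel size (21) and orientation count (4) are assumptions for illustration.
import math

def gabor_kernel(theta, size=21, sigma=5.0, wavelength=9.0, gamma=1.0):
    """Real part of a Gabor filter at orientation theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2)
                                / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

# An illustrative 4-orientation bank (the paper's orientation count may differ).
bank = [gabor_kernel(k * math.pi / 4) for k in range(4)]
```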
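The three-level Spatial Pyramid Max-Pooling mentioned at the end of the Experiment Setup row (levels generating 1×1, 2×2, and 3×3 pooled codes) can be sketched as below. The grid size and code dimension are illustrative, not the paper's actual values.

```python
# Hedged sketch of three-level Spatial Pyramid Max-Pooling: codes on a
# spatial grid are max-pooled over 1x1, 2x2, and 3x3 partitions of the grid,
# and the pooled vectors are concatenated into one feature vector.
import numpy as np

def spatial_pyramid_max_pool(codes, levels=(1, 2, 3)):
    """codes: array of shape (H, W, D) -- a D-dim code at each grid cell."""
    H, W, D = codes.shape
    pooled = []
    for n in levels:                               # n x n pooling regions
        rows = np.array_split(np.arange(H), n)
        cols = np.array_split(np.arange(W), n)
        for hi in rows:
            for wi in cols:
                region = codes[np.ix_(hi, wi)]     # cells in this region
                pooled.append(region.reshape(-1, D).max(axis=0))
    return np.concatenate(pooled)                  # (1 + 4 + 9) * D values

# Illustrative input: a 12x12 grid of 5-dimensional codes.
codes = np.random.default_rng(0).random((12, 12, 5))
feat = spatial_pyramid_max_pool(codes)             # 14 * 5 = 70 values
```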