Enhance Sketch Recognition’s Explainability via Semantic Component-Level Parsing

Authors: Guangming Zhu, Siyuan Wang, Tianci Wu, Liang Zhang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the SPG and SketchIME datasets demonstrate the memory module's flexibility and the recognition network's explainability.
Researcher Affiliation | Academia | Guangming Zhu (1,2,3), Siyuan Wang (1), Tianci Wu (1), Liang Zhang (1,2,3); (1) School of Computer Science and Technology, Xidian University, China; (2) Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province; (3) Xi'an Key Laboratory of Intelligent Software Engineering
Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code and data are available at https://github.com/GuangmingZhu/SketchESC.
Open Datasets | Yes | The SPG dataset (Li et al. 2018) and SketchIME dataset (Zhu et al. 2023) are used to verify the advantages of the proposed network.
Dataset Splits | No | The paper specifies data for training and testing but does not explicitly mention a validation split or its size for model development or hyperparameter tuning. For SPG: 'An average of 600 samples per category are used for training, while 100 samples for testing.' For SketchIME: 'An average of 100 samples per sketch category are used for training, while 50 samples for testing.'
Hardware Specification | Yes | Our network is implemented by Pytorch and trained on a single NVIDIA GTX 3090.
Software Dependencies | No | The paper mentions 'implemented by Pytorch' and 'Transformer module initialized with the pretrained ViT-Base model from Hugging Face', but it does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The learning rate is initialized to 3 × 10^-4 with a batch size of 128. The Adam optimizer is used. Total 200 epochs are implemented for each training. The τ in Eq. (1) is set to 1. The λ1 and λ2 in Eq. (8) are set to 1 and 20, respectively. The λs in Eq. (6) and the λc in Eq. (7) are set to 10 empirically. (A hedged PyTorch sketch of this configuration is given below the table.)
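
To make the quoted setup concrete, the following is a minimal, hypothetical PyTorch sketch of the training configuration. Only the Adam optimizer, the 3 × 10^-4 initial learning rate, the batch size of 128, the 200 training epochs, and the ViT-Base initialization from Hugging Face are taken from the quotes above; the checkpoint name, the classifier head, the class count, and the dummy data are illustrative assumptions, not the authors' released code.

```python
# Hypothetical reproduction sketch of the quoted experiment setup.
# Assumptions (not from the paper): the exact ViT-Base checkpoint name,
# the linear classifier head, the class count, and the dummy tensors that
# stand in for the SPG / SketchIME data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from transformers import ViTModel


class SketchClassifier(nn.Module):
    """Placeholder recognition network: ViT-Base backbone plus a linear head."""

    def __init__(self, num_classes: int):
        super().__init__()
        # The paper only says the Transformer module is initialized with the
        # pretrained ViT-Base model from Hugging Face; this checkpoint name is a guess.
        self.backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.head = nn.Linear(self.backbone.config.hidden_size, num_classes)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # Use the [CLS] token embedding as the sketch-level representation.
        cls_token = self.backbone(pixel_values=pixel_values).last_hidden_state[:, 0]
        return self.head(cls_token)


# Dummy images and labels stand in for the real sketch data (class count is a placeholder).
images = torch.randn(256, 3, 224, 224)
labels = torch.randint(0, 25, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

model = SketchClassifier(num_classes=25)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # quoted initial learning rate
criterion = nn.CrossEntropyLoss()  # stands in for the paper's combined loss (Eq. 8)

for epoch in range(200):  # 200 epochs, as quoted
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```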