A Framework to Coordinate Segmentation and Recognition

Authors: Wei Huang, Huimin Yu, Weiwei Zheng, Jing Zhang (pp. 8473-8480)

AAAI 2019

Reproducibility variables, results, and LLM responses:

Research Type: Experimental. "Experiments demonstrate the effectiveness of our framework and model in collaboratively segmenting and recognizing objects that can be recognized using their shapes/shape-patterns."

Researcher Affiliation: Academia. "Wei Huang,1 Huimin Yu,1,2 Weiwei Zheng,1 Jing Zhang1 1College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China 2The State Key Laboratory of CAD and CG, Zhejiang University, Hangzhou, China, {huangwayne28, yhm2005, 3090102748, zj9301}@zju.edu.cn"

Pseudocode: Yes. "Algorithm 1: Optimize Eq. 8"

Open Source Code: No. The paper provides no statement or link indicating that source code for the methodology is openly available.

Open Datasets: Yes. "The second experiment is conducted on (Memo, Minto, and Zanuttigh 2015) hand dataset."

Dataset Splits: No. The paper specifies training and testing splits but does not mention a separate validation set. For the signs and logos dataset: "1080 instances are randomly selected for training, while the other 356 for testing." For the hand gestures dataset: "Of all 1320 images, 920 are chosen randomly for training, and the rest 400 for testing."

Hardware Specification: No. The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run the experiments.

Software Dependencies: No. The paper does not give version numbers for the software dependencies or libraries used in the experiments.

Experiment Setup: Yes. "The structure of the capsule network in this experiment is similar to the one in (Sabour, Frosst, and Hinton 2017): the input is of size 80x80, followed by a standard conv layer with 256 channels of kernel size 11x11 and stride 3, and then 16 types of 16D convolution capsules with kernel size 9x9 and stride 2, finally fully connected with 30 types of 24D capsules, each representing a category (Category Caps). The decoder is an MLP of layer size [512, 1024, 6400], which takes the output of Category Caps as the input. ... We set α = 0.7, β = 1.15 and ρ = 1.1. For all the test images, the optimization runs for 100 iterations."