StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

Authors: Xiangteng He, Yuxin Peng, Junjie Zhao

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comparing with ten state-of-the-art methods on CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy. In this section, we present comprehensive experimental results and analyses of our StackDRL approach on CUB-200-2011 dataset [Wah et al., 2011], and adopt Top-1 accuracy to evaluate its effectiveness. (Top-1 accuracy is sketched in code after this table.)
Researcher Affiliation | Academia | Xiangteng He, Yuxin Peng, and Junjie Zhao, Institute of Computer Science and Technology, Peking University, Beijing 100871, China. pengyuxin@pku.edu.cn
Pseudocode | No | The paper describes the proposed method through textual explanations and diagrams (Figure 2, Figure 3, Figure 4) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the methodology described.
Open Datasets | Yes | In this section, we present comprehensive experimental results and analyses of our StackDRL approach on CUB-200-2011 dataset [Wah et al., 2011].
Dataset Splits | No | The paper mentions using the CUB-200-2011 dataset and discusses a 'training phase' and 'testing phase' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software components like 'VGGNet' and 'Deep Q-network' but does not provide specific version numbers for any software dependencies required to replicate the experiments.
Experiment Setup | Yes | (1) For actions, the ratios of scaling action and local translation actions are set to 0.9 and 0.1 respectively. The maximal action execution number N_step is set to 10, and the level of tree structure N_level is set to 4. (2) For reward function, the trigger reward η and threshold τ are set to 3 and 0.5 respectively. (3) For Q-learning... The region features are computed via RoI Pooling layer with the size of 512 × 7 × 7... mean squared error (MSE) is used... initialize the parameters... from a zero-mean normal distribution with a standard deviation of 0.01. In the training phase, the parameter ϵ starts with 1.0 and decreases by 0.1 for each epoch. It is finally fixed to 0.1 after the first 10 epochs...
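The quoted setup is concrete enough to express in code. Below is a minimal Python sketch, assuming 0-indexed epochs, of the reported hyperparameters, the ϵ-greedy decay schedule, and the Top-1 accuracy metric cited in the Research Type row. All identifiers (StackDRLConfig, epsilon_for_epoch, top1_accuracy) are hypothetical and ours, not from the paper, which released no code.

```python
# Illustrative sketch of the StackDRL hyperparameters reported in the paper.
# All names here are hypothetical; only the numeric values come from the text.
from dataclasses import dataclass

@dataclass
class StackDRLConfig:
    # (1) Actions
    scaling_action_ratio: float = 0.9      # ratio of the scaling action
    translation_action_ratio: float = 0.1  # ratio of local translation actions
    n_step: int = 10                       # maximal action execution number N_step
    n_level: int = 4                       # level of the tree structure N_level
    # (2) Reward function
    trigger_reward: float = 3.0            # eta
    trigger_threshold: float = 0.5         # tau
    # (3) Q-learning
    roi_feature_shape: tuple = (512, 7, 7) # RoI Pooling output size
    init_std: float = 0.01                 # zero-mean normal init, std 0.01

def epsilon_for_epoch(epoch: int) -> float:
    """Epsilon starts at 1.0, decreases by 0.1 each epoch, and stays
    fixed at 0.1 after the first 10 epochs (epoch is 0-indexed here)."""
    return max(1.0 - 0.1 * epoch, 0.1)

def top1_accuracy(predicted_classes, labels) -> float:
    """Top-1 accuracy: fraction of samples whose highest-scoring
    predicted class matches the ground-truth label."""
    correct = sum(int(p == y) for p, y in zip(predicted_classes, labels))
    return correct / len(labels)
```

For example, epsilon_for_epoch(0) returns 1.0, epsilon_for_epoch(5) returns 0.5, and any epoch from 9 onward returns 0.1, consistent with the quoted "fixed to 0.1 after the first 10 epochs" schedule.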