Audio Visual Attribute Discovery for Fine-Grained Object Recognition

Authors: Hua Zhang, Xiaochun Cao, Rui Wang

AAAI 2018

Reproducibility

| Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that, with the help of audio-visual attributes, the method achieves performance superior or comparable to strongly supervised approaches on bird recognition. Experiments are conducted on the fine-grained benchmark CUB-200-2011. |
| Researcher Affiliation | Academia | Hua Zhang, Xiaochun Cao, Rui Wang; State Key Laboratory of Information Security (SKLOIS), Institute of Information Engineering, CAS, Beijing, China. Emails: zhanghua@iie.ac.cn, caoxiaochun@iie.ac.cn, wangrui@iie.ac.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions implementing its architecture on "the open-source package torch7," but does not state that the authors' own implementation is open source or provide a link to it. |
| Open Datasets | Yes | The Caltech-UCSD Birds dataset (CUB-200-2011) (Wah et al. 2011) is a widely used fine-grained classification benchmark. |
| Dataset Splits | Yes | The benchmark is divided into training, validation, and test parts. The train and test samples are selected following Wah et al. (2011); the validation set is formed by randomly choosing 10% of the training samples. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used for its experiments. |
| Software Dependencies | No | The paper mentions using "the open-source package torch7" but does not specify its version or list other software dependencies with versions. |
| Experiment Setup | Yes | The networks are trained by stochastic gradient descent with 0.9 momentum. The learning rate is initialized to 0.0001 and decreased by a factor of 0.1 after about 20 epochs. The weight decay parameter is 0.0005. |
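The reported hyperparameters amount to a standard step-decay schedule. A minimal plain-Python sketch of that schedule follows; the paper's actual torch7 implementation is not available, so the function name and defaults here are illustrative only (the SGD momentum of 0.9 and weight decay of 0.0005 are noted in comments but not implemented):

```python
def learning_rate(epoch, base_lr=1e-4, gamma=0.1, step_size=20):
    """Step decay as reported: start at 0.0001 and multiply by 0.1
    roughly every 20 epochs.

    The optimizer itself is SGD with momentum 0.9 and weight decay
    0.0005; this sketch covers only the learning-rate schedule.
    """
    return base_lr * gamma ** (epoch // step_size)

# Schedule at a few representative epochs:
for epoch in (0, 10, 25, 45):
    print(epoch, learning_rate(epoch))
```

In a modern PyTorch port, the same policy would correspond to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)`.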