Deep Semantic Dictionary Learning for Multi-label Image Classification
Authors: Fengtao Zhou, Sheng Huang, Yun Xing (pp. 3572-3580)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on three popular benchmarks demonstrate that our method achieves promising performances in comparison with the state-of-the-arts. |
| Researcher Affiliation | Academia | Fengtao Zhou1, Sheng Huang1,2,*, Yun Xing1 1School of Big data & Software Engineering, Chongqing University, 2Ministry of Education Key Laboratory of Dependable Service Computing in Cyber Physical Society, Shazheng street NO.174, Shapingba District, Chongqing 400044, China {zft, huangsheng, yxing}@cqu.edu.cn |
| Pseudocode | Yes | Algorithm 1: Alternately Parameter Updating |
| Open Source Code | Yes | Our codes and models have been released1. 1https://github.com/ZFT-CQU/DSDL. |
| Open Datasets | Yes | Datasets: To prove the effectiveness of DSDL, we conduct extensive experiments on three public multi-label image benchmarks, i.e., Pascal VOC 2007, Pascal VOC 2012 and Microsoft COCO. Pascal VOC 2007 (Everingham et al. 2010)... Pascal VOC 2012 (Everingham et al. 2010)... Microsoft COCO (Lin et al. 2014)... |
| Dataset Splits | Yes | Pascal VOC 2007 (Everingham et al. 2010) is the most widely used dataset for evaluating the multi-label image classification task; it contains 20 categories divided into training (2,501), validation (2,510) and testing (4,952) sets. Following previous work, we train our model on the training and validation sets, and evaluate on the testing set. Pascal VOC 2012 (Everingham et al. 2010) consists of the same 20 categories as VOC 2007, but VOC 2012 contains images from 2008-2011 and has no intersection with VOC 2007. The VOC 2012 dataset is divided into training (5,717), validation (5,823) and testing (10,991) sets. We train our model on the training set, and fine-tune on the validation set. Microsoft COCO (Lin et al. 2014)... which is further divided into a training set of 82,081 images and a validation set of 40,137 images. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions tools like 'ResNet-101', 'GloVe' and the 'Stochastic Gradient Descent (SGD) optimizer' but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | Implementation Details: ResNet-101 pre-trained on ImageNet is utilized as the feature learning module. The input images are randomly cropped and resized to 448×448 with random horizontal flips for data augmentation. With regard to the dictionary learning module, the encoder consists of two fully connected layers with output dimensions of 1024 and 2048, each followed by LeakyReLU with negative slope 0.2. The decoder shares the same learnable parameters with the encoder. All modules are optimized with the Stochastic Gradient Descent (SGD) optimizer. The momentum is 0.9 and the weight decay is 10^-4. The initial learning rate is 0.01, which decays by a factor of 10 every 40 epochs, and the network is trained for 100 epochs in total. |
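The learning-rate schedule quoted above (initial rate 0.01, divided by 10 every 40 epochs, 100 epochs total) can be sketched as a small step-decay function. This is a minimal illustration of the stated hyperparameters, not code from the paper's released repository; the function name and defaults are ours.

```python
def step_decay_lr(epoch, base_lr=0.01, decay_factor=0.1, step=40):
    """Learning rate at a given epoch under the paper's stated schedule:
    base_lr scaled by decay_factor once every `step` epochs
    (0.01 -> 0.001 at epoch 40 -> 0.0001 at epoch 80)."""
    return base_lr * (decay_factor ** (epoch // step))

# Over the 100-epoch run described in the paper:
# epochs 0-39 use 0.01, epochs 40-79 use 0.001, epochs 80-99 use 0.0001.
schedule = [step_decay_lr(e) for e in range(100)]
```

In a PyTorch-style setup, the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)` alongside `SGD(momentum=0.9, weight_decay=1e-4)`, matching the reported optimizer settings.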