Sparse Deep Stacking Network for Image Classification
Authors: Jun Li, Heyou Chang, Jian Yang
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we evaluate S-DSN on four databases: Extended Yale B, AR, 15 scene and Caltech101. Experimental results show that our model outperforms related classification methods with only a linear classifier. It is worth noting that we reach 98.8% recognition accuracy on the 15 scene database. |
| Researcher Affiliation | Academia | Jun Li, Heyou Chang, Jian Yang School of Computer Science and Technology Nanjing University of Science and Technology, Nanjing, 219000, China. |
| Pseudocode | Yes | Algorithm 1 Training Algorithm of Sparse Modular, Algorithm 2 Training Algorithm of S-DSN |
| Open Source Code | No | The paper links to the preprocessed data used by a comparative method (LC-KSVD) but not to source code for the proposed S-DSN method: "they can be downloaded from: http://www.umiacs.umd.edu/~zhuolin/projectlcksvd.html" |
| Open Datasets | Yes | We present experimental results on four databases: the Extended Yale B database, the AR face database, Caltech101 and 15 scene categories. Following (Jiang, Lin, and Davis 2013), the four databases are preprocessed as follows: in the Extended Yale B and AR face databases, each face image is projected onto an n-dimensional feature vector with a randomly generated matrix drawn from a zero-mean normal distribution. The dimension of a random-face feature is 504 for Extended Yale B and 540 for AR. In the face databases, the n-dimensional features of each image are normalized to [-1, 1]. For the Caltech101 database, we first extract SIFT descriptors from 16x16 patches, densely sampled from each image on a grid with a 6-pixel step size; we then extract spatial pyramid features from the SIFT features with three grids of size 1x1, 2x2 and 4x4. To train the codebook for the spatial pyramid, we use standard k-means clustering with k = 1024. For the 15 scene category database, we compute spatial pyramid features using a four-level spatial pyramid and a SIFT-descriptor codebook of size 200. Finally, the spatial pyramid features are reduced to 3000 dimensions by PCA. (A feature-extraction sketch follows the table.) |
| Dataset Splits | Yes | Extended Yale B: We randomly select half (32) of the images per category for training and the other half for testing. AR: For each person, we randomly select 20 images for training and the remaining 6 for testing. 15 Scene Category: Following the common experimental settings, we randomly choose 100 images from each class as training data and use the rest as test data. Caltech101: Following the common experimental settings, we train on 5, 10, 15, 20, 25, and 30 samples per category and test on the rest. (A per-class split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list any software dependencies with specific version numbers. |
| Experiment Setup | Yes | The matrix parameters are initialized with small random values sampled from a normal distribution with zero mean and standard deviation 0.01. For simplicity, we use a constant learning rate ϵ chosen from {20, 15, 5, 2, 1, 0.2, 0.1, 0.05, 0.01, 0.001}, the regularization parameter α chosen from {1, 0.5, 0.1}, the sparse regularization parameter β chosen from {0.1, 0.05, 0.01, 0.001, 0.0001} and the group number G chosen from {2, 4, 5, 10, 20}. In all experiments, we train for only 5 epochs, with 500 hidden units and 2 layers. For each data set, each experiment is repeated 10 times with randomly selected training and testing images, and the average precision is reported. (A hyperparameter-grid sketch follows the table.) |
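The Open Datasets row describes random-face preprocessing for the two face databases. Below is a minimal sketch of that step, assuming grayscale images stacked in an array; the function name `random_face_features` and the per-image max-absolute-value scaling are our assumptions, since the paper only states that features are "normalized to [-1, 1]".

```python
import numpy as np

def random_face_features(images, dim, seed=0):
    """Project flattened face images onto `dim` random directions.

    `images` is an (N, H, W) array; `dim` is 504 for Extended Yale B
    or 540 for AR, per the paper.
    """
    rng = np.random.default_rng(seed)
    X = images.reshape(len(images), -1).astype(np.float64)  # (N, pixels)
    R = rng.normal(0.0, 1.0, size=(X.shape[1], dim))        # zero-mean Gaussian matrix
    F = X @ R                                               # (N, dim) random-face features
    # Assumed reading of "normalized to [-1, 1]": divide each feature
    # vector by its largest absolute entry (guarded against zero rows).
    F /= np.maximum(np.abs(F).max(axis=1, keepdims=True), 1e-12)
    return F
```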
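The Dataset Splits row describes per-class random train/test splits, repeated over 10 runs. A minimal sketch, assuming integer class labels; the helper name `split_per_class` is ours, not the paper's:

```python
import numpy as np

def split_per_class(labels, n_train, seed=0):
    """Pick `n_train` random samples per class for training; the rest are test.

    e.g. n_train=32 for Extended Yale B, 20 for AR, 100 for 15 Scenes,
    or 5/10/15/20/25/30 for Caltech101. Vary `seed` to average accuracy
    over 10 random splits, as the paper reports.
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle class c
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.asarray(train_idx), np.asarray(test_idx)
```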
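The Experiment Setup row amounts to a small grid search over four hyperparameters with a fixed architecture (2 layers, 500 hidden units, 5 epochs). The sketch below shows that loop; `train_sdsn` is a hypothetical stand-in for the paper's training procedure (Algorithm 2), since no implementation is released.

```python
import itertools
import numpy as np

# Search space exactly as quoted in the Experiment Setup row.
GRID = {
    "lr":    [20, 15, 5, 2, 1, 0.2, 0.1, 0.05, 0.01, 0.001],  # learning rate eps
    "alpha": [1, 0.5, 0.1],                                    # regularization
    "beta":  [0.1, 0.05, 0.01, 0.001, 0.0001],                 # sparsity weight
    "G":     [2, 4, 5, 10, 20],                                # group number
}

def init_weights(shape, rng):
    # "Small random values" from a zero-mean normal with std 0.01,
    # as stated in the setup.
    return rng.normal(0.0, 0.01, size=shape)

def search(train_sdsn):
    """Run the grid; `train_sdsn` (hypothetical) should return the mean
    accuracy over the 10 random splits for one hyperparameter setting."""
    best_cfg, best_acc = None, -np.inf
    for lr, alpha, beta, G in itertools.product(*GRID.values()):
        acc = train_sdsn(lr=lr, alpha=alpha, beta=beta, groups=G,
                         hidden_units=500, layers=2, epochs=5)
        if acc > best_acc:
            best_cfg, best_acc = (lr, alpha, beta, G), acc
    return best_cfg, best_acc
```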