Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition
Authors: Yanhua Cheng, Xin Zhao, Rui Cai, Zhiwei Li, Kaiqi Huang, Yong Rui
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with those state-of-the-art results reported by fully-supervised methods. |
| Researcher Affiliation | Collaboration | Yanhua Cheng¹, Xin Zhao¹, Rui Cai², Zhiwei Li², Kaiqi Huang¹·³, Yong Rui²; ¹CRIPAC & NLPR, CASIA; ²Microsoft Research; ³CAS Center for Excellence in Brain Science and Intelligence Technology; {yh.cheng, xzhao, kaiqi.huang}@nlpr.ia.cn, {ruicai, zli, yongrui}@microsoft.com |
| Pseudocode | No | The paper describes the algorithms in prose and uses diagrams (Fig. 2) but does not include structured pseudocode or algorithm blocks (a hypothetical outline of the two-stage pipeline implied by the setup row is sketched after the table). |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We perform our experiments on the Washington RGB-D dataset [Lai et al., 2011a] captured by Microsoft Kinect. |
| Dataset Splits | Yes | To evaluate our semi-supervised learning, we first utilize one of the 10 random splits provided by [Lai et al., 2011a] to divide the dataset into a training set and a testing set. For any split, there are around 35,000 examples for training and around 6,877 for testing. Then we randomly labeled 5% of the training samples (around 1,750) and left the rest unlabeled (around 33,250). (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimization algorithms (SGD) and architectures (AlexNet) but does not provide specific software dependencies or library version numbers used in the implementation. |
| Experiment Setup | Yes | We fix one hyper-parameter to 0.5, K = 20, and β = 1 for our semi-supervised learning method, although dynamically fine-tuning each parameter could result in better performance. For the reconstruction network of each modality, we use a mini-batch of b = 128 images and an initial learning rate of 10⁻⁵, multiplying the learning rate by 0.1 every s = 4000 iterations. For training the RGB- and depth-DCNN models for recognition during every iteration, we set b = 128, a learning rate of 10⁻⁷, and s = 3000. (See the schedule sketch after the table.) |
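
The setup row implies a two-stage pipeline: a reconstruction network trained per modality, followed by RGB- and depth-DCNN recognition training. Since the paper itself ships no pseudocode, the following Python sketch is purely illustrative: the module sizes, helper names (`train_reconstruction`, `train_recognition`), and synthetic data are our assumptions, not the authors' implementation (which fine-tunes AlexNet-scale networks on real RGB-D crops).

```python
import torch
from torch import nn, optim

def train_reconstruction(data, lr=1e-5, steps=200):
    """Stage 1 (illustrative): unsupervised reconstruction network for one modality."""
    ae = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
    opt = optim.SGD(ae.parameters(), lr=lr)
    for _ in range(steps):
        batch = data[torch.randint(0, len(data), (128,))]  # mini-batch b = 128
        loss = nn.functional.mse_loss(ae(batch), batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ae

def train_recognition(autoencoder, data, labels, lr=1e-7, steps=200):
    """Stage 2 (illustrative): a recognition head on the stage-1 encoder, labeled data only."""
    encoder = autoencoder[0]      # reuse the trained encoder layer from stage 1
    head = nn.Linear(64, 51)      # 51 object categories in the Washington RGB-D dataset
    opt = optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, len(data), (128,))
        feats = torch.relu(encoder(data[idx]))
        loss = nn.functional.cross_entropy(head(feats), labels[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head

# One pass per modality; synthetic stand-ins for the ~1,750 labeled samples.
rgb, rgb_y = torch.randn(1750, 256), torch.randint(0, 51, (1750,))
rgb_head = train_recognition(train_reconstruction(rgb), rgb, rgb_y)
```

How the two stages interact on the unlabeled 95% (e.g. any co-training or reconstruction regularization during stage 2) is exactly the part the paper leaves in prose, so it is deliberately omitted here.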
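
For the dataset-splits row: a minimal sketch of randomly labeling 5% of the roughly 35,000 training examples, assuming a uniform random draw. The function name and seed are ours, and the official 10 train/test splits of [Lai et al., 2011a] are not reproduced here.

```python
import numpy as np

def label_five_percent(num_train=35000, labeled_fraction=0.05, seed=0):
    """Randomly mark ~5% of training indices as labeled; the rest stay unlabeled."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_train)
    n_labeled = round(labeled_fraction * num_train)  # around 1,750
    return order[:n_labeled], order[n_labeled:]      # around 33,250 unlabeled

labeled_idx, unlabeled_idx = label_five_percent()
print(len(labeled_idx), len(unlabeled_idx))  # 1750 33250
```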
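
The step-decay schedule quoted in the setup row (initial learning rate 10⁻⁵, multiplied by 0.1 every s = 4000 iterations, mini-batch b = 128) maps directly onto a per-iteration `StepLR` in PyTorch. A minimal sketch, assuming a placeholder linear model and random data rather than the paper's DCNNs:

```python
import torch
from torch import nn, optim

model = nn.Linear(512, 51)  # placeholder head; the paper trains AlexNet-style DCNNs
optimizer = optim.SGD(model.parameters(), lr=1e-5)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=4000, gamma=0.1)

for step in range(1, 8001):
    x = torch.randn(128, 512)           # placeholder mini-batch, b = 128
    y = torch.randint(0, 51, (128,))
    optimizer.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    optimizer.step()
    scheduler.step()                    # decay applied per iteration, not per epoch
    if step % 4000 == 0:
        print(step, scheduler.get_last_lr())  # 1e-6 at step 4000, 1e-7 at 8000
```

For the recognition stage, the same row implies swapping in `lr=1e-7` and `step_size=3000`.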