Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

Authors: Yanhua Cheng, Xin Zhao, Rui Cai, Zhiwei Li, Kaiqi Huang, Yong Rui

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with the state-of-the-art results reported by fully-supervised methods. |
| Researcher Affiliation | Collaboration | Yanhua Cheng1, Xin Zhao1, Rui Cai2, Zhiwei Li2, Kaiqi Huang1,3, Yong Rui2. 1CRIPAC & NLPR, CASIA; 2Microsoft Research; 3CAS Center for Excellence in Brain Science and Intelligence Technology. {yh.cheng, xzhao, kaiqi.huang}@nlpr.ia.ac.cn, {ruicai, zli, yongrui}@microsoft.com |
| Pseudocode | No | The paper describes the algorithms in prose and with diagrams (Fig. 2), but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no statement about releasing source code and no link to a code repository for the described methodology. |
| Open Datasets | Yes | We perform our experiments on the Washington RGB-D dataset [Lai et al., 2011a] captured by Microsoft Kinect. |
| Dataset Splits | Yes | To evaluate our semi-supervised learning, we first utilize one of the 10 random splits provided by [Lai et al., 2011a] to divide the dataset into a training set and a testing set. For any split, there are around 35,000 examples for training and around 6,877 for testing. Then we randomly label 5% of the training samples (around 1,750) and leave the rest unlabeled (around 33,250). A sketch of this split procedure is given after the table. |
| Hardware Specification | No | The paper gives no specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions optimization algorithms (SGD) and architectures (AlexNet) but does not list software dependencies or library version numbers used in the implementation. |
| Experiment Setup | Yes | We fix α = 0.5, K = 20, β = 1 for our semi-supervised learning method, although dynamically fine-tuning each parameter could result in better performance. For the reconstruction network of each modality, we use a mini-batch b = 128 of images and an initial learning rate of 10⁻⁵, multiplying the learning rate by 0.1 every s = 4000 iterations. For training the RGB- and depth-DCNN models for recognition during every iteration, we set b = 128, a learning rate of 10⁻⁷, and s = 3000. A sketch of the implied step-decay schedule is given after the table. |
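
The 5% labeled split quoted in the Dataset Splits row is simple to reproduce. Below is a minimal Python sketch of the sampling step, assuming the training indices of one Washington RGB-D split are already loaded; the function name and index layout are hypothetical, and only the 5% fraction and the approximate counts (~1,750 labeled of ~35,000 training examples) come from the quoted text.

```python
# Minimal sketch of the 5% labeled / 95% unlabeled split described in the
# paper. The loader and index layout are hypothetical; only the proportions
# come from the quoted Dataset Splits text.
import random

def make_semisupervised_split(train_indices, labeled_fraction=0.05, seed=0):
    """Randomly mark `labeled_fraction` of the training set as labeled."""
    rng = random.Random(seed)
    shuffled = train_indices[:]
    rng.shuffle(shuffled)
    n_labeled = int(len(shuffled) * labeled_fraction)
    labeled = sorted(shuffled[:n_labeled])    # ~1,750 examples at 5%
    unlabeled = sorted(shuffled[n_labeled:])  # ~33,250 examples
    return labeled, unlabeled

# Example with the approximate training-set size reported in the paper:
train_indices = list(range(35000))
labeled, unlabeled = make_semisupervised_split(train_indices)
print(len(labeled), len(unlabeled))  # 1750 33250
```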
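
The Experiment Setup row implies a standard step-decay SGD schedule (multiply the learning rate by 0.1 every s iterations). The Python sketch below shows how the two quoted schedules evolve; the `learning_rate` helper is hypothetical, and only the batch size b = 128, the initial rates 10⁻⁵ and 10⁻⁷, the 0.1 decay factor, and the step sizes s = 4000 and s = 3000 come from the paper.

```python
# Sketch of the step-decay schedule implied by the quoted setup; the helper
# is hypothetical, the constants come from the Experiment Setup row.
def learning_rate(initial_lr, iteration, step_size, gamma=0.1):
    """Multiply the learning rate by `gamma` every `step_size` iterations."""
    return initial_lr * (gamma ** (iteration // step_size))

# Reconstruction network of each modality: b = 128, lr0 = 1e-5, s = 4000.
for it in (0, 4000, 8000):
    print(it, learning_rate(1e-5, it, step_size=4000))  # 1e-5, 1e-6, 1e-7

# RGB- and depth-DCNN recognition training: b = 128, lr0 = 1e-7, s = 3000.
for it in (0, 3000, 6000):
    print(it, learning_rate(1e-7, it, step_size=3000))  # 1e-7, 1e-8, 1e-9
```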