Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition
Authors: Yanhua Cheng, Xin Zhao, Rui Cai, Zhiwei Li, Kaiqi Huang, Yong Rui
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with those state-of-the-art results reported by fully-supervised methods. |
| Researcher Affiliation | Collaboration | Yanhua Cheng¹, Xin Zhao¹, Rui Cai², Zhiwei Li², Kaiqi Huang¹,³, Yong Rui²; ¹CRIPAC & NLPR, CASIA; ²Microsoft Research; ³CAS Center for Excellence in Brain Science and Intelligence Technology; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the algorithms in prose and uses diagrams (Fig. 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We perform our experiments on the Washington RGB-D dataset [Lai et al., 2011a] captured by Microsoft Kinect. |
| Dataset Splits | Yes | To evaluate our semi-supervised learning, we first utilize one of the 10 random splits provided by [Lai et al., 2011a] to divide the dataset into a training set and a testing set. For any split, there are around 35,000 examples for training and around 6,877 for testing. Then we randomly labeled 5% samples (around 1750) of the training set, and remain the rest unlabeled (around 33,250). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimization algorithms (SGD) and architectures (AlexNet) but does not provide specific software dependencies or library version numbers used in the implementation. |
| Experiment Setup | Yes | We fix = 0.5, K = 20, β = 1 for our semi-supervised learning method, although dynamically finetuning each parameter could result in a better performance. For the reconstruction network of each modality, we use a mini-batch b = 128 of images and initial learning rate = 10⁻⁵, multiplying the learning rate by 0.1 at every s = 4000 iterations. Towards the training of the RGB- and depth-DCNN models for recognition during every iteration, we set b = 128, = 10⁻⁷, and s = 3000. |
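
The Dataset Splits and Experiment Setup rows describe two mechanical steps that may be easier to follow as code: keeping labels for a random 5% of the training set, and multiplying the learning rate by 0.1 every s iterations. The following is a minimal sketch, not the authors' implementation (the paper releases no code). The function names, the seed, and the zero-based iteration count are hypothetical, and reading the extracted learning rates as 10⁻⁵ and 10⁻⁷ assumes the minus sign was lost in extraction.

```python
import random


def choose_labeled_indices(num_train, labeled_fraction=0.05, seed=0):
    """Randomly pick which training examples keep their labels.

    Returns (labeled, unlabeled) index lists. The seed is an assumption
    added for repeatability; the paper does not report one.
    """
    rng = random.Random(seed)
    num_labeled = int(round(num_train * labeled_fraction))
    labeled = set(rng.sample(range(num_train), num_labeled))
    unlabeled = [i for i in range(num_train) if i not in labeled]
    return sorted(labeled), unlabeled


def step_learning_rate(initial_lr, iteration, step_size, gamma=0.1):
    """Learning rate at a zero-based iteration under step decay.

    The rate is multiplied by `gamma` once per completed block of
    `step_size` iterations (assumed interpretation of "at every s
    iterations").
    """
    return initial_lr * gamma ** (iteration // step_size)


if __name__ == "__main__":
    # Around 35,000 training examples, 5% labeled, as quoted in the table.
    labeled, unlabeled = choose_labeled_indices(35000)
    print(len(labeled), len(unlabeled))

    # Reconstruction network: initial rate 1e-5 (assumed), s = 4000.
    for it in (0, 3999, 4000, 8000):
        print(it, step_learning_rate(1e-5, it, 4000))
```

With 35,000 training examples this yields 1,750 labeled and 33,250 unlabeled indices, matching the approximate counts quoted in the Dataset Splits row.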