Improved Multimodal Deep Learning with Variation of Information

Authors: Kihyuk Sohn, Wenling Shang, Honglak Lee

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In experiments, we demonstrate the state-of-the-art visual recognition performance on MIR-Flickr database and PASCAL VOC 2007 database with and without text features." |
| Researcher Affiliation | Academia | Kihyuk Sohn, Wenling Shang and Honglak Lee, University of Michigan, Ann Arbor, MI, USA, {kihyuks,shangw,honglak}@umich.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for the availability of its own source code. Footnote 6 links to external data/features, not the authors' implementation. |
| Open Datasets | Yes | "In our first experiment, we evaluate the proposed learning algorithm on the MNIST handwritten digit recognition dataset [16]."; "In this section, we evaluate our methods on MIR-Flickr database [11], which is composed of 1 million examples of image and their user tags collected from the social photo-sharing website Flickr."; "We evaluate the proposed algorithm on PASCAL VOC 2007 database. The original dataset doesn't contain user tags, but Guillaumin et al. [7] has collected the user tags from Flickr website." |
| Dataset Splits | Yes | "Following the experimental protocol [12, 27], we randomly split the labeled data into 15000 for training and 10000 for testing, and used 5000 from training set for validation." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, or specific software environments used for the experiments. |
| Experiment Setup | Yes | "We have two additional hyper-parameters, the number of mean-field updates and the sampling ratio of a subset s to be predicted from the target data modality. In our experiments, it was sufficient to use 10-20 iterations until convergence. We used the sampling ratio of 1 (i.e., all the variables in the target data modality are to be predicted) since we are already conditioned on one data modality, which is sufficient to make a good prediction of variables in the target data modality." |
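The dataset split quoted above (15000 train / 10000 test, with 5000 of the training examples held out for validation) can be sketched as a simple random partition of example indices; the function name and seed below are illustrative assumptions, not from the paper.

```python
import random

def split_labeled_data(num_examples=25000, num_train=15000, num_valid=5000, seed=0):
    """Randomly partition example indices into train/valid/test sets.

    Mirrors the quoted protocol: 15000 for training, 10000 for testing,
    with 5000 of the training examples held out for validation.
    (Function name and seed are illustrative, not the authors' code.)
    """
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    train = indices[:num_train]
    test = indices[num_train:]
    valid = train[:num_valid]   # validation set is carved out of the training set
    train = train[num_valid:]
    return train, valid, test

train, valid, test = split_labeled_data()
print(len(train), len(valid), len(test))  # 10000 5000 10000
```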
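The experiment-setup quote refers to mean-field updates used to predict one data modality conditioned on the other. A minimal sketch of that kind of fixed-point iteration in a bimodal RBM with shared hidden units is below; the weight shapes, variable names, and initialization are assumptions for illustration, not the authors' model or released code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field_predict(x, Wx, Wy, bh, by, num_updates=15):
    """Mean-field prediction of target-modality means mu_y given source x.

    Alternates the fixed-point updates
        mu_h = sigmoid(Wx.T @ x + Wy.T @ mu_y + bh)
        mu_y = sigmoid(Wy @ mu_h + by)
    for `num_updates` iterations (the paper reports 10-20 suffice).
    All shapes/names here are illustrative assumptions.
    """
    mu_y = sigmoid(by)  # initialize target means from the biases
    for _ in range(num_updates):
        mu_h = sigmoid(Wx.T @ x + Wy.T @ mu_y + bh)
        mu_y = sigmoid(Wy @ mu_h + by)
    return mu_y

# Toy usage: predict 5 target units from 8 source units via 4 hidden units.
rng = np.random.default_rng(0)
Wx = rng.normal(size=(8, 4))   # source-to-hidden weights
Wy = rng.normal(size=(5, 4))   # target-to-hidden weights
x = rng.integers(0, 2, size=8).astype(float)
mu_y = mean_field_predict(x, Wx, Wy, np.zeros(4), np.zeros(5))
print(mu_y.shape)  # (5,)
```

With a sampling ratio of 1, as in the quote, every variable in the target modality is predicted, which is what the loop above does.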