Improved Multimodal Deep Learning with Variation of Information
Authors: Kihyuk Sohn, Wenling Shang, Honglak Lee
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the state-of-the-art visual recognition performance on MIR-Flickr database and PASCAL VOC 2007 database with and without text features. |
| Researcher Affiliation | Academia | Kihyuk Sohn, Wenling Shang and Honglak Lee University of Michigan Ann Arbor, MI, USA {kihyuks,shangw,honglak}@umich.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for the availability of its own source code. Footnote 6 links to external data/features, not the authors' implementation. |
| Open Datasets | Yes | In our first experiment, we evaluate the proposed learning algorithm on the MNIST handwritten digit recognition dataset [16].; In this section, we evaluate our methods on MIR-Flickr database [11], which is composed of 1 million examples of images and their user tags collected from the social photo-sharing website Flickr.; We evaluate the proposed algorithm on PASCAL VOC 2007 database. The original dataset doesn't contain user tags, but Guillaumin et al. [7] has collected the user tags from Flickr website. |
| Dataset Splits | Yes | Following the experimental protocol [12, 27], we randomly split the labeled data into 15000 for training and 10000 for testing, and used 5000 from training set for validation. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, or specific software environments used for the experiments. |
| Experiment Setup | Yes | We have two additional hyperparameters, the number of mean-field updates and the sampling ratio of a subset s to be predicted from the target data modality. In our experiments, it was sufficient to use 10-20 iterations until convergence. We used the sampling ratio of 1 (i.e., all the variables in the target data modality are to be predicted) since we are already conditioned on one data modality, which is sufficient to make a good prediction of variables in the target data modality. (A mean-field sketch follows the table.) |
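
For the Dataset Splits row, the snippet below is a minimal sketch of the quoted protocol: 15,000 labeled examples for training, 10,000 for testing, and 5,000 drawn from the training set for validation. The 25,000-example total, the NumPy implementation, the random seed, and the choice to carve the validation set out of (rather than overlap with) the training set are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def split_labeled_indices(n_labeled=25000, seed=0):
    """Randomly split labeled MIR-Flickr indices: 15k train / 10k test,
    with 5k of the training portion held out for validation (assumed
    non-overlapping; the paper only says 'used 5000 from training set')."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_labeled)
    test = perm[:10000]
    train_full = perm[10000:]      # 15,000 examples for training
    val = train_full[:5000]        # validation drawn from the training set
    train = train_full[5000:]      # remaining 10,000 used to fit the model
    return train, val, test

train_idx, val_idx, test_idx = split_labeled_indices()
assert len(train_idx) == 10000 and len(val_idx) == 5000 and len(test_idx) == 10000
```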
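For the Experiment Setup row, the sketch below illustrates the kind of conditional mean-field loop the quoted hyperparameters refer to: condition on one modality, run 10-20 mean-field updates, and predict every variable of the target modality (sampling ratio of 1). The single-hidden-layer RBM parameterization, the sigmoid update equations, and all variable names are generic conventions assumed for illustration; they are not the paper's multimodal architecture or code released by the authors.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict_target_modality(x, Wx, Wy, bh, by, n_iters=15):
    """Mean-field prediction of target modality y given source modality x.
    n_iters is in the 10-20 range, per the quoted setup; a sampling ratio of 1
    means every unit of y is updated (no subset is clamped to observed values)."""
    y = sigmoid(by)                        # initialize y at its bias activation
    for _ in range(n_iters):
        h = sigmoid(x @ Wx + y @ Wy + bh)  # hidden-layer mean-field update
        y = sigmoid(h @ Wy.T + by)         # target-modality mean-field update
    return y

# Toy usage with random parameters (shapes: x:(dx,), Wx:(dx,nh), Wy:(dy,nh)).
rng = np.random.default_rng(0)
dx, dy, nh = 8, 6, 4
y_pred = predict_target_modality(rng.random(dx),
                                 rng.standard_normal((dx, nh)),
                                 rng.standard_normal((dy, nh)),
                                 np.zeros(nh), np.zeros(dy))
```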