Learning Structured Output Representation using Deep Conditional Generative Models

Authors: Kihyuk Sohn, Honglak Lee, Xinchen Yan

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we demonstrate the effectiveness of our proposed algorithm in comparison to the deterministic deep neural network counterparts in generating diverse but realistic structured output predictions using stochastic inference. Furthermore, the proposed training methods are complementary, which leads to strong pixel-level object segmentation and semantic labeling performance on Caltech-UCSD Birds 200 and the subset of Labeled Faces in the Wild dataset. In Section 5, we evaluate our proposed models and report experimental results.
Researcher Affiliation | Collaboration | Kihyuk Sohn (NEC Laboratories America, Inc., ksohn@nec-labs.com); Xinchen Yan and Honglak Lee (University of Michigan, Ann Arbor, {xcyan,honglak}@umich.edu)
Pseudocode | No | The paper presents mathematical formulations and descriptions of the model and training process, but it does not include any explicitly labeled pseudocode or algorithm blocks. (The variational training objective is written out after the table.)
Open Source Code | No | The paper mentions using third-party tools like MatConvNet and Adam, but it does not provide an explicit statement or link to the authors' own source code for the described methodology.
Open Datasets | Yes | For the proof of concept, we create an artificial experimental setting for structured output prediction using MNIST database [19]. Then, we evaluate the proposed CVAE models on several benchmark datasets for visual object segmentation and labeling, such as Caltech-UCSD Birds (CUB) [36] and Labeled Faces in the Wild (LFW) [12].
Dataset Splits | Yes | The training/test split proposed in [36] was used in our experiment, and for validation purposes, we partitioned the training set into 10 folds and cross-validated with the mean intersection over union (IoU) score over the folds. The final prediction on the test set was made by averaging the posterior from an ensemble of 10 networks that were trained on each of the 10 folds separately. (A sketch of this protocol appears after the table.)
Hardware Specification | Yes | Mean inference time per image: 2.32 ms for the CNN and 3.69 ms for deep CGMs, measured using a GeForce GTX TITAN X card with MatConvNet; we provide more information in the supplementary material.
Software Dependencies | No | Our implementation is based on MatConvNet [33], a MATLAB toolbox for convolutional neural networks, and Adam [14] for adaptive learning-rate scheduling of SGD optimization. While specific software names are mentioned, their version numbers are not provided (e.g., the MatConvNet or MATLAB version).
Experiment Setup | Yes | The same network architecture, an MLP with two layers of 1,000 ReLUs for the recognition, conditional prior, and generation networks, followed by 200 Gaussian latent variables, was used for all the models in the various experimental settings. Early stopping is used during training based on the estimated conditional likelihood on the validation set. (A sketch of this architecture appears after the table.)
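
The Pseudocode row notes that the training procedure is given only as mathematical formulations. For reference, the CVAE is trained by maximizing the conditional evidence lower bound with the reparameterization trick (SGVB); the standard form of this bound, consistent with the paper's formulation, is:

```
\tilde{\mathcal{L}}_{\mathrm{CVAE}}(x, y; \theta, \phi)
    = -\mathrm{KL}\left( q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x) \right)
    + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\left( y \mid x, z^{(l)} \right),
\qquad z^{(l)} = g_\phi\left( x, y, \epsilon^{(l)} \right), \quad
\epsilon^{(l)} \sim \mathcal{N}(0, I).
```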
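The Dataset Splits row describes a 10-fold partition of the training set with test-time averaging over an ensemble of 10 networks. Below is a minimal NumPy sketch of that protocol; `train_model` and `predict_posterior` are hypothetical placeholders, not functions from the paper's MATLAB implementation.

```python
import numpy as np

def kfold_ensemble_predict(train_data, test_data, train_model, predict_posterior,
                           k=10, seed=0):
    # Randomly partition the training indices into k folds.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(train_data))
    folds = np.array_split(idx, k)

    posteriors = []
    for held_out in folds:
        # Train one network per fold, holding that fold out for validation.
        train_idx = np.setdiff1d(idx, held_out)
        model = train_model([train_data[i] for i in train_idx],
                            val=[train_data[i] for i in held_out])
        posteriors.append(predict_posterior(model, test_data))

    # Final test prediction: average posterior over the ensemble of k networks.
    return np.mean(posteriors, axis=0)
```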
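The Experiment Setup row specifies MLPs with two layers of 1,000 ReLUs followed by 200 Gaussian latent variables. A minimal PyTorch-style sketch of such a two-layer Gaussian MLP (as used for the recognition and conditional prior networks) and the reparameterized sampling step is given below; the paper's own implementation is in MATLAB/MatConvNet, so this is only an illustrative analogue.

```python
import torch
import torch.nn as nn

class GaussianMLP(nn.Module):
    """Two hidden layers of 1,000 ReLUs parameterizing a 200-dim diagonal Gaussian."""
    def __init__(self, in_dim, hidden_dim=1000, latent_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden_dim, latent_dim)       # Gaussian mean
        self.logvar = nn.Linear(hidden_dim, latent_dim)   # Gaussian log-variance

    def forward(self, h):
        h = self.net(h)
        return self.mu(h), self.logvar(h)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, eps ~ N(0, I): stochastic inference at training time.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps
```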