Max-Margin Deep Generative Models

Authors: Chongxuan Li, Jun Zhu, Tianlin Shi, Bo Zhang

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on the MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs while retaining their generative ability; and (2) mmDGMs are competitive with state-of-the-art fully discriminative networks when deep convolutional neural networks (CNNs) are employed as both recognition and generative models.
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., State Key Lab of Intell. Tech. & Sys., TNList Lab, Center for Bio-Inspired Computing Research, Tsinghua University, Beijing, 100084, China; Dept. of Comp. Sci., Stanford University, Stanford, CA 94305, USA. {licx14@mails., dcszj@, dcszb@}tsinghua.edu.cn; stl501@gmail.com
Pseudocode | Yes | Algorithm 1: Doubly Stochastic Subgradient Algorithm.
Open Source Code | Yes | The source code is available at https://github.com/zhenxuan00/mmdgm.
Open Datasets | Yes | Experimental results are presented on the widely adopted MNIST [14] and SVHN [22] datasets.
Dataset Splits | Yes | MNIST consists of images of 10 classes (digits 0 to 9) of size 28×28, with 50,000 training, 10,000 validation, and 10,000 test samples. SVHN [22] is a large dataset of 32×32 color images; the task is to recognize the center digit in natural scene images, which is significantly harder than classifying hand-written digits. Following [27, 8], the dataset is split into 598,388 training, 6,000 validation, and 26,032 test samples, and preprocessed with Local Contrast Normalization (LCN).
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or processor types used for the experiments.
Software Dependencies | No | All experiments are implemented with Theano [2]; the paper names this framework but does not specify its version, nor does it list any other software components with versions.
Experiment Setup | Yes | C = 15 is chosen for MMVA... In the CNN case, 60,000 training samples are used. Table 2 shows the effect of C on classification error rate and the variational lower bound. Typically, as C gets larger, CMMVA learns more discriminative features but yields a worse estimate of the data likelihood; if C is too small, however, the supervision is insufficient to produce predictive features. Nevertheless, C = 10^3 is quite a good trade-off... C = 10^4 is set as the default for the CMMVA model on SVHN. AdaM [10] is used to optimize parameters in all of the models. Although it is an adaptive gradient-based optimization method, the global learning rate is decayed by a factor of three periodically after a sufficient number of epochs to ensure stable convergence.
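The "doubly stochastic" subgradient algorithm named in the pseudocode row draws its stochasticity from two sources: the random data minibatch and the Monte Carlo samples of the latent variables. A minimal sketch of one such gradient estimate is below; `grad_fn` and `sample_eps` are hypothetical placeholders standing in for the model's per-sample subgradient and latent-noise sampler, not functions from the paper's code.

```python
import numpy as np

def doubly_stochastic_grad(params, data, grad_fn, sample_eps,
                           batch_size=64, n_mc=1, rng=None):
    """One doubly stochastic (sub)gradient estimate.

    Stochastic over (a) the data minibatch and (b) the Monte Carlo
    draws of the latent noise used to reparameterize the model.
    `grad_fn(params, batch, eps)` and `sample_eps(rng)` are assumed
    interfaces for illustration only.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Source of stochasticity 1: a random minibatch of the data.
    idx = rng.choice(len(data), size=batch_size, replace=False)
    batch = data[idx]
    # Source of stochasticity 2: Monte Carlo samples of latent noise.
    grad = np.zeros_like(params)
    for _ in range(n_mc):
        eps = sample_eps(rng)  # e.g. eps ~ N(0, I)
        grad += grad_fn(params, batch, eps)
    return grad / n_mc
```

Averaging over `n_mc` noise draws reduces the variance of the estimate at the cost of extra computation per update; the common choice is a single draw per minibatch.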
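The periodic learning-rate decay described in the setup row can be sketched as a simple step schedule. This is an illustrative sketch only: the decay factor of three comes from the paper, but `base_lr` and the decay `period` are assumptions, since the paper does not state them in this excerpt.

```python
def decayed_lr(base_lr, epoch, period, factor=3.0):
    """Global learning rate after decaying by `factor` once every
    `period` epochs (a step schedule; `base_lr` and `period` are
    illustrative, not values from the paper)."""
    return base_lr / factor ** (epoch // period)
```

For example, with `base_lr=3e-4` and `period=100`, the rate is 3e-4 for epochs 0-99, 1e-4 for epochs 100-199, and so on, which matches the described "decay by a factor of three periodically" behavior.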